LRI, Faculty of Sciences, Mohammed V University in Rabat, Avenue des Nations Unies, Agdal, 10000 Rabat, Morocco
Copyright © 2016 Othmane Blial et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Software-defined networking offers several benefits for networking by separating the control plane from the data plane. However, network scalability, reliability, and availability remain major issues. Accordingly, multicontroller architectures are important for SDN-enabled networks. This paper gives a comprehensive overview of SDN multicontroller architectures. It presents SDN and its main instantiation, OpenFlow. It then explains in detail the differences between the various types of multicontroller architectures, such as the distribution method and the communication system. Furthermore, it describes multicontroller architectures that are already implemented or under research, covering their design, their communication process, and their performance results.
Unlike traditional networks, software-defined networking (SDN)  separates the control from the data plane in network devices, like switches and routers. This new concept suggests the use of a centralized controller that determines the behavior of all forwarding components in the network.
Southbound interfaces permit communication between the control plane and the data plane, while northbound interfaces provide enormous possibilities for networking programmability, like creating applications that can automate all networking tasks. Consequently, SDN will enhance creativity, as well as innovation, in the domain of networking.
Three critical requirements cannot be achieved in a centralized SDN-enabled network using just one controller, which was the main tendency of early proposed SDN architectures. First, efficiency is hard to establish with a single centralized controller. Second, scalability is one of the biggest issues pushing network architects to consider multiple controllers. Third, high availability has two aspects: redundancy and security. Redundancy is one of the most significant aspects of any design; one controller could fail at any time and thereby leave the network without its control plane. Security is equally important: if an attacker compromises the controller, the network loses its entire management. Clearly, multiple controllers can greatly reduce this risk, because they can cooperate to identify that one of them is misbehaving and thus isolate the attacker from the network.
All these arguments push network designers to think seriously about integrating multicontroller architectures into their designs, and several works have been proposed in this context. This motivated us to write a comprehensive overview that explains in detail the different aspects and characteristics of distributed architectures in a software-defined network and clarifies some notions that are easily confused, such as the difference between logically and physically distributed architectures.
Many papers have presented surveys and overviews of SDN; for example, the authors of  explain in detail almost all notions and concepts related to SDN. They mention multicontrollers when discussing the scalability issue and provide a table that summarizes the differences between centralized and distributed controllers; however, they do not give more information about the distribution method. Another survey  discusses SDN by explaining its features and clarifying its layers in detail. It mentions multicontrollers briefly when discussing methods to enhance the control layer's performance. Another interesting survey  presents the development of SDN chronologically. Its authors mention multicontrollers in a paragraph on the difference between centralized and distributed controllers. To the best of our knowledge, only one paper  presents a survey specifically about the control plane. Nevertheless, it does not give enough detail about the multicontroller implementations it covers.
The remainder of this paper is organized as follows: we provide a review of the architecture of SDN and its main implementation OpenFlow in Section 2. Further, we present a thorough explanation about the characteristics and related subjects of the distribution of multicontroller designs in Section 3. In Sections 4 and 5, we present some of the proposed multicontrollers by explaining their architectures, classifying them by distribution method, while mentioning the performance results of each model. Finally, in Section 6, we give a conclusion.
2. SDN Architecture
In this section, we will present a review of the architecture of SDN and OpenFlow, its main implementation, as shown in Figures 1 and 2, respectively.
Figure 1: SDN architecture.
Figure 2: OpenFlow architecture.
2.1. SDN Architecture
An SDN architecture contains six major components.
First is the management plane, which is a set of network applications that manage the control logic of a software-defined network. Rather than using a command line interface, SDN-enabled networks use programmability to give flexibility and ease to the task of implementing new applications and services, such as routing, load balancing, policy enforcement, or a custom application from a service provider. It also allows orchestration and automation of the network via existing APIs .
Second is the control plane, the most intelligent and important layer of an SDN architecture. It contains one or more controllers that forward the different types of rules and policies to the infrastructure layer through the southbound interface .
Third, the data plane, also known as the infrastructure layer, represents the forwarding devices on the network (routers, switches, load balancers, etc.). It uses the southbound APIs to interact with the control plane by receiving the forwarding rules and policies to apply them to the corresponding devices .
Fourth, the northbound interfaces that permit communication between the control layer and the management layer are mainly a set of open source application programming interfaces (APIs) .
Fifth, the east-west interfaces, which are not yet standardized, allow communication between the multiple controllers. They use a system of notification and messaging or a distributed routing protocol like BGP and OSPF.
Sixth, the southbound interfaces allow interaction between the control plane and the data plane, which can be defined summarily as protocols that permit the controller to push policies to the forwarding plane. The OpenFlow protocol is the most widely accepted and implemented southbound API for SDN-enabled networks.
OpenFlow is standardized by the Open Networking Foundation (ONF) , backed by leaders of the IT industry like Facebook, Cisco, Google, HP, and others. For this reason, understanding the OpenFlow architecture, which we present in the next subsection, is important for grasping the notion of SDN. Before that, we should realize that OpenFlow is just one instantiation of SDN, as there are many existing and under-development southbound APIs. For instance, OpFlex  distributes some of the complexity of managing the network to the infrastructure layer to improve scalability. ForCES  proposes a flexible method to improve the management of traditional networks without using a logically centralized controller, while ROFL , which relies on OpenFlow, provides an API for software developers to enable full development of new applications .
2.2. OpenFlow Architecture
In an OpenFlow-enabled network, flow can be represented as a transmission control protocol (TCP) connection. Flows can also be packets with a matching MAC address, an IP address, a VLAN tag, or a switch port .
The OpenFlow switch has one or more flow tables. A flow table is a set of flow entries. A flow entry is used to match and process packets. It consists of matching fields to match packets, a set of counters to track packets, and instructions to apply . The OpenFlow switch uses an OpenFlow channel to communicate with the OpenFlow controller .
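As a rough illustration of how match fields, counters, and instructions fit together, the following sketch models a single flow table. The field names and the table-miss behavior are simplified assumptions for the example, not the full OpenFlow pipeline:

```python
from dataclasses import dataclass

@dataclass
class FlowEntry:
    match: dict          # matching fields, e.g. {"ip_dst": "10.0.0.2"}
    instructions: list   # instructions to apply on a match
    packet_count: int = 0  # counters track matched packets
    byte_count: int = 0

class FlowTable:
    def __init__(self):
        self.entries = []

    def process(self, packet):
        """Return the instructions of the first entry whose match fields all
        appear in the packet, updating its counters; otherwise report a miss."""
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                entry.packet_count += 1
                entry.byte_count += packet.get("size", 0)
                return entry.instructions
        return ["send_to_controller"]  # table miss: hand the packet to the controller

table = FlowTable()
table.entries.append(FlowEntry(match={"ip_dst": "10.0.0.2"},
                               instructions=["output:2"]))
print(table.process({"ip_dst": "10.0.0.2", "size": 64}))   # ['output:2']
print(table.process({"ip_dst": "10.0.0.9", "size": 64}))   # ['send_to_controller']
```

On a miss, a real switch typically encapsulates the packet in a packet-in message to the controller, which then installs a new flow entry.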
The OpenFlow channel is a secure channel between the OpenFlow switch and the OpenFlow controller. It permits communication by allowing the control plane to send instructions, receive requests, or exchange information. All messages are encrypted, using transport layer security (TLS) . The OpenFlow channel has three types of messages. The controller/switch message is initiated by the controller and may not require a response from the switch. The asynchronous message informs the controller about a packet arrival, a switch state change, or an error. The symmetric message can be sent in both directions for other purposes .
The OpenFlow controller handles flow tables inside the switch by adding and removing flow entries. It uses the OpenFlow channel to send and receive information . It can be considered as an operating system that serves the whole network.
The OpenFlow protocol is the southbound interface that permits communication between the OpenFlow controller and the OpenFlow switch via the OpenFlow channel .
3. Architectures of a Multicontroller Software-Defined Network
A multicontroller architecture is a set of controllers working together to achieve some level of performance and scalability. In a software-defined network, multicontroller architectures can have different aspects and characteristics, which we explore in the following paragraphs: the differences between logically or physically centralized and distributed architectures, and between flat and hierarchical designs. We then describe related aspects, such as elasticity, controller placement, and intercontroller communication.
3.1. Physically Centralized versus Physically Distributed
In a software-defined network, there are two types of architectures, physically centralized and physically distributed. When SDN appeared, the main tendency was to have a physically centralized controller; however, due to some issues, like the single point of failure and the scalability problem, network experts suggest physically distributed designs.
It is important to understand that we speak of a multicontroller architecture only in the case of a physically distributed network. Thus, in the following sections and paragraphs, we assume that the controllers are physically distributed.
Controllers like Beacon  and NOX  have used multithreading techniques to split a single controller logically and increase its performance. Since there is still a single controller in this case, it is clearly not a multicontroller architecture.
A physically distributed network can differ on various levels, like how to place controllers and also which type of communication to use among them.
In the next paragraphs, we will explain the different subjects related to a physically distributed architecture.
3.2. Logically Distributed versus Logically Centralized
A physically distributed architecture can be either logically centralized or logically distributed.
Logically centralized means that we take advantage of the concept of a multicontroller design while still treating the network as if it had a single controller. In other words, the load is distributed among the multiple controllers; however, to the underlying layer, it appears as if a single controller commands the whole network. Before multiple controllers were implemented, another idea was proposed : installing replicated controllers to remove the single point of failure. Nevertheless, this method has many disadvantages, such as using passive controllers that become active only if the main controller fails. In a logically centralized architecture, all the controllers have the same responsibilities, and they split the load equally. They are always aware of every change in the network and share the same information instantly, thanks to network synchronization.
In a logically distributed architecture, the controllers are physically and logically distributed. Every controller has a view only of the domain it is responsible for and can take decisions for it, unlike in a logically centralized design, where each controller makes decisions based on the global network view.
In a word, a logically centralized architecture stays close to the initial tendency of SDN, which is using a single controller, or a multicore controller, to improve performance.
On the other hand, a logically distributed architecture moves away from this first tendency by giving several controllers distinct responsibilities inside the network.
3.3. Flat Architecture versus Hierarchical Architecture
In the majority of papers relevant to our context, we found that a multicontroller architecture can follow either a flat or a hierarchical design.
In a flat, or horizontal, architecture, the controllers are positioned horizontally on a single level. In other words, the control plane consists of just one layer, and each controller has the same responsibilities and a partial view of the network.
In a hierarchical, or vertical, architecture, the controllers are positioned vertically. They are partitioned among multiple levels, which means that the control plane has several layers, generally two or three. The controllers have different responsibilities, and they can take decisions based on a partial view of the network.
These two methods have advantages and disadvantages. For example, both approaches can improve the switch-to-controller latency compared to a single-controller or multicore architecture. A flat design makes the network more resilient to failures, but the task of managing the controllers becomes harder. Conversely, a hierarchical design gives a simpler way to manage the controllers, but the problem of a single point of failure remains because of the upper layer of the control plane. To elaborate on this final point: in a hierarchical architecture, we usually have up to three layers, each containing one type of controller. Typically, the bottom layer contains the local controllers, while the upper layer contains a single root controller, which means that the single-point-of-failure problem persists, even if it concerns just one layer of the control plane.
3.4. Dynamic Architecture versus Static Architecture
A logically centralized architecture can be dynamic or static.
In a dynamic, or elastic, architecture, the links between the controllers and the switches, as well as their positions, can change, which makes the network flexible.
In a static architecture, these links and positions are fixed, which gives the network more stability and less overhead compared to a dynamic architecture.
3.5. Intercontroller Communication
Intercontroller communication is the method used to exchange information among the multiple controllers of a software-defined network.
For instance, the “publish and subscribe” messaging paradigm works as follows: a set of switches subscribe to a particular controller, and each controller subscribes to the others. Controllers then publish information to one another to build a global network view.
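The publish/subscribe pattern described above can be sketched minimally; all class, channel, and event names here are illustrative, not taken from any specific controller:

```python
from collections import defaultdict

class EventBus:
    """Toy publish/subscribe broker connecting controllers."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, event):
        for callback in self.subscribers[channel]:
            callback(event)

class Controller:
    def __init__(self, name, bus):
        self.name = name
        self.view = {}  # this controller's copy of the global network view
        bus.subscribe("events", self.on_event)

    def on_event(self, event):
        # Merge the published local state into the local view.
        self.view.update(event)

bus = EventBus()
c1, c2 = Controller("c1", bus), Controller("c2", bus)
bus.publish("events", {"switch-1": "up"})  # c1 publishes its local state
print(c2.view)  # {'switch-1': 'up'}
```

Every subscriber, including the publisher itself, ends up with the same view, which is the point of the paradigm: each controller's local state converges toward a shared global view.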
Another example is a system of notifications exchange. Each controller will send information to its neighbors about its local state to create a global network view.
Previous examples can be more suitable for centralized architectures, while distributed architectures are more likely to implement well-known distributed routing protocols, like BGP, OSPF, and IS-IS.
Building a global network view is closely connected to the notion of consistency. Consistency can be either weak, which implies that updates between controllers take some time to be fully applied, or strong, which signifies that the multiple controllers read the updates at the same time, which affects the performance of the network positively .
3.6. Placement Problem in Multicontroller Software-Defined Architecture
The number of controllers and their positions in a distributed network architecture will certainly impact the overall performance of the control plane, which is a significant challenge for network designers. A research paper  discusses this problem, called the controller placement problem, in depth. It tries to solve the placement of controllers in WAN networks by improving the delay between a controller and a switch, as well as between two controllers, in order to minimize the response time and enhance the network's ability to react quickly.
The authors also demonstrated that determining how many controllers to use and where to place them depends on three factors: first, the desired reaction limits, notably the latency requirements; second, the metric choices, such as availability, fairness, state of the network, processing, and bandwidth; and, third, the network topology.
They also found that, surprisingly, a single controller is enough to meet response time requirements in a medium-size network.
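The placement trade-off can be illustrated with a toy brute-force search that picks the k controller sites minimizing average switch-to-controller latency. The latency matrix and the averaging metric are assumptions made for this example, not values or objectives from the cited study:

```python
from itertools import combinations

# Hypothetical pairwise latencies (ms) between five candidate sites.
latency = [
    [0, 5, 9, 13, 7],
    [5, 0, 6, 10, 4],
    [9, 6, 0, 5, 8],
    [13, 10, 5, 0, 9],
    [7, 4, 8, 9, 0],
]

def avg_latency(placement):
    # Each switch (site) attaches to its nearest controller in the placement.
    return sum(min(latency[s][c] for c in placement)
               for s in range(len(latency))) / len(latency)

def best_placement(k):
    # Exhaustively evaluate every way to place k controllers.
    return min(combinations(range(len(latency)), k), key=avg_latency)

for k in (1, 2):
    p = best_placement(k)
    print(k, p, avg_latency(p))
```

Even this toy version shows the paper's point: one well-placed controller may already yield an acceptable average latency, and each extra controller gives diminishing returns. Real instances replace the brute-force search with heuristics, since the search space grows combinatorially with topology size.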
4. Logically Centralized Architectures
In this section, we will present examples of logically centralized multicontroller architectures.
ONIX  is a distributed control plane that contains a cluster of one or more physical servers; each one may run multiple ONIX instances.
To understand how ONIX works, we should grasp its role in the network and the utility of its API.
A network controlled by ONIX has four components. First, the physical infrastructure includes all the network switches and routers, as well as other network devices, such as load balancers and firewalls; ONIX interacts with it by reading and writing the state controlling each element, for example, the forwarding table entries. Second, the connectivity infrastructure carries the communication between the physical network and ONIX. Third, the control logic, which runs on top of ONIX's API, determines the desired network behavior. Fourth, ONIX itself is responsible for giving the control logic programmatic access to the network.
The ONIX API is a useful API developed for network control. It allows control applications to read and write the state of any element in the network. It exposes a data model that represents the entire network infrastructure, with each network element corresponding to one or more data objects. The control logic can read the current state of each object. Each copy of the network state of an object related to a network element is stored in the Network Information Base (NIB). The NIB is a graph that contains all the network entities. Network applications are implemented by reading and writing to the NIB, while ONIX distributes the NIB data among the multiple running instances.
The NIB holds a collection of network entities, each with key-value pairs; based on these pairs, each entity has a set of attributes. The NIB provides multiple methods for the control logic to access the network entities. It has complete control over the state of an entity, because it maintains an index for each of them.
The NIB uses a system of notifications. For example, when it receives the notification “Query,” it means that the NIB needs to find one or multiple network entities, while when it receives “Create,” it means that it must create a new entity.
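The key-value entity model and the “Query”/“Create” operations can be sketched roughly as follows. The method names and data shapes are illustrative, not the real ONIX API:

```python
class NIB:
    """Toy Network Information Base: entities as key-value attribute sets."""
    def __init__(self):
        self.entities = {}  # index: entity id -> attribute dict

    def create(self, entity_id, **attributes):
        # "Create": register a new network entity with its attributes.
        self.entities[entity_id] = dict(attributes)
        return self.entities[entity_id]

    def query(self, **criteria):
        # "Query": find all entities whose attributes match the criteria.
        return [eid for eid, attrs in self.entities.items()
                if all(attrs.get(k) == v for k, v in criteria.items())]

nib = NIB()
nib.create("sw1", kind="switch", status="up")
nib.create("sw2", kind="switch", status="down")
nib.create("lnk1", kind="link", status="up")
print(nib.query(kind="switch", status="up"))  # ['sw1']
```

The per-entity index (here a plain dictionary) is what lets the NIB mediate every read and write, so ONIX can later distribute or replicate the entities across instances without changing the control logic above it.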
ONIX provides three methods to improve the scalability of its network. First, the network can be partitioned logically; in other words, the workload is distributed over multiple ONIX instances. Second, ONIX can make multiple nodes appear as a single node to the upper layer, which is called aggregation. Third, ONIX supports data state applications that can be used to improve the consistency and durability of the network.
ONIX provides a scenario for exercising the scalability of the network: a network with a modest number of switches that can easily be managed by a single ONIX instance. The authors found that the control logic can record all forwarding information from the switches and can coordinate all the data and share it across the multiple instances.
The main result of the evaluation study of ONIX is that, thanks to the partitioning process, ONIX can split the workload over multiple instances; in case of overload inside one ONIX instance, already-assigned switches can be reassigned to another instance.
HyperFlow  is an application developed on top of the NOX controller to enable logically centralized multicontroller architectures.
The HyperFlow-based network contains three parts: a control layer, a forwarding layer, and an application layer. The control layer contains multiple NOX controllers that are working cooperatively. In the forwarding layer, the switches are connected to the nearest controller. However, a switch can be reassigned to another controller in case of failure.
To propagate information in the control plane, HyperFlow uses a “publish/subscribe” messaging paradigm. This system aims to provide guaranteed event delivery and is responsible for preserving the ordering of events published by the same controller. It also minimizes intercontroller traffic to reduce overhead.
This “publish/subscribe” system runs on top of WheelFS , a distributed file system that delivers flexible wide-area storage for distributed applications. It permits the applications to have more control over the control plane.
In a HyperFlow-based network, we find three channels to permit interaction between the different components: the data channel, the control channel, and the controller channel.
The controllers publish and subscribe to all of them. OpenFlow commands are published only on the controller channel, which is also used to prevent failures inside the network. The data and control channels are mainly used to run the publish/subscribe system for intercontroller communication.
A large number of network events request only some types of services, like routing. The global network view is not affected by a change in the order of arriving events or by events that target the same switch. When the network cannot identify the events that might change its state, HyperFlow can implement state synchronization among the applications running on top of the controllers to resolve the problem.
The authors found that HyperFlow-based controllers operate more smoothly under heavy synchronization load and keep latency minimal in comparison to NOX controllers.
They also found that HyperFlow can keep an acceptable level of consistency among controllers for about 1000 arriving events per second, for instance, events covering switches and hosts connecting to and disconnecting from the network and link state changes. Nevertheless, there is a disadvantage: the delay added while a controller converges, or reaches network synchronization, increases the response time.
ONOS  provides two prototypes of a software-defined multicontroller model, which differ in many aspects.
The first prototype has three characteristics: the global network view, the scalability, and the fault tolerance. This prototype keeps a global network view by gathering switch, port, and link information.
The network view has three components: Titan  (a graph database), Cassandra  (a key-value store), and Blueprints  (a graph API to expose network state to the application layer).
ONOS can add supplementary instances to distribute the workload on the control plane when it is scaling out.
ONOS can reassign a task to another instance to prevent failures.
The evaluation of ONOS prototype 1 showed that ONOS can control hundreds of switches and hosts. Moreover, ONOS can add switches and instances dynamically and efficiently and deal instantly with network failures.
ONOS presents a decent level of consistency and integrity, because it uses Titan, which maintains the graph's structural integrity, and Cassandra, which has a high level of consistency.
The first problem of prototype 1 is excessive data store operations: the task of mapping data from the Titan graph to Cassandra results in a significant number of data store operations, which slows the network. The second problem is the lack of a notification and messaging system, which is essential for proper communication between the controllers.
The second prototype focuses on improving the performance of the first, while keeping the global network view consistent. Since the main problem of the first prototype was excessive data store operations, prototype 2 tries to solve this issue following two complementary approaches: the first makes remote operations as fast as possible, while the second reduces the number of remote operations.
Following the first approach, they replaced the Titan/Cassandra system with a Blueprints graph implementation on top of a data store called RAMCloud , which has a low latency on the order of 15–30 μs.
Following the second approach, they created a topology cache, so that some remote data store reads are served from an in-memory cache. This reduces the overall number of data store operations in the system.
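The caching idea can be illustrated with a minimal read-through cache in front of a remote store, so repeated topology reads avoid remote operations. The store, its contents, and the class names are hypothetical, not ONOS internals:

```python
class RemoteStore:
    """Stand-in for a remote data store such as a key-value cluster."""
    def __init__(self, data):
        self.data = data
        self.remote_reads = 0  # count costly remote operations

    def get(self, key):
        self.remote_reads += 1
        return self.data[key]

class CachedTopology:
    """Read-through cache: only the first read of a key goes remote."""
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        if key not in self.cache:          # miss: one remote read, then cached
            self.cache[key] = self.store.get(key)
        return self.cache[key]

store = RemoteStore({"sw1": {"ports": 4}})
topo = CachedTopology(store)
topo.get("sw1"); topo.get("sw1"); topo.get("sw1")
print(store.remote_reads)  # 1
```

Three topology reads cost a single remote operation; the real system must additionally invalidate cached entries when the topology changes, which this sketch omits.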
To address the lack of intercontroller notifications, the authors adopted a communication system based on Hazelcast . These communications go through channels installed on top of all the instances of the control plane.
DISCO  has two parts: an intradomain and an interdomain.
The intradomain part is responsible for monitoring the network and managing the flow prioritization. It contains a set of modules that dynamically handle the multiple network issues, like broken links, high latency, and bandwidth. The most important module is the Extended Database, which is the central component where all controllers store their information. Next is the Monitor Manager Module, which is responsible for gathering information, such as flow throughput, and calculating the one-way latency and the packet loss rate. Doing these operations periodically, the controller sends a current view of the link and the network devices’ performances to the Extended Database Module. Additionally, the Events Processing Module keeps track of the variation or the saturation events while the Path Computation Module computes routes for flows, from a source to a destination inside the control plane. Finally, the Service Manager Module handles the network SLAs. An SLA is a service-level agreement, which is a contract that documents what customers have requested from a service provider.
The interdomain part provides the communication among the multiple controllers and has two modules.
First is the Messenger Module, which builds channels between neighboring controllers to share information about the link state and host presence. Such communication generally uses a well-known protocol such as OSPF or BGP; the authors chose a protocol called AMQP , which provides routing, message orientation, and prioritized queuing. The Messenger only helps local agents to exchange information; it does not support communication across wide area networks.
The Agents Module contains four main agents. The Connectivity Agent shares information about connected controllers and their neighbors. The Monitoring Agent periodically sends information about the latency and bandwidth available on the network to all connected devices. The Reachability Agent advertises the presence of a device in the network so that it becomes reachable by all the other devices. Finally, the Reservation Agent handles reservation update requests on the network, including capability requirements.
The evaluation study of DISCO followed three use cases. The first was a scenario testing the self-adaptation of the control plane in case of failure. The results showed that, after the Monitoring Agent discovers the failure, the Connectivity and Reachability Modules take charge of the failure recovery, working together with the Messenger.
The second scenario showed how DISCO is helpful in resource reservation, thanks to the Service Manager and Reservation Agents.
The third scenario shows how DISCO can manage to migrate a virtual machine, from one domain to another inside a DISCO architecture, with low latency and high reachability.
ELASTICON  has an elastic architecture: a cluster of autonomous controllers that share the workload and provide a global network view to the management layer. This global network view is built by the Distributed Data Store Module. Every controller also has a TCP channel connecting it to a neighboring controller to support intercontroller message exchange and switch migration.
At the physical layer, each switch is connected to multiple controllers, one of which is the master while the others are slaves. Each controller has a Core Controller Module that takes charge of all the responsibilities of a centralized controller. It also gives the controller the ability to negotiate with the other controllers to choose the master of the topology. The primary feature of ELASTICON is elasticity, represented here by the switch migration procedure.
In the switch migration process, controllers can be added or removed according to predefined thresholds that represent the traffic load of the network. Load balancing is performed periodically, independently of the traffic load.
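A minimal sketch of threshold-driven switch migration follows. The messages-per-second load metric, the threshold value, and the greedy “heaviest switch to least loaded controller” policy are all assumptions for illustration, not ELASTICON's exact algorithm:

```python
HIGH_THRESHOLD = 100  # messages/s above which a controller sheds load (assumed)

def rebalance(assignment, load_per_switch):
    """assignment: controller -> list of switch ids.
    Perform at most one migration per balancing round."""
    total = {c: sum(load_per_switch[s] for s in sws)
             for c, sws in assignment.items()}
    for c, sws in assignment.items():
        if total[c] > HIGH_THRESHOLD and len(sws) > 1:
            target = min(total, key=total.get)            # least loaded controller
            if target != c:
                victim = max(sws, key=load_per_switch.get)  # heaviest switch
                sws.remove(victim)
                assignment[target].append(victim)
                return (victim, c, target)
    return None  # no controller is overloaded

assignment = {"c1": ["s1", "s2", "s3"], "c2": ["s4"]}
load = {"s1": 60, "s2": 50, "s3": 10, "s4": 20}
migration = rebalance(assignment, load)
print(migration)  # ('s1', 'c1', 'c2')
```

Here c1 carries 120 messages/s, exceeding the threshold, so its heaviest switch s1 migrates to the lightly loaded c2. In the real protocol this handover also involves the master/slave role change on the switch side, so that the switch never loses its control connection.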
For its evaluation, ELASTICON used an enhanced Mininet testbed , which emulates a network of Open vSwitch instances . The experimental results show that adding controller nodes increases the throughput almost linearly, and that the throughput drops when the controllers are restricted to two cores. They also show that the response time increases only marginally with higher packet generation rates when ELASTICON has a larger number of nodes, and that load balancing via switch migration can improve performance.
The evaluation also indicates that the switch migration process takes about 20 ms, which demonstrates its speed.
5. Logically Distributed Architectures
KANDOO  is a logically distributed controller with a hierarchical design of two layers.
The lower layer contains local controllers, each controlling its own subdomain, while the upper layer contains the root controller, which leads the entire lower layer.
The local controllers only reply to events to which the root controller has previously subscribed. Therefore, to deploy a new application, developers need to configure the root controller manually so it subscribes to the new application's events. Hence, KANDOO is not a zero-configuration framework.
KANDOO can coexist with other controllers on the same network and be customized to the specified needs of the network.
KANDOO's authors conducted a performance study using different applications in an emulated environment, presenting the results obtained for the elephant flow detection problem.
An elephant flow is a large continuous flow over a network link that degrades the available bandwidth after a certain amount of time.
KANDOO deploys two types of applications: App detect, which runs on top of all the local controllers, and App reroute, which is installed only on the root controller. App detect fires one flow per second and reports a flow as an elephant if it has sent more than 1 MB of data. The final results show that KANDOO scales significantly better than a traditional OpenFlow network while solving the elephant flow problem.
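KANDOO's two-level split can be sketched as follows. The 1 MB threshold follows the description above, while the class structure, method names, and escalation interface are illustrative assumptions:

```python
ELEPHANT_THRESHOLD = 1_000_000  # bytes (1 MB, per the description above)

class RootController:
    """Upper layer: only sees flows escalated by local controllers."""
    def __init__(self):
        self.rerouted = []

    def reroute(self, flow_id):  # stands in for App reroute
        if flow_id not in self.rerouted:
            self.rerouted.append(flow_id)

class LocalController:
    """Lower layer: runs the detection logic on local flow statistics."""
    def __init__(self, root):
        self.root = root
        self.bytes_sent = {}  # flow id -> cumulative bytes

    def on_flow_stats(self, flow_id, byte_count):  # stands in for App detect
        self.bytes_sent[flow_id] = self.bytes_sent.get(flow_id, 0) + byte_count
        if self.bytes_sent[flow_id] > ELEPHANT_THRESHOLD:
            # Escalate: only confirmed elephants ever reach the root controller.
            self.root.reroute(flow_id)

root = RootController()
local = LocalController(root)
local.on_flow_stats("f1", 600_000)  # below threshold: handled locally
local.on_flow_stats("f1", 600_000)  # cumulative 1.2 MB: reported as elephant
print(root.rerouted)  # ['f1']
```

The scalability benefit comes from the filter: per-flow statistics stay inside each local controller, and the root only receives the rare elephant events rather than every flow update in the network.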
However, the study provides no information about the workload and performance of the local controllers compared to a standard OpenFlow topology.
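The detection step itself amounts to a threshold check on per-flow byte counters. A minimal sketch follows; the flow records are invented, and a real deployment would read these counters from switch flow statistics:

```python
# 1 MB threshold, as used in the KANDOO elephant-flow experiment.
ELEPHANT_THRESHOLD = 1 * 1024 * 1024

def detect_elephants(flow_stats):
    """Return the ids of flows whose byte counters exceed the threshold."""
    return [fid for fid, byte_count in flow_stats.items()
            if byte_count > ELEPHANT_THRESHOLD]

# Invented per-flow byte counters, keyed by flow id.
stats = {"flow-a": 5_000, "flow-b": 3 * 1024 * 1024, "flow-c": 200_000}
print(detect_elephants(stats))  # → ['flow-b']
```

In KANDOO's design this check runs locally (App detect), and only the resulting elephant reports propagate up to the root controller, which is what keeps the root's load low.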
ORION is a hybrid hierarchical control plane that mixes flat and hierarchical architectures. It tries to combine the benefits of both designs in a hybrid structure and to avoid the issues each architecture faces on its own, such as the superlinear growth in computational complexity that flat architectures suffer as the network scales, and the path stretch problem of hierarchical designs.
A network controlled by ORION has three layers. The physical layer contains all the physical devices, such as OpenFlow switches. The bottom layer of the control plane includes the area controllers, which collect physical device and link information and handle intra-area requests and updates; this layer also has the significant task of abstracting the network view and sending it to the management layer of the control plane. Finally, the upper layer contains the domain controllers. A distributed protocol synchronizes information among the domain controllers to maintain a global network view for the application layer. Abstracting views from the bottom layer to the upper layer reduces the computational complexity problem in large-scale networks.
ORION relies on multiple modules to operate: first, the OpenFlow Base Module, which handles OpenFlow-related tasks, such as collecting information from switches and forwarding it to the control plane; second, the Host Management Module, which gathers host information in the bottom layer using ARP packets and deals with problems such as resolving unknown MAC addresses; third, the Link Discovery Module, which collects link information across the multiple areas; next, the Topology Management Module, which abstracts the infrastructure layer's topology and sends it to the bottom layer, then abstracts the bottom layer's topology and sends it to the upper layer; and, finally, the Storage Module, which stores information about hosts, switches, links, and other parts of the network.
Communication among ORION's domain controllers relies on the Horizontal Communication Module, which synchronizes information among domain controllers to build a global network view, while communication between area and domain controllers relies on the Vertical Communication Module, a set of TCP connections that lets area controllers send the abstracted topology of the infrastructure layer and request information from the domain controller when a host in one domain wants to reach a particular host in another domain. This leads us to the Routing Management Module, which controls all the routing tasks using the Dijkstra algorithm.
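For intra-area paths, the shortest-path computation such a Routing Management Module would run over an area's abstracted topology is standard Dijkstra. A self-contained sketch, with an invented four-switch area and invented link weights:

```python
import heapq

def dijkstra(graph, src, dst):
    """Return (cost, path) for the cheapest src -> dst path."""
    queue = [(0, src, [src])]  # (accumulated cost, node, path so far)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []  # dst unreachable

# Invented area topology: switch -> {neighbor: link weight}.
area = {"s1": {"s2": 1, "s3": 4}, "s2": {"s3": 1, "s4": 5},
        "s3": {"s4": 1}, "s4": {}}
print(dijkstra(area, "s1", "s4"))  # → (3, ['s1', 's2', 's3', 's4'])
```

ORION's point, reflected in its theoretical evaluation below, is that running this per area over abstracted views keeps the computation much smaller than running it once over the full flat topology.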
ORION's authors conducted both a theoretical and an experimental evaluation to test the performance of its control plane.
On one hand, the theoretical evaluation shows that ORION's computing time grows linearly, which is much lower than that of the traditional Dijkstra routing algorithm.
On the other hand, the experimental evaluation verified the feasibility and the effectiveness of ORION. The study, conducted in Mininet, demonstrated that the delay time increases gradually as the number of areas increases, and that ORION has low overhead.
Before concluding this paper, we would like to discuss this section and the previous one, which provide various examples of multicontroller distributed architectures. On one hand, in Section 4 we presented proposals for physically distributed but logically centralized designs, like ONIX and HyperFlow, which are more suitable for datacenter and enterprise networks; in many cases they do not need a distributed protocol to ensure intercontroller communication, and they offer strong consistency. On the other hand, in this section we provided examples of physically and logically distributed designs, which are more suitable for WAN networks; they are more likely to use a distributed protocol like BGP, and they usually offer only weak consistency. Finally, we think that significant work has addressed logically centralized architectures, whereas logically distributed architectures still present many open research issues, like finding new methods to improve global consistency or developing standardized east-west interfaces to connect different types of controllers.
Software-defined networking is based on the idea of splitting the control plane and the forwarding plane and, following that, centralizing the whole control plane in one single controller that manages the entire network.
However, throughout the years, academia and industry in the networking field have come to realize that the future of SDN relies on distributed architectures, because centralized architectures do not fulfill the needs of efficiency, scalability, and availability. In this paper, we have tried to provide a comprehensive overview of SDN multicontroller architectures by explaining their characteristics and presenting in detail different examples of implemented and under-research architectures and solutions.
Network researchers and designers will have to deal with the many problems that distributed architectures face in order to enhance a multicontroller network, like developing an efficient communication process, creating an adequate network design, or integrating new applications that support multiple controllers into the northbound interface.
The authors declare that they have no competing interests.
Software Defined Networking (SDN) adds flexibility to control-plane implementation choices, allowing one to choose between points on the implementation design axis: centralized to distributed, micro-flow to aggregated-flow, reactive to proactive, virtual to physical, and fully-consistent to eventually-consistent.
In our previous articles on open source SDN controllers, we have seen that all SDN controllers typically include an open/standards-based interface to the hardware, a network operating system, and well-defined APIs to write various network applications. We have also seen that controllers differ from each other in these three aspects.
In this article, we will provide a not-so-comprehensive comparative analysis of different open source SDN controllers. At the outset, I would like to highlight that the problem of comparing SDN controllers is an extremely challenging one, mainly because it is a multi-faceted problem. In addition, comparative analysis typically helps in ‘selecting’ a controller to meet a particular (or a set of) requirement(s). However, in this article we will not be talking about controller selection, and will restrict ourselves to highlighting differences from various aspects.
We will focus on the following:
- Categorization of important interdependent aspects for controller comparison.
- Identification of the aspects the current comparative studies have both covered and missed.
- Comparison of SDN controllers from the perspective of applicability — use cases and application networks.
The Controller Aspects for Comparative Analysis
Figure 1 (below) shows the different aspects of SDN controllers — features, efficacy, use cases and application networks — that could be considered for comparative analysis. The figure also highlights the fact that the ‘controller features’ form the most important aspect, influencing the controllers' various efficacy targets, such as scalability, extensibility, programmability and interoperability.
The importance of the controller features also arises from the fact that they decide the applicability and the use cases that a controller can support. For example, if a controller is to support OpenStack-based network virtualization, it needs to have features and services such as Neutron plugins, and port and flow configurability.
In addition, applications also specify the services they want from the network. For example, an application that is delay sensitive expects the controller to include a service that can provide an end-to-end path with the least amount of delay.
Let’s look at different aspects in detail before we provide a comparison of controllers based on applicability.
The SDN controller features can be broadly classified into three different categories as shown in Figure 2 below:
In our previous articles on SDN controllers, we have mostly focused on these controller features from the different perspectives of the user/administrator, system developer and application developer.
Controller efficacy is an umbrella term referring to the different performance parameters: performance, scalability, reliability and security. Almost all the existing comparative studies focus on one or more of these parameters. Performance can be characterized by various metrics, such as the number of interfaces a controller can handle, latency and throughput. Similarly, various metrics are used in the existing works to define scalability, reliability and security.
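The two headline metrics, latency and throughput, fall out of the same raw benchmark data. A toy sketch of how a benchmark harness might summarize them; the sample values are invented:

```python
import statistics

def summarize(latencies_ms, duration_s):
    """Reduce per-request response times from a benchmark run to the
    two common efficacy metrics: mean latency (ms) and throughput
    (responses handled per second over the measured window)."""
    return statistics.mean(latencies_ms), len(latencies_ms) / duration_s

# Invented samples: four controller responses observed over 0.5 s.
lat, tput = summarize([2.0, 4.0, 3.0, 3.0], duration_s=0.5)
print(lat, tput)  # → 3.0 8.0
```

Tools like cbench follow the same idea at scale: fire packet-in messages at the controller, record response times, and report exactly these aggregates.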
Use Cases and Application Networks
“It is important to realize that few, if any, IT organizations want a SDN,” Jim Metzler argued.
What IT organizations want is to solve current problems and add value. If SDN can enable them to do that better than alternative approaches to networking, then it will be widely adopted.
SDN’s key aspect — giving network designers the freedom to refactor the network control plane — has found application in various domains, such as network virtualization and network functions virtualization (NFV). In fact, the recent interest in NFV has also triggered renewed interest in SDN.
In the available literature, we will find various use cases that have been proposed for SDN, which we’ll discuss later in detail. We’ll also see how some researchers and developers talk about the applicability of the controllers to a specific type of network: data center networks, service provider networks and campus networks. Hence, it is very useful to compare controllers from the perspective of use cases and application networks.
What do the Existing Studies Say?
Below, we describe three important comparative studies. Though some comparative analyses provided by commercial controller vendors do exist, they are not mentioned here to avoid any ‘marketing’ for a vendor:
1. Tootoonchian, Gorbunov, Ganjali, Casado and Sherwood were among the first to provide a comparative analysis of controllers: “On Controller Performance in Software-Defined Networks.” They covered a limited number of controllers (NOX-MT, Beacon and Maestro) and focused on controller performance only. These controllers have since been superseded by others like POX, Ryu, Floodlight and OpenDaylight.
2. “Feature-based Comparison and Selection of Software Defined Networking (SDN) Controllers” was conducted by Khondoker, Zaalouk, Marx and Bayarou. In this work the authors provide a set of requirements: TLS support, virtualization, open source, interfaces, GUI, RESTful API, productivity, documentation, modularity, platform support, age, OpenFlow support and OpenStack Neutron support, of which they define the first three as mandatory and the remaining 10 (which are in turn prioritized) as optional. The existing open source controllers (and their properties) are compared based on these requirements.
Additionally, for selecting the best controller based on optional requirements (for example, GUI will be extremely preferred over the age of the controller), a Multi-Criteria Decision Making (MCDM) method named Analytic Hierarchy Process (AHP) has been adapted by a monotonic interpolation/extrapolation mechanism which maps the values of the properties to a value in a pre-defined scale. By using the adapted AHP, the top-most five controllers have been compared, and “Ryu” is selected to be the best controller based on their requirements.
3. “Advanced study of SDN/OpenFlow Controllers” was performed by Shalimov, Zuikov, Zimarina, Pashkov and Smeliansky. The authors claim to do an impartial comparative analysis of the effectiveness of the widely used SDN controllers: NOX, POX, Beacon, Floodlight, MuL, Maestro and Ryu. In this work, the authors define an umbrella term — controller effectiveness — as a collection of efficiency indexes, such as performance, scalability, reliability and security. The conclusions provided by the authors include:
- There were a number of possible security vulnerabilities of the tested controllers.
- Most of the controllers were capable of coping with an average workload.
- Maximum throughput was demonstrated by Beacon.
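Returning to the second study, the AHP-based prioritization it uses can be illustrated with a toy weighted-scoring sketch. The pairwise-comparison matrix, criteria, and controller scores below are all invented, and the row-geometric-mean step is a standard simplified approximation of AHP priority derivation, not the authors' exact adapted method:

```python
import math

def ahp_weights(pairwise):
    """Approximate an AHP priority vector from a pairwise-comparison
    matrix using the row geometric-mean method, normalized to sum to 1."""
    gm = [math.prod(row) ** (1 / len(row)) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]

# Invented example with three optional criteria: GUI, documentation, age.
# GUI is moderately preferred over documentation (3) and strongly over age (5).
pairwise = [[1,   3,   5],
            [1/3, 1,   3],
            [1/5, 1/3, 1]]
weights = ahp_weights(pairwise)

# Invented per-criterion scores on a 0-1 scale for two controllers.
scores = {"Ryu": [0.8, 0.9, 0.7], "POX": [0.4, 0.6, 0.9]}
ranked = sorted(scores,
                key=lambda c: -sum(w * s for w, s in zip(weights, scores[c])))
print(ranked)  # → ['Ryu', 'POX']
```

The real study additionally maps raw property values onto a predefined scale via monotonic interpolation/extrapolation before applying AHP; the sketch skips that step.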
Critical Analysis of Existing Studies
- The majority of the work is restricted to the controller features and controller effectiveness, such as performance, security and reliability.
- The applicability of the controller, in terms of supported use cases and application networks (which are dependent on the features of the controller), is not well studied.
- Within controller efficacy, various parameters are not addressed. For example, the controller capacity (in terms of the number of switches/interfaces it can manage) is not well studied. Even the parameters, such as mobility and scalability, are overlooked.
- Some of the controller features, such as support for proactive/reactive flow management, hierarchy/multiplicity of controllers, Southbound protocols supported and interoperability with legacy (non-SDN) networks are totally overlooked.
- Multi-tenancy, an important aspect for multiple application networks (Data Center, Transport, etc.) is not well evaluated.
- The comparison based on reactive and proactive paradigms is also missing. It would be very interesting to see how and what proactive operations are supported by a controller.
- No proper justification is provided for the choice of “mandatory requirements for SDN controllers.”
- Many commercial controllers mostly restrict their comparative analysis to the controller features and rarely talk about the “numbers” — number of interfaces, number of tenants and virtual networks, etc.
- Comparative analysis of both the Northbound APIs and the Southbound protocols is not comprehensive enough to provide significant insight.
- The aspect of popularity among application developers (meaning the number of applications they have developed with the controller) is not well documented.
- Finally, user-friendliness, in terms of visualization options such as GUIs and CLIs, is not well compared.
Comparison of Open Source SDN Controllers: Use Cases and Application Networks
Table 1 below describes the important use-cases.
Here are some resources that review different SDN environments: Netsocket, the Lippis Report, OpenFlow 1, OpenFlow 2, NFV and Network Virtualization, Network Programmability, Brocade SDN (docx) and Bare Metal SDN.
| Use Case | Description |
| --- | --- |
| Legacy Network Interoperability | Supports legacy networking elements and frameworks to realize “end-to-end” network automation and services. |
| Distributed Edge Overlays | Use of tunneling (generally L2 in L3) to create an overlay network, to decouple a network service from the underlying infrastructure, and to allow core networks to scale and evolve independently of the offered services. |
| Hop-by-Hop Network Virtualization | E.g., VLANs. A “reactive” SDN controller creates a flow path from one edge of the network to the other, typically using OpenFlow, by installing a flow on each switch in the path, including the aggregation or core switches. |
| OpenStack Neutron Support | Neutron is an OpenStack project that provides “networking as a service.” It allows different back-ends (called plugins). OpenFlow controllers, by implementing Neutron plugins, realize all the network abstractions expected by OpenStack. |
| L4-L7 Service Insertion and Service Function Chaining | Service insertion is a common data center practice where network services — firewalls, load balancers, IPS, caching, and other deep packet inspection (DPI) — are added dynamically in the infrastructure, on a per-tenant basis, for any single or aggregate flow. |
| Network Monitoring | OpenFlow switches provide detailed accounting data (similar to SNMP interface counters) with every flow. The controller can collect that accounting data from devices and provide it to operators at any required granularity (per-IP, per-MAC-address, per-application statistics, etc.). |
| Policy Enforcement | Enforcing a consistent policy across single or multiple network domains, and across single or multiple locations, is one of the most interesting use cases. |
| Load Balancing | Automatically characterizes traffic types and forwards them to a variety of network resources (CERN example: identify Internet-destined traffic and automatically distribute that load across various WAN interfaces or devices). Like other use cases, it also involves monitoring network traffic and programming flows based on demand and server capacities. |
| Traffic Engineering | A method of optimizing network performance by dynamically analyzing, predicting and regulating the behavior of data transmitted over the network. |
| Dynamic Network Taps | Typically, a network tap is a purpose-built device that provides a way to access the data flowing across an IP network, offering visibility and troubleshooting capabilities on any port in a multi-switch deployment. An SDN controller can realize this functionality with the recent OpenFlow specifications. |
| Multi-Layer Network Optimization | Extending SDN to support interconnectivity of IT resources, such as virtual computing/virtual machines (VMs) and storage, using emerging optical transport and switching technologies (e.g., elastic optical networks) as well as existing packet networks. |
| Transport Networks: NV, Traffic Rerouting, Interconnecting DCs, etc. | Forwarding large amounts of trusted data while bypassing expensive security devices; creating dynamic interconnects at Internet interchanges between enterprise links or between service providers using cost-effective, high-performance switches; dynamic bandwidth, i.e., programmatic controls on carrier links to request extra bandwidth when needed. |
| Use Cases for Campus Networks | A campus network, corporate or educational, is characterized by mobile clients, BYOD, video, and an ever-growing number of connected devices and applications. Use cases within the campus network include traffic isolation, seamless mobility, security and policy enforcement, bandwidth management and application awareness. |
| Routing Support | A combination of link discovery and management, topology management, and a routing engine (routing-protocol specific). |
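As an illustration of the Network Monitoring use case above, a controller can fold OpenFlow-style per-flow counters into per-MAC totals. A minimal sketch with invented flow records (a real controller would fetch these via flow-stats requests to the switches):

```python
from collections import defaultdict

def bytes_per_src_mac(flow_entries):
    """Aggregate per-flow byte counters into totals per source MAC,
    i.e. one of the accounting granularities an operator might ask for."""
    totals = defaultdict(int)
    for entry in flow_entries:
        totals[entry["src_mac"]] += entry["byte_count"]
    return dict(totals)

# Invented flow records, shaped like simplified flow-stats replies.
flows = [
    {"src_mac": "00:00:00:00:00:01", "byte_count": 1500},
    {"src_mac": "00:00:00:00:00:02", "byte_count": 400},
    {"src_mac": "00:00:00:00:00:01", "byte_count": 2500},
]
print(bytes_per_src_mac(flows))
```

The same fold works for any of the granularities the table mentions: swap the grouping key for a destination IP or an application port to get per-IP or per-application statistics.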
Before we discuss the comparative analysis of the controllers, it is important to note the following points regarding the use cases, which highlight the challenges, from the point of view of comparative analysis:
- The use cases covered here are the important and popular ones, and should not be considered a complete list. Hence, you may not see some less significant use cases here that some controllers support and others do not.
- The majority of the use cases (legacy network support, campus networks, multi-layer network optimization, etc.) do not have a clear specification and scope definition. It would be extremely difficult to say “YES” or “NO” as to whether a controller supports them.
- Alternatively, there are use cases (edge overlays, OpenStack Neutron support, hop-by-hop network virtualization, etc.) that have clear specifications, and comparative analysis of those is relatively straightforward as a “YES” or “NO.”
- There are use cases (load balancing, monitoring, policy enforcement, etc.) whose functionalities overlap considerably. Some controllers may support such use cases partially, and comparison in those cases also becomes challenging.
Table 2 (below) summarizes the comparison of open source controllers considering different use cases.
Legend for Table 2:

Full support:
- Supports completely.
- The majority of the elements exist and can support the use case with minor enhancements.
- A commercial/proprietary solution developed on top of the open source controller helps in supporting the use case.
- Supports a significant number of use cases within an application network domain.

No support:
- Does not support.
- Only foundational elements exist, and they need major enhancements.
- No commercial/proprietary solution exists on top of the open source controller.

Partial support:
- Supports the use case partially.
- A significant number of services/applications exist in the controller to realize the support for the use case.
- Supports a partial set of use cases within an application network domain.
- The work is in progress.
From Table 2 we can see that ODL’s features make it more suitable for supporting the majority of use cases. This could be one of the reasons for its immense popularity. Ryu and Floodlight support a similar number of use cases, with Ryu supporting two of the most important ones: service insertion and chaining, and transport networks. Similarly, NOX/POX and Trema offer similar use case support.
Comparing SDN controllers is both interesting and challenging; however, using such comparative analysis to select a controller is even more difficult. One needs to consider various aspects of the controller, and this article is one such attempt to share different aspects for comparative studies. There is still a strong need for a comprehensive comparative analysis, as mentioned in the preceding critical analysis section. Finally, applicability is an interesting aspect which should be considered when comparing SDN controllers.
Sridhar received his Ph.D. in computer science from the National University of Singapore in 2007; his M.Tech. degree in computer science from KREC, Surathkal, India, in 2000; and his B.E. degree in instrumentation and electronics from SIT, Tumkur, Bangalore University, India, in August 1997. He worked as a Research lead at SRM Research Institute, India; post-doctoral fellow at Microsoft Innovation Center, Politecnico Di Torino, Turin, Italy; and as a research fellow at the Institute for Infocomm Research (I2R) in Singapore. He has worked on various development and deployment projects involving ZigBee, WiFi and WiMax. Sridhar is currently working as the Group Technical Specialist with NEC Technologies India Limited. Sridhar’s research interests lie mainly in the domain of next-generation wired and wireless networking, such as OpenFlow, software defined networking, software defined radio-based systems for cognitive networks, Hotspot 2.0 and the Internet of Things.