Category Archives: OpenFlow

Networking Guys Just Don’t Understand Software

Before I begin my rant, let me just say my first router was a Wellfleet CN with a VME bus and my first Cisco router was an AGS+. I have been around long enough to see DECnet, IPX, IP, SNA, Vines and a few others running across my enterprise network while troubleshooting 8228 MAUs beaconing in the wiring closets and watching NetBEUI “Name in Conflict” packets take down a 10,000-node RSRB network.

It’s 2012, and gone are the days when network engineers needed to juggle multiple protocol behaviors such as IPX Get Nearest Server, IP PMTU bugs in Windows NT 3.5, or trying to find enough room in your UMBs to fit all your network ODI drivers without crashing the machine.

It’s a new age, and as we look back at almost 40 years since the inception of the Internet and almost 30 years since TCP/IP became its standard protocol suite, we are at an inflection point of unprecedented change, fueled by the need to share data anytime, anywhere, on any device.

The motivation for writing this entry comes from some very interesting points of view from my distinguished ex-colleague Brad Hedlund in a post entitled “Dodging Open Protocols with open software”. In it he tries to dissect both the intentions and the impact of a new breed of networking players, such as Nicira, on the world of standardized protocols.

The point here isn’t to blow a standards dodger whistle, but rather to observe that, perhaps, a significant shift is underway when it comes to the relevance and role of “protocols” in building next generation virtual data center networks.  Yes, we will always need protocols to define the underlying link level and data path properties of the physical network — and those haven’t changed much and are pretty well understood today.

The “shift in relevance and role of protocols” is attributed not so much to what we know as the IETF/IEEE-based networking stack and all the wonderful protocols which make up our communications framework, but to a new breed of protocols necessary to support SDN.

Sidebar: Let’s go back a second and clarify the definition of SDN. Some define Software Defined Networking in terms of control plane/data plane separation, a view clearly influenced by the work on OpenFlow.

So the shift we see in networking is towards more programmability, and the need for new ways to invoke actions and carry state is at the crux of it.

However, with the possibility of open source software facilitating the data path not only in hypervisor virtual switches, but many other network devices, what then will be the role of the “protocol”? And what role will a standards body have in such case when the pace of software development far exceeds that of protocol standardization.”

OK, so this is the heart of it: “what then will be the role of the ‘protocol’? And what role will a standards body have in such case when the pace of software development far exceeds that of protocol standardization?”

I think the problem here is not necessarily the semantics of the word “protocol” (for this is just a contract which two parties agree upon), but the fact that there is a loosely defined role in how this “contract” will be standardized to promote an open networking ecosystem.

Generally, standardization only comes when there is sufficiently understood and tested software providing a specific implementation of that standard. It’s very hard to get a protocol specification completely right without testing it in some way.

Sidebar: If you actually go back in history you will find that TCP/IP was not the intended standard. The INWG was the governing standards body of the day, and the international standard was supposed to be INWG 96, but because the team at Berkeley got TCP into BSD Unix, well, now it’s history. I wrote a bit about it here: http://garyberger.net/?p=295.

With that in mind, take a closer look at the Open vSwitch documentation, dig deep, and what you’ll find is that there are other means of controlling the configuration of the Open vSwitch, other than the OpenFlow protocol.

When it comes to OVS it’s very important not to confuse interface and implementation. Since OVS in its classical form is just a switch, you operate it through helper routines which manipulate the state management layer in an internal datastore called OVSDB and interact with the OS. This is no different than, say, the CLI on a Cisco router. Most of the manipulation in the management plane will probably be exposed through JSON-RPC (guessing here) behind a high-level REST interface.
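To make that concrete, here is a minimal sketch of the message-construction side of talking to OVSDB, which does speak JSON-RPC over a socket. The `list_dbs` method name matches the OVSDB management protocol, but treat the exact wire details here as illustrative assumptions rather than a working client:

```javascript
// Sketch: building an OVSDB-style JSON-RPC request. Every request carries
// an id so the response can be matched back to it on the same socket.
function makeOvsdbRequest(id, method, params) {
  return JSON.stringify({ id: id, method: method, params: params })
}

// Example: ask the server which databases it hosts.
var listDbs = makeOvsdbRequest(1, 'list_dbs', [])

// In a real client this string would be written to a TCP or UNIX socket
// (e.g. via net.connect) and the reply parsed with JSON.parse. Shown here
// only as the message-construction step.
```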

What you must understand about OVS in relation to control plane/data plane separation, or “flow-based network control”, is that you are essentially changing the behavior from a standalone switch based on local state to a distributed forwarding engine coordinated with global state.

From OVS:

The Open vSwitch kernel module allows flexible userspace control over flow-level packet processing on selected network devices. It can be used to implement a plain Ethernet switch, network device bonding, VLAN processing, network access control, flow-based network control, and so on.

Clearly, since we are in the realm of control plane/data plane separation, we need a protocol (i.e. a contract) which is agreed upon when communicating intent. This is where OpenFlow comes in.

Now, unfortunately, OpenFlow is still a very nascent technology and is continuing to evolve, but Nicira wants to solve a problem today. They want to abstract the physical network address structure in the same way that we abstract the memory address space with VMMs (see Networking doesn’t need VMWARE but it does need better abstractions). In order to do this they needed to jump ahead of the standards bodies (in this case the ONF) and adopt some workable solutions.

For instance, OVS is not 100% compliant with OpenFlow 1.0 but has contributed to better models which will appear soon in the 1.2 specification. OVS uses an augmented PACKET_IN format and matching rules:


/* NXT_PACKET_IN (analogous to OFPT_PACKET_IN).
 *
 * The NXT_PACKET_IN format is intended to model the OpenFlow-1.2 PACKET_IN
 * with some minor tweaks. Most notably NXT_PACKET_IN includes the cookie of
 * the rule which triggered the NXT_PACKET_IN message, and the match fields are
 * in NXM format. */

Summary:

Open source networking is nothing new: you have XORP, Zebra, Quagga, OpenSourceRouting.org, Vyatta and the standard bridging services built into Linux.

Just like with TCP/IP, if there is value in OpenFlow or whatever its derivatives turn out to be, we will see some form of standardization. OVS is licensed under Apache 2, so if you want to fork it, go ahead; that’s the beauty of software. In the meantime I wouldn’t worry so much about these control protocols; they will change over time, no doubt, and good software developers will encapsulate the implementations and publish easy-to-use interfaces.

What I think people should be asking about is not so much the protocols (they all suck in their own way, because distributed computing is really, really hard) but what we can do, once we have exposed the data plane in all its bits, to solve some very nasty and complex challenges with the Internet.

Networking doesn’t need VMWARE but it does need better abstractions

Lately there has been a lot of talk around the network and a corresponding conflation of terms and hyperbole around “Network Virtualization”, including Nypervisor, Software Defined Networking, Network Abstraction Layer, SDN, OpenFlow, etc.

Recently an entry entitled “Networking Needs a VMware (Part 1: Address Virtualization)” appeared on Martin Casado’s blog which tries to make a case for comparing the memory virtualization capability in today’s modern hypervisors to network virtualization.

The post left me with an uneasy feeling, because it doesn’t fully describe why we are seeing this activity in the network domain, specifically the need to deal with the broken address architecture. This post tries to bring some clarity to that and to dig deeper into the root causes of the problems in networking which have led us to this point.

The synopsis in the blog goes like this:

One of the key strengths of a hypervisor lies in its insertion of a completely new address space below the guest OS’s view of what it believes to be the physical address space. And while there are several possible ways to interpose on network address space to achieve some form of virtualization, encapsulation provides the closest analog to the hierarchical memory virtualization used in compute. It does so by taking advantage of the hierarchy inherent in the physical topology, and allowing both the virtual and physical address spaces to support complete forwarding and addressing models. However, like memory virtualization’s page table, encapsulation requires maintenance of the address mappings (rules to tunnel mappings). The interface for doing so should be open, and a good candidate for that interface is OpenFlow.

The author of the blog post is appealing to a well-known aphorism by David Wheeler, which states: “All problems in computer science can be solved by another level of indirection”. This statement is at the heart of “virtualization”, as well as of layering in communications, computer architecture and programming models.

Sidebar OSI Model

Lots of networking professionals like to refer to the 7-layer OSI model when talking about network abstractions. The problem is the OSI model was never adopted; in addition, most OSI engineers agree that the top three layers of the OSI model (Application, Presentation and Session) belong in one application layer. We utilize a derivative of that model, which is essentially the four layers of the TCP/IP model.

Let’s first try to define what an address is and then what is meant by encapsulation, being careful not to conflate these two important yet independent terms.

Addressing and Naming

The first thing to recognize is that the Internet is comprised of two name spaces: what we call the Domain Name System and the Internet address space. These turn out to be synonyms for each other in the context of addressing, differing only in scope. Generally we can describe an address space as a name space consisting of a set of identifiers within a given scope.

An address space in a modern computer system is location-dependent but hardware-independent thanks to the virtual memory manager and “memory virtualization”. The objective, of course, is to present a logical address space which is larger than the physical memory space, in order to give each process the illusion that it owns the entire physical address space. This is a very important indirection mechanism; if we didn’t have it, applications would have to share a much smaller pool of available memory. Does anyone remember DOS?
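As a toy illustration (mine, not from the post), the indirection amounts to a per-process lookup table that maps a private address onto wherever the data actually lives, so two processes can use the same address and land in different places:

```javascript
// Toy model of address indirection: each "process" sees a small private
// address space; a per-process map (the "page table") translates it into
// a shared physical space. Purely illustrative, not how a real MMU works.
function makeAddressSpace(pageTable) {
  return {
    read: function(virtualAddr, physicalMemory) {
      var frame = pageTable[virtualAddr]
      if (frame === undefined) throw new Error('fault: unmapped address')
      return physicalMemory[frame]
    }
  }
}

var physical = ['a', 'b', 'c', 'd']
// Two processes, same virtual address 0, different physical locations.
var p1 = makeAddressSpace({ 0: 2 })
var p2 = makeAddressSpace({ 0: 0 })
```

Same virtual address, two different results; that is the whole trick the hypervisor plays below the guest OS as well.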

“Another problem with TCP/IP is that the real name of an application is not the text form that humans type; it’s an IP address and its well-known port number. As if an application name were a macro for a jump point through a well-known low memory address.” – Professor John Day

Binding a service, which needs to be re-locatable, to a location-dependent address is why we have such problems with mobility today (in fact we may even conclude that we are missing a layer). Given the size and failure rates of today’s modern data centers, this problem also impacts the reliability of the services and applications consumers are so dependent on in today’s web-scale companies.

So while this is a very important part of OS design, it’s completely different from how the Internet works, because the address system we use today has no such indirection without breaking the architecture (i.e. NATs, load balancers, etc.).

If this is true, is the IP address system currently used on the Internet “location-dependent”? Well, actually, IP addresses were distributed as “location-independent” names, not addresses. There are current attempts to correct this, such as LISP and HIP, as well as “beyond IP” solutions such as RINA.

So it turns out the root of the problem in relation to addressing is that we don’t have the right level of indirection, because according to Saltzer and Day we need a “location-independent” name to identify the application or service, but all we have is a location-dependent address, which is just a symbolic name!

What is encapsulation?

Object-oriented programming refers to encapsulation as a pattern by which “the object’s data is contained and hidden in the object and access to it restricted to members of that class”. In networking we use encapsulation to define the different layers of the protocol stack, which, as we know, “hides” the data from members not in the layer; in this way the protocol model forms the “hour-glass” shape, minimizing the interface and encapsulating the implementation.
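The OO sense of the word fits in a few lines (my example): the state is reachable only through the interface, never directly.

```javascript
// Encapsulation via closure: `count` is hidden inside the function scope;
// only the returned interface can touch it. Small interface, private
// implementation.
function makeCounter() {
  var count = 0 // hidden state
  return {
    increment: function() { count += 1 },
    value: function() { return count }
  }
}

var c = makeCounter()
c.increment()
c.increment()
// c.count is undefined; only c.value() exposes the state.
```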

Sidebar Leaky Abstractions

Of course this isn’t completely true, as the current protocol model of TCP/IP is subject to “leaky abstractions”. For instance, there is no reason for the TCP logic to dive into the IP header to read the TOS field; doing so would be a “layer violation”. Yet we know that TCP reaches into IP to compute the pseudo-header checksum. This can be excused if we think of TCP/IP as one layer, as it was before 1978. The reality of the broken address architecture, however, leads to the “middle boxes” which must violate the layers in order to rewrite the appropriate structures and stitch the connection back together.
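The pseudo-header checksum is a concrete example of the violation: TCP folds fields that belong to IP (source address, destination address, protocol, segment length) into its own checksum. A sketch of just that pseudo-header contribution, with IPv4 addresses as 32-bit integers (the real checksum then continues over the TCP header and payload, omitted here):

```javascript
// Sketch of the IPv4 pseudo-header sum that TCP mixes into its checksum.
// None of these fields live in the TCP header - that's the layer violation.
function pseudoHeaderSum(srcIp, dstIp, tcpLength) {
  var PROTO_TCP = 6
  // Split each 32-bit address into two 16-bit words and sum everything
  // with end-around carry (ones'-complement addition).
  var words = [
    (srcIp >>> 16) & 0xffff, srcIp & 0xffff,
    (dstIp >>> 16) & 0xffff, dstIp & 0xffff,
    PROTO_TCP, tcpLength & 0xffff
  ]
  var sum = 0
  for (var i = 0; i < words.length; i++) {
    sum += words[i]
    if (sum > 0xffff) sum = (sum & 0xffff) + 1 // fold the carry back in
  }
  return sum
}
```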

So how does encapsulation help?

In networking we use encapsulation all the time. We essentially encapsulate the data structures which need to be isolated (the invariants) with some other tag, header, etc. in order to hide the implementation. So in 802.1Q we use the C-TAG to denote a broadcast domain or VLAN; in VXLAN we encapsulate the host frame within a completely new IP shell in order to “bridge” it across the network without leaking into the hypervisor’s stack the protocol primitives the host stack needs to process.
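The 802.1Q case is small enough to show end to end (my sketch): the tag is four bytes spliced in between the source MAC and the original EtherType, and nothing else in the frame changes.

```javascript
// Sketch: inserting an 802.1Q C-TAG into an untagged Ethernet frame.
// The tag (TPID 0x8100 followed by 16 bits of PCP/DEI/VLAN ID) goes
// between the source MAC and the original EtherType.
function addVlanTag(frame, vlanId) {
  var tag = Buffer.from([
    0x81, 0x00,                          // TPID for 802.1Q
    (vlanId >> 8) & 0x0f, vlanId & 0xff  // PCP/DEI = 0, 12-bit VLAN ID
  ])
  // Bytes 0-11: dst MAC + src MAC; byte 12 onward: EtherType + payload.
  return Buffer.concat([frame.slice(0, 12), tag, frame.slice(12)])
}
```

The receiver strips the same four bytes back out; the encapsulated invariant (the original frame) passes through untouched.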

From the blog: “encapsulation provides the closest analog to the hierarchical memory virtualization in compute”.

So in the context of a “hierarchy”, yes, we encapsulate to hide, but not for the same reasons we have memory hierarchies (i.e. SRAM caches and DRAM). This generalization is where the blog post goes south.

So really, what is the root of the problem, and how is SDN an approach to solving it?

As stated earlier, we need a “location-independent” name to identify the application or service, but all we have is a location-dependent address, which is just a symbolic name! If we go back to Saltzer we see that’s only part of the problem, as we need a few more addresses/names and the binding services to accomplish that.

One interesting example of this is the implementation of Serval from Mike Freedman at Princeton University. Serval actually breaks the binding between the application/service name and the inter-networking address (although there are deeper problems than this, since we seem to be missing a network layer somewhere). Serval accomplishes this through the manipulation of forwarding tables via OpenFlow, although it can be adapted to use any programmable interface if one exists. Another example is the NDN project led by Van Jacobson.

In summary

Yes, it is unfair to conflate “Network Virtualization” with “OS Virtualization”, as they deal with different levels of abstraction, state and purpose. Just as hypervisors were invented to “simulate” a hardware platform, there is a need to “simulate” or abstract the network in order to build higher-level services and simplify the interface (not necessarily the implementation). In fact, a case can be made that “OS Virtualization” may eventually diminish in importance as we find better mechanisms for dealing with isolation and protection of the host stack, while network virtualization will extend beyond the existing solutions and even the existing protocols, allowing us to take on a new set of challenges. This is what makes SDN so important: not the implementation but the interface. Once we have this interface, which is protocol-independent, we can start to look at fixing the really hard problems in networking in a large-scale way.

NodeFlow: An OpenFlow Controller Node Style

Unless you’ve been under a rock lately, you might have heard something about Software Defined Networks, OpenFlow, network virtualization and control plane/data plane separation.

Some of the reasons for the interest might be:

  • Evolution of the system architecture as a whole (network, NIC, PCIe, QPI, CPU, memory), along with x86_64 instructions, OSes, drivers, software and applications, has allowed many services to run on a single host, including network services. Extending the network domain into the host allows for customizable tagging, classification, load balancing and routing, with the utopia being ubiquitous control of logical and physical by a combination of in-protocol state, forwarding tables and a distributed control system.
  • Non-experimental network pathologies which are causing havoc with large-scale systems. It turns out there are some very “real” problems which were never part of the Ethernet and TCP/IP design space, and software allows us to experiment with different ideas on how to solve them.
  • Leveraging a possibly untapped design space in order to differentiate, leapfrog the competition or disrupt the marketplace.

So what is OpenFlow? Well according to the Open Networking Foundation:

“OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day.”

This paradigm shift into the guts of the network might be better explained by a surgical assessment of the network core, its protocol structure, and the devices which deal with enrollment, classification, multiplexing/demultiplexing, flow control and routing, but that is a post for another day.

In the meantime the “network” has evolved into a first-class citizen amongst infrastructure architects, software developers and consumers alike. No, I am not talking about the Social Network by big boy Zuck, but the fact that networks are finding themselves ingrained in almost anything not nailed down. The so-called “Internet of Things” tells us that soon the network will be stitched into our lives, through the air and into our clothes.

There are many arguments about the value of OpenFlow and SDN, but to find the benefits and use cases, network domain experts may find the current toolsets and platforms a bit impenetrable. The current controller implementations are written in a combination of C, Python and Java, and because of the “asynchronous” nature of the OpenFlow protocol, additional libraries have to be leveraged, including Twisted and NIO, which makes it more difficult to understand exactly what is going on.

To that end I introduce NodeFlow, an OpenFlow controller written in pure JavaScript for Node.JS. Node.JS provides an asynchronous library over JavaScript for server-side programming, which is perfect for writing network-based applications (ones that don’t require an excessive amount of CPU).

NodeFlow is actually a very simple program and relies heavily on a protocol interpreter called OFLIB-NODE written by Zoltan LaJos Kis. I have a forked version of this library (see below) which has been tested with OpenFlow version 1.0.

Sidebar: A note on OpenFlow

Even though the Open Networking Foundation has ratified the 1.2 protocol specification, we have yet to see a reference design which allows developers to experiment. In order to get a grasp of the programming model and data structures, I have concentrated on the most common implementation, OpenFlow 1.0 in Open vSwitch.

Sidebar: Why Node.JS

Node.JS has become one of the most watched repos on GitHub and is headed up by the brilliant guys at Joyent. Anyone interested should check out Bryan Cantrill’s presentation Building a Real-Time Cloud Analytics Service with Node.js.

Setting up the development environment

Leveraging Open vSwitch and tools such as Mininet, anyone can create a simulated network environment on their own local machine. Instructions on how to set up the development environment can be found here: Download and Get Started with Mininet.

Code review

We first set up the network server with a simple call to net.createServer, providing the port and address to listen on. The address and port are configured through a separate start script.

NodeFlowServer.prototype.start = function(address, port) {
    var self = this

    var socket = []
    var server = net.createServer()

    server.listen(port, address, function(err, result) {
        util.log("NodeFlow Controller listening on " + address + ':' + port)
        self.emit('started', { "Config": server.address() })
    })

The next step provides the event listeners for socket maintenance, creates a unique sessionID with which we can keep track of each of the different switch connections, and defines our main event-processing loop, which is called every time we receive data on our socket channel. We use a stream library to buffer the data and return the decoded OpenFlow messages in the msgs object. We make a simple check on the message structure and then pass it on for further processing.


server.on('connection', function(socket) {
    socket.setNoDelay(noDelay = true)
    var sessionID = socket.remoteAddress + ":" + socket.remotePort
    sessions[sessionID] = new sessionKeeper(socket)
    util.log("Connection from : " + sessionID)

    socket.on('data', function(data) {
        var msgs = switchStream.process(data)
        msgs.forEach(function(msg) {
            if (msg.hasOwnProperty('message')) {
                self._processMessage(msg, sessionID)
            } else {
                util.log('Error: Message is unparseable')
                console.dir(data)
            }
        })
    })
})
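The stream buffering matters because TCP hands you byte boundaries, not message boundaries. The OpenFlow header makes re-framing easy, since every message starts with an 8-byte header whose third and fourth bytes carry the total message length. Here is a minimal framer, my own sketch of the idea rather than the actual switchStream implementation:

```javascript
// Minimal OpenFlow re-framer. Every OpenFlow message starts with an
// 8-byte header: version (1 byte), type (1), length (2, big-endian),
// xid (4). Given a buffer of raw socket bytes, split out the complete
// messages and return any trailing partial message as the leftover.
function frameMessages(buf) {
  var messages = []
  var offset = 0
  while (buf.length - offset >= 8) {
    var len = buf.readUInt16BE(offset + 2) // total message length
    if (buf.length - offset < len) break   // partial message: wait for more
    messages.push(buf.slice(offset, offset + len))
    offset += len
  }
  return { messages: messages, rest: buf.slice(offset) }
}
```

The leftover `rest` gets prepended to the next 'data' chunk, which is exactly the bookkeeping a stream helper hides from you.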

In the last section we leverage Node.JS EventEmitters to trigger our logic using anonymous callbacks. These event handlers wait for a specific event to happen and then trigger processing. We handle two specific events in this initial release: 'OFPT_PACKET_IN', the main event to listen on for PACKET_IN messages, and 'SENDPACKET', which simply encodes and sends our OpenFlow message on the wire.


self.on('OFPT_PACKET_IN', function(obj) {
    var packet = decode.decodeethernet(obj.message.body.data, 0)
    nfutils.do_l2_learning(obj, packet)
    self._forward_l2_packet(obj, packet)
})

self.on('SENDPACKET', function(obj) {
    nfutils.sendPacket(obj.type, obj.packet.outmessage, obj.packet.sessionID)
})

The “Hello World” of OpenFlow controllers is simply a learning bridge. Below is the implementation, which is fundamentally a port of NOX’s Python pyswitch.


do_l2_learning: function(obj, packet) {
    self = this

    var dl_src = packet.shost
    var dl_dst = packet.dhost
    var in_port = obj.message.body.in_port
    var dpid = obj.dpid

    if (dl_src == 'ff:ff:ff:ff:ff:ff') {
        return
    }

    if (!l2table.hasOwnProperty(dpid)) {
        l2table[dpid] = new Object() // create object
    }
    if (l2table[dpid].hasOwnProperty(dl_src)) {
        var dst = l2table[dpid][dl_src]
        if (dst != in_port) {
            util.log("MAC has moved from " + dst + " to " + in_port)
        } else {
            return
        }
    } else {
        util.log("learned mac " + dl_src + " port : " + in_port)
        l2table[dpid][dl_src] = in_port
    }
    if (debug) {
        console.dir(l2table)
    }
}

Alright, so seriously, why the big deal? There are other implementations which do the same thing, so why is NodeFlow so interesting? Well, if we look at setting up a flow modification, which is what gets instantiated in the switch forwarding table, you can see every element in JSON notation, thanks to the OFLIB-NODE library. This is very important, as deciphering a TLV-based protocol from a normative reference can be dizzying at best.


setFlowModPacket: function(obj, packet, in_port, out_port) {

    var dl_dst = packet.dhost
    var dl_src = packet.shost
    var flow = self.extractFlow(packet)

    flow.in_port = in_port

    return {
        message: {
            version: 0x01,
            header: {
                type: 'OFPT_FLOW_MOD',
                xid: obj.message.header.xid
            },
            body: {
                command: 'OFPFC_ADD',
                hard_timeout: 0,
                idle_timeout: 100,
                priority: 0x8000,
                buffer_id: obj.message.body.buffer_id,
                out_port: 'OFPP_NONE',
                flags: ['OFPFF_SEND_FLOW_REM'],
                match: {
                    header: {
                        type: 'OFPMT_STANDARD'
                    },
                    body: {
                        wildcards: 0,
                        in_port: flow.in_port,
                        dl_src: flow.dl_src,
                        dl_dst: flow.dl_dst,
                        dl_vlan: flow.dl_vlan,
                        dl_vlan_pcp: flow.dl_vlan_pcp,
                        dl_type: flow.dl_type,
                        nw_proto: flow.nw_proto,
                        nw_src: flow.nw_src,
                        nw_dst: flow.nw_dst,
                        tp_src: flow.tp_src,
                        tp_dst: flow.tp_dst
                    }
                },
                actions: {
                    header: {
                        type: 'OFPAT_OUTPUT'
                    },
                    body: {
                        port: out_port
                    }
                }
            }
        }
    }
}

Performance and Benchmarking

So I used Cbench to compare NOX vs. NodeFlow and here are the results.

  • NOX Python [./nox_core -i ptcp: pytutorial]
  • NOX C++ [./nox_core -i ptcp: switch]
  • NodeFlow [running with Debug: False]
  • C-based controller

(The throughput charts for each run are not reproduced here.)

As you can see from the numbers, NodeFlow can handle almost 2X what NOX can do and is much more deterministic. Maxing out at 4,600 responses/sec is not shabby for a VirtualBox VM on my MacBook Air!

Summary

At just under 500 lines of code, this prototype implementation of an OpenFlow controller is orders of magnitude smaller than comparable systems. Leveraging JavaScript and the high-performance V8 engine allows network architects to experiment with various SDN features without dealing with all of the boilerplate required to set up event-driven programming. I hope someone gets inspired by this and takes a closer look at Node.JS for network programming.

So how do I get NodeFlow?

NodeFlow is an experimental system available on GitHub at git://github.com/gaberger/NodeFLow.git, along with my fork of the OFLIB-NODE libraries at git://github.com/gaberger/oflib-node.git. If you would like to contribute or have any questions, please contact me via Twitter @gbatcisco.

Special thanks to Zoltan LaJos Kis for his great OFLIB-NODE library, without which this work couldn’t have been done, and to Matthew Ranney for his network decoder library node-pcap.