Network Mechanics to Network Conductors

I get a general sense that the latest incarnation of network evolution (i.e. SDN) is becoming a way of expressing frustration with a complex set of problems that have yet to be solved. One of the things you have to ask yourself as a network professional is "What do I really understand about the fundamentals of networking, and how do I put that to use in the post-PC, data-hungry world?"

For years the best way to understand networking was to lug out your Network General Sniffer and watch the interaction of messages flowing across the screen. We had basic signals such as connection management; we had a general understanding of the traffic matrix from interrogating network addresses, which we compared to our spreadsheets and some heuristics about flows. We leveraged the emerging SNMP standard to collect traffic statistics into our pre-RRD datastores and presented pretty graphs of utilization to understand demand. Soon we had expert systems that would track the protocol state transpiring between hosts and interpret the results.
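
To make those utilization graphs concrete, here is a minimal Python sketch of the counter arithmetic the early pollers performed; the sample counter values, polling interval and interface speed are made up for illustration, and a real poller would fetch ifInOctets/ifOutOctets over SNMP and handle counter wraps.

```python
# Interface utilization from two SNMP octet-counter samples, the same
# arithmetic the early pre-RRD pollers graphed. The sample values and
# the 5-minute interval below are illustrative only.

def utilization_percent(octets_t0, octets_t1, interval_s, if_speed_bps):
    """Percent utilization of one direction over one polling interval."""
    delta_bits = (octets_t1 - octets_t0) * 8          # octets -> bits
    return 100.0 * delta_bits / (interval_s * if_speed_bps)

if __name__ == "__main__":
    # Two samples of ifInOctets taken 300 seconds apart on a 10 Mb/s interface.
    t0, t1 = 1_234_567_890, 1_238_317_890
    print(f"inbound utilization: {utilization_percent(t0, t1, 300, 10_000_000):.1f}%")
```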

Scaling the data center meant learning about aggregation/distribution and the ratio of local traffic to remote. At the time most network engineers were taught the 80/20 rule, i.e. 80% of the traffic stays local and only 20% is remote. This was a direct consequence of our centralized compute models (mainframes) and the fact that most people were still using terminal-based computing and sneakernet. It became the foundation of network design, which reflected this by oversubscribing capacity higher in the tree (i.e. the Core/Distribution/Edge design).
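
As a rough sketch of the oversubscription arithmetic behind that tree, assuming hypothetical port counts and speeds:

```python
# Oversubscription ratio at an aggregation layer: downstream capacity
# divided by upstream capacity. Port counts and speeds are illustrative.

def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    return (down_ports * down_gbps) / (up_ports * up_gbps)

if __name__ == "__main__":
    # 48 x 1G access ports uplinked with 2 x 10G: 48/20 = 2.4:1.
    print(f"{oversubscription(48, 1, 2, 10):.1f}:1 oversubscribed")
```

Under the 80/20 assumption, roughly 9.6 Gb/s of the 48 Gb/s of access capacity heads upstream, which fits comfortably inside 20 Gb/s of uplink; invert the traffic ratio and the design breaks.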

Network automation was still in its infancy; you would use a floppy disk to update the firmware and operating system. Upgrading a Cisco router meant getting your terminal configured with the appropriate Xmodem/Zmodem settings and waiting hours while the image was serialized down a modem from the Cisco CCO BBS site.
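
Back-of-the-envelope, the hours were real; assuming an illustrative image size and modem speed:

```python
# Rough transfer time for an IOS image over a dial-up modem, ignoring
# Xmodem/Zmodem framing overhead and retransmits. Image size and modem
# speed are illustrative.

def transfer_hours(image_bytes, modem_bps):
    return image_bytes * 8 / modem_bps / 3600

if __name__ == "__main__":
    # An ~8 MB image over a 14.4 kb/s modem is already well over an hour.
    print(f"{transfer_hours(8 * 1024 * 1024, 14_400):.1f} hours")
```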

Soon we were leveraging scripting languages like Expect and Perl to handle the complexity of managing network state across all the configuration files. Once you could use the SNMP private MIB to read and write a device configuration, you could make global changes in an instant and repopulate configurations across the world. In some ways this was all a step backwards from the advancing telecommunications control systems of the day; it was still a very closed and proprietary world, leaving customers no choice but to adopt complex and monolithic management applications.
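
A minimal sketch of what those scripts looked like, written here in Python with the pexpect library rather than classic Expect; the host addresses, credentials and prompt strings are placeholders, and a production script would need far more error handling:

```python
# In the spirit of the old Expect scripts: log in, push one config line,
# save, and move on to the next router. Hosts, credentials and prompt
# strings are placeholders for illustration.
import pexpect

def push_snmp_community(host, username, password, community):
    child = pexpect.spawn(f"telnet {host}", timeout=30)
    child.expect("Username:")
    child.sendline(username)
    child.expect("Password:")
    child.sendline(password)
    child.expect("#")                       # enable-mode prompt
    child.sendline("configure terminal")
    child.expect(r"\(config\)#")
    child.sendline(f"snmp-server community {community} RO")
    child.expect(r"\(config\)#")
    child.sendline("end")
    child.expect("#")
    child.sendline("write memory")
    child.expect("#")
    child.close()

for router in ["192.0.2.1", "192.0.2.2"]:   # repeat across the estate
    push_snmp_community(router, "admin", "secret", "public")
```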

So it's 2012 and we are not much better at dealing with all of the challenges in running a system as complex as the network. The IETF finally got its act together and delivered a more robust management framework through an application protocol called NetConf and a data modeling language called YANG. Finally you can divorce the information model from the data transfer protocol and allow for a cleaner representation of the network configuration. But is this as far as we need to go? Why is SDN so interesting, and what is it telling us about the still very complex problems of building and operating networks?
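
For contrast with the screen-scraping above, here is a minimal NetConf sketch using the ncclient Python library; the host, credentials and the device's NetConf support are assumptions for illustration:

```python
# A minimal NetConf sketch with ncclient: the YANG-modeled configuration
# travels as structured XML over SSH, independent of any vendor CLI.
# Host, credentials and device capabilities are assumed for illustration.
from ncclient import manager

with manager.connect(host="192.0.2.1", port=830, username="admin",
                     password="secret", hostkey_verify=False) as m:
    # Retrieve the running configuration as modeled data,
    # not screen-scraped text.
    running = m.get_config(source="running")
    print(running)
```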

As the title of the blog suggests, I think something can be said for the expertise required to manage complex systems. The question becomes: are you going to stay a mechanic, worrying about low-level details, or are you going to be the pilot? Is it more valuable to your employer for you to understand the low-level semantics of a specific implementation, or to rise above that by creating proper, reusable interfaces for manipulating the state of the network?

With information becoming more valuable than most commodities, it will take a shift in mindset to move from low-level addressing concerns to traffic analysis, modeling and control. Understanding where the most important data is, how to connect to it and how to avoid interference will become much more important than understanding protocols.

So how does SDN contribute to this, and how do we get from the complex set of tasks involved in setting up and operating networks to more of a fly-by-wire approach? How do we go from managing a huge set of dials and instruments to conducting resources like a symphony?

The first thing to recognize is that you can't solve this problem in the network by itself! For years application developers' expectations of the network were of infinite capacity and zero latency. They assumed that the flow-control capability in the network would suffice, giving them ample room to pummel the network with data. Locality was barely an afterthought, because they were developing on local machines, unaware of the impact of crossing network boundaries. Networking folks use terms like latency, jitter, bandwidth, over-subscription, congestion, broadcast storms and flooding, while application developers talk in terms of consistency, user experience, accuracy and availability.
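
A small sketch of that vocabulary gap: the kind of measurement the network side worries about, approximated here from the application's point of view with repeated TCP connection setups; the target host and port are placeholders.

```python
# Round-trip latency and jitter, the quantities application code often
# assumes away, approximated with repeated TCP connection setups.
import socket
import statistics
import time

def connect_times_ms(host, port, samples=10):
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return times

if __name__ == "__main__":
    samples = connect_times_ms("example.com", 80)
    print(f"latency ~{statistics.mean(samples):.1f} ms, "
          f"jitter ~{statistics.stdev(samples):.1f} ms")
```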

The second thing to recognize is that the network might need to be stripped down and built back up from scratch in order to deal with its scaling challenges. In my eyes this is the clearest benefit of SDN, as it highlights some of the major challenges in building and running networks. Experimenting with a complex system is disastrous; in order to break new ground, the system must be decomposed into its simplest form, but no simpler, as Einstein would say. It's possible that OpenFlow has gone this route and must be redesigned into a workable set of primitive functions, ones that can be leveraged not just through a centralized controller model but also by new operating systems and protocols that take advantage of the hardware.
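
As a purely conceptual illustration of the primitive OpenFlow exposes, not tied to any real controller API: a match/action flow table in which entries carry a priority and the highest-priority match wins.

```python
# A toy match/action flow table: the controller installs
# (priority, match, actions) entries and the switch applies the
# highest-priority entry whose match fields the packet satisfies.

FLOW_TABLE = [
    (200, {"ip_dst": "10.0.0.5", "tcp_dst": 80}, ["output:3"]),
    (100, {"ip_dst": "10.0.0.5"},                ["output:2"]),
    (0,   {},                                    ["drop"]),      # table miss
]

def lookup(packet):
    """Return the actions of the highest-priority matching entry."""
    for priority, match, actions in sorted(FLOW_TABLE,
                                           key=lambda e: e[0], reverse=True):
        if all(packet.get(k) == v for k, v in match.items()):
            return actions
    return ["drop"]

print(lookup({"ip_dst": "10.0.0.5", "tcp_dst": 80}))   # ['output:3']
print(lookup({"ip_dst": "192.0.2.9"}))                 # ['drop']
```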

There is much debate over what the "best" model is here and what the objectives are. Since most networking is basically a "craft" and not a science, there are those who strive to maintain the existing methodologies and mechanisms and simply open up a generalized interface to improve control. Others might see this as a mistake: if you reproduce the current broken layering model, you are bound to run into a new set of challenges down the line, which may require yet another patch, protocol or fix to solve.

Maybe it would be valuable to look back at the fundamentals of networking: what has been learned over the course of history, how other protocols behave, and a reflective look at our own industry. How do you deal properly with connection management, data transfer efficiency and flow control? How do you leverage proper encapsulations and hierarchy to scale efficiently? What should management look like, and how do you separate mechanism from policy and deliver hop-by-hop QoS?
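
One way to picture the mechanism/policy split, as a toy Python sketch: the policy is a declarative table that can change at will, while the marking mechanism stays fixed; the class names are illustrative, and EF/AF41 are standard DSCP code points.

```python
# Policy: an editable table mapping traffic classes to DSCP values.
# Mechanism: a marking function that never changes when the policy does.

POLICY = {
    "voice":       46,   # EF
    "video":       34,   # AF41
    "best_effort": 0,
}

def mark(packet, traffic_class, policy=POLICY):
    """Mechanism: stamp whatever DSCP the current policy dictates."""
    packet["dscp"] = policy.get(traffic_class, 0)
    return packet

print(mark({"payload": b"rtp..."}, "voice"))   # {'payload': b'rtp...', 'dscp': 46}
```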

Summary

In some regards the move towards Software Defined Networking is an outcry of frustration at managing an ever more complex set of interrelated components. Data centers have become huge information factories; servers themselves have become clusters of computers, and our data-hungry applications require massive amounts of parallel computing, driving even more demand into the network. We could continue to take an ill-suited, feature-driven approach to networking, or we could take the opportunity to recognize the architectural principles that would turn networking from a craft into a science (notwithstanding the argument over what counts as true science).