Cloud Networking Hyper or Reality?

A colleague of mine pointed out a new post by Jayshree Ullal from Arista Networks on Cloud Networking Reflections. I can’t help to comment on a few things for my own sanity.

Prediction #1: The rise in dense virtualization is pushing the scale of cloud networking.

Evaluation #1: True

IT is very “trend” oriented, meaning sometimes the complexity of operating a distributed system are people are too busy look deep into the problem for themselves and instead lean on the communities of marketing wizards to make a decision for them. Despite VMWare’s success, hardware virtualization makes up a very small part of the worldwide server base, which is estimated at around 32M servers [1]. I predict within a few short years a reversal in this trend, which peaked around 2008 for several reasons.

  • One is the realization that the “hardware virtualization” tax grows increasingly with I/O, a very significant problem as we move into the era of “Big Data”. The reality is as we move to more interactive and social driven applications the OS container is not as crucial as it is in a generalized client/server model. Application developers need to continuously deal with higher degrees of scalability, application flexibility, improved reliability, and faster development cycles. Using techniques like Lean software development and Continuous Delivery, application developers can get a Minimal Viable Product out the door in weeks sometimes days.
  • Two, the age of  “Many Task Computing” is upon us and will eventually sweep away the brain-dead apps and the entire overhead that comes with supporting multiple thick-containers. I say lets get down with LXC or better yet Illumos Zones, which gives us the namespace isolation without the SYSCALL overhead.
  • Three, heterogeneous computing is crucial for interactive and engaging applications. Virtualization hides this at the wrong level; we need the programming abstractions such as OpenCL/WebCL for dealing with specialization in vector programming and floating-point support via GPU’s. Even micro-servers will have a role to play here allowing a much finer grain of control while still improving power efficiency.

Its not “dense virtualization” pushing the scale of cloud networking, it is the changing patterns of the way applications are built and used. This will unfortunately continue to change the landscape of both systems design as well as network.

My Advice: Designers will finally wake up and stop being forced into this “hyper-virtualized” compute arbitrage soup and engineer application services to exploit heterogeneous computing instead of being constrained by a primitive and unnecessary abstraction layer. In the mean time, ask your developers to spend the time to build scalable platform services with proper interfaces to durable and volatile storage, memory and compute. In this way you isolate yourself from specific implementations removing the burden of supporting these runaway applications.

Prediction #2: “Fabric” has become the marketing buzzword in switching architectures from vendors trying to distinguish themselves.

Evaluation #2: Half-True.

I think the point of having “specialized” fabrics is a side effect of the scalability limits of 1990’s based network design, protocols and interconnect strategies. Specialized and proprietary fabrics have been around for years, Think Machines, Cray, SGI and Alpha all needed to deal with scalability limits connecting memory and compute together. Today’s data centers are an extension to this and have become modern super-computers connected together (i.e. a fabric)

Generally the current constraints and capabilities of technology today have forced a “rethink” on how to optimize network design for a different set of problems. There is nothing terribly shocking here unless you believe that current approaches are satisfactory. If the current architectures are satisfactory, why do we have so much confusion on whether to use L2 multi-pathing or L3 ECMP? Why is there not ONE methodology for scaling networks? Well I’ll tell you if you haven’t figured it out. Its because the current set of technologies ARE constrained and lack the capabilities necessary for truly building properly designed networks for future workloads.

The beauty of Arista’s approach is we can scale and manage two to three times better with standards. I fail to understand the need for vendor-specific proprietary tags for active multipathing when standards-based MLAG at Layer 2 or ECMP at Layer 3 (and future TRILL) resolves the challenges of scale in cloud networks. 

Scale 2x to 3x better with standards? How about 10x or better yet 50x? Really 2-3x improvement in anything is statistically insignificant and you are still left with corner cases, which absolutely grind your business to a halt. Pointing out MLAG is better than TRILL or SPB or ECMP is better than whatever is not the point. I mean really, how many tags do we need in a frame anyway and what the hell with VXLAN and NVGRE? Additional data-plane bits are not the answer, we need to rethink the layering model, address architecture and error and flow control mechanisms.

There is no solution unless you break down the problem, layer by layer until you remove all of the elements down to just the invariants. Its possible that is the direction of OpenFlow/SDN, the only problem maybe that completely destroys the layers entirely but maybe that’s the only way to build them back up the right way.

BTW. There is nothing really special about saying “standards”, after all TCP/IP itself was a rogue entry in the standards work (INWG 96) so its another accidental architecture that happened to work.. for a time!

My Advice: For those who have complete and utter autonomy, treat the DC as a giant computer which should be designed to meet the goals of your business within the capabilities and constraints of todays technology. Once you figure it out, you can use the same techniques in software to OpenSource your innovation making it generally feasible for others to enter the market (if you care about supply chain). For those who don’t, ask your vendors and standards bodies why they can’t deliver a single architecture which doesn’t continuously violate the invariances by adding tags, encaps, bits, etc..

Prediction #4: Commercially available silicon offers significant power, performance and scale benefits over traditional ASIC designs.

Evaluation #4: Very true.

Yea no surprise here, but its not as simple as just picking a chip off the shelf. When designing something as complex as an ASIC, you have to make certain tradeoffs. Feature sets build up over time, and it takes time to move back to a leaner model of primitive services with exceptional performance. There is no difference between an ASIC designer working for a fabless semiconductor company spinning out wafers from TSMC and a home grown approach, it is in the details of the design and implementation with all of the sacrifices one makes when choosing how to allocate resources.

My Advice: Don’t make decisions based on who makes the ASIC but what can be leveraged to build a balanced and flexible system. The reality is there is more to uncover than just building ASIC’s, for instance how about a simpler data plane model which would allow us to create cheaper and higher performance ASIC’s?

Prediction # 5: FCoE adoption has been slow and not lived up to its marketing hype.

Evaluation # 5: True.

“A key criterion for using 10GbE in storage applications is having switches with adequate packet buffering that can cope with speed mismatches to avoid packet loss and balance performance, “

This is also misleading as it compares FCOE with FC with 10GE sales as a way of dismissing a viable technology. But the reality is that the workload pattern changed moving the focus from interconnect to interface.

From an application development point of view, interfacing with storage at a LUN or “block” level is incredibly limited. It’s simply just not the right level of abstraction, which is why we started to move to NAS, or “file” based approaches and even converging the reemergence of content based and distributed object stores.

Believe me, developers don’t give a care if there is an FC backend or FCOE, it is irrelevant, the issue is performance. When you have a SAN based system you are dealing with a system balanced for dealing with different patterns of data access, reliability and coherency. This might be exactly what you don’t want, you may be very write intensive or read intensive and require a different set of properties than current SAN arrays provide.

The point about adding buffering to the equation not only makes things worse, but also increases the cost of the network substantially. Firstly the queues can build up very quickly especially at higher clock speeds and the impact on TCP flow-control is a serious issue. I am sure the story is not over and we will see different ways of dealing with this problem in the future. You might want to look a little closer at FC protocols and see if you can see any familiarity with TRILL.

My Advice: Forget the hype of Hadoop and concentrate on isolating the workload patterns that impact your traffic matrix. Concentrate on what the expectations of the protocols are, how to handle error and flow control, mobility, isolation, security and addressing. Develop a fundamental understanding of how to impart fair scheduling in your system to deal with demand floods, partitioning events and chaotic events. Turns out a proper “load shedding” capability can go along way in sustaining system integrity.

Yes I know, thats a lot of opaque nonsense, and while many advantages exist for businesses which choose to utilize the classical models, there are still many problems in dealing with the accidental architecture of todays networks. The future is not about what we know today, but what we can discover and learn from our mistakes once you realize we made them.

While I do work at Cisco Systems as a Technical Leader in the DC Group, these thoughts are my own and don’t necessarily represent those of my employer.