To the land of plenty.. Moving towards high-performance cluster management


“Jevons Paradox” is the proposition that as technology progresses (invention->innovation->diffusion), the increase in efficiency with which a resource is used tends to increase the rate of consumption of that resource.

As cloud operators continue to grow their population of hardware assets, it has become increasingly difficult to utilize those resources efficiently as demand grows. This has huge implications for the longevity of these multi-million dollar cloud warehouses, highlighting the need to make better decisions about resource allocation and assignment.

Into the light..

Some promising work comes from Christina Delimitrou, described in her paper “Quasar: Resource-Efficient and QoS-Aware Cluster Management”.

Quasar is a follow-up to the work on Paragon, a system that leverages collaborative filtering to characterize (classify) applications in terms of heterogeneity and potential for interference. Quasar establishes a set of interfaces which expand upon Paragon’s classifier. These interfaces allow choices to be made in scaling, such as the amount of resources per server or the number of servers per workload. Both Paragon and Quasar use sampling (profiling) rather than relying on explicit user-supplied characteristics, but Quasar goes further by jointly handling resource allocation and assignment. Quasar sits alongside a broader set of cluster management platforms such as Omega, Borg and Mesos, which are used in production at some of the largest web properties on the Internet.

Quasar exports a high-level interface for expressing different performance constraints, such as the following (a sketch of such a declaration appears after the list):

  • Latency-critical workloads use a combination of QPS (Queries Per Second) and latency
  • Distributed frameworks use execution time
  • Single- and multi-threaded applications can use IPS (Instructions Per Second)
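As a rough illustration, declaring such constraints might look like the following Python sketch; the class and field names are invented for illustration and are not Quasar's actual API.

    # Hypothetical sketch of declaring performance constraints in the spirit
    # of Quasar's high-level interface; names and fields are illustrative only.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PerformanceTarget:
        """A workload's performance constraint, expressed in its own terms."""
        workload: str
        kind: str                                 # "latency_critical" | "framework" | "single_node"
        qps: Optional[float] = None               # queries per second (latency-critical)
        latency_ms: Optional[float] = None        # tail latency bound (latency-critical)
        execution_time_s: Optional[float] = None  # distributed frameworks
        ips: Optional[float] = None               # instructions per second (single/multi-threaded)

    targets = [
        PerformanceTarget("memcached-frontend", "latency_critical", qps=100_000, latency_ms=1.0),
        PerformanceTarget("nightly-analytics", "framework", execution_time_s=3_600),
        PerformanceTarget("batch-encoder", "single_node", ips=2e9),
    ]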

This work has a lot of promise given the increasing demand for efficient allocation of infrastructure resources. There continues to be an iterative cycle between application developers and infrastructure teams to mitigate the risk of failure while increasing utilization. But how does one decide which variables, and how many of them, should be used to decide which resources to assign?

Large shops like Facebook, Twitter and Google have been experimenting with cluster scheduling for years. Systems like Omega grew out of the difficulty of managing flexible scheduling at the ever-increasing scale spawned by their explosive growth. As reported in the Quasar paper, even sophisticated frameworks like Borg and Mesos have a hard time driving more than 20% aggregate CPU utilization, and users can underestimate resource reservations by 5x and overestimate reservations by as much as 10x. It's important to note that these numbers are at the high end, with the majority of cloud data centers and enterprise customers utilizing only a fraction of the capacity they have invested in.

As can be seen in the following graphic, not only do jobs complete faster with the Quasar scheduler, but CPU utilization is consistently higher, which could extend the useful life of a data center by several years and deliver dramatic cost savings for large web-scale data centers.

[Figure: job completion times and cluster CPU utilization with and without Quasar]

It is no secret in today's “application-centric economy” that huge benefits can be obtained through application/infrastructure cooperation. Chip designs have followed the path of adding transistors to deal with complex problems such as matrix multiplication, stream processing, virtualization and high-speed I/O. Infrastructure vendors have started to focus on shifting operational models, which have manifested in areas such as cloud computing, DevOps, Network Virtualization and Software Defined Networking.

The allocation and assignment of resources become critical decision points which must be acted upon not at human scale but at machine scale. The dominant force here centers around “Reactive Design” and the need for operational stability.

But who is responsible for coordinating resources and resolving shared-resource conflicts in a highly dynamic environment?

Send in the Conductor..

Orchestration describes the automated arrangement, coordination, and management of complex computer systems, middleware, and services that are used to align business or operational requests with applications, data and infrastructure within a management domain [ref].

Orchestration can be broken into roughly nine categories: Allocation, Assignment, Scheduling, Visualization, Monitoring, Modeling, Discovery, Packaging and Deployment.

These become fundamental building blocks for distributed systems and allow us to talk about these functions with a clear, shared vocabulary.

Allocation: Determining the appropriate amount of resources needed to satisfy the performance objective at the lowest cost

Assignment: The process of selecting the specific resources which satisfy the allocation
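
To make the allocation/assignment split concrete, here is a minimal Python sketch, assuming a simple queries-per-core scaling model; the functions and server data are hypothetical.

    # Minimal, hypothetical sketch of the allocation/assignment split.
    # Allocation decides *how much* resource a workload needs to meet its target;
    # assignment decides *which* specific servers provide it.

    def allocate(target_qps, qps_per_core):
        """Allocation: size the reservation to the performance objective."""
        return -(-target_qps // qps_per_core)   # ceiling division

    def assign(cores_needed, servers):
        """Assignment: pick concrete servers (here: least-loaded first)."""
        placement = []
        for server in sorted(servers, key=lambda s: s["used"] / s["cores"]):
            free = server["cores"] - server["used"]
            take = min(free, cores_needed)
            if take > 0:
                placement.append((server["name"], take))
                cores_needed -= take
            if cores_needed == 0:
                break
        return placement

    servers = [{"name": "node-a", "cores": 32, "used": 20},
               {"name": "node-b", "cores": 32, "used": 4}]
    print(assign(allocate(target_qps=50_000, qps_per_core=5_000), servers))   # [('node-b', 10)]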

Scheduling: Enables an allocated resource to be configured automatically for application use, manages the resource(s) for the duration of the task to ensure compliance, and restores the resource to its original state for future use

Visualization: The process of rendering information related to service availability, performance and security

Monitoring: Provides visibility into the state of resources and notifies applications and infrastructure management services of changes in state

Discovery: The realization of a resource or service through observation, active probing or enrollment

Modeling: Describes available resources and their capabilities, dependencies, behaviors and relationships as a policy. Can also be used to describe composition of resources and services (i.e. happens-before relationships)

Packaging: The process of collecting all artifacts and dependencies into a portable container which can be transferred across resources. This packaging might also encapsulate existing state, for instance in live migrations.

Deployment: Code and data need to be instantiated into a system in order for the scheduler to reserve resources. Delivering the packages mentioned above across resources requires coordination so as not to overwhelm the network during updates.
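
A minimal sketch of that coordination, assuming a hypothetical pull_package operation, is simply to bound how many nodes fetch the artifact at once:

    # Hypothetical sketch of coordinating a rollout so updates do not overwhelm
    # the network: bound how many nodes pull the package concurrently.
    from concurrent.futures import ThreadPoolExecutor

    def rolling_deploy(nodes, pull_package, max_concurrent=5):
        """Deliver the package to all nodes, at most `max_concurrent` at a time."""
        with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
            list(pool.map(pull_package, nodes))

    rolling_deploy([f"node-{n}" for n in range(12)],
                   pull_package=lambda node: print(f"{node}: pulling package"))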

When driving for high performance for customers and high efficiency for operators, resource allocation and assignment become critical decision processes in the orchestration system. Quasar provides an interface which relates directly to emerging Promise Theory: developers declare scalability policies that express performance constraints, and Quasar searches through the available option space to best fit those constraints to the available resources.
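
A toy sketch of that kind of search might look like the following, assuming a simple linear scale-up/scale-out performance model; the option space, performance model and cost figures are invented for illustration.

    # Hypothetical sketch: exhaustively search the (servers, cores-per-server)
    # option space for the cheapest combination that meets a throughput constraint.
    # The linear performance model below is an assumption for illustration.

    def search_options(required_qps, qps_per_core, max_servers=16, max_cores=32,
                       cost_per_core_hour=0.05):
        best = None
        for servers in range(1, max_servers + 1):      # scale out
            for cores in range(1, max_cores + 1):      # scale up
                predicted_qps = servers * cores * qps_per_core
                if predicted_qps < required_qps:
                    continue
                cost = servers * cores * cost_per_core_hour
                if best is None or cost < best["cost"]:
                    best = {"servers": servers, "cores_per_server": cores, "cost": cost}
        return best

    print(search_options(required_qps=200_000, qps_per_core=5_000))
    # {'servers': 2, 'cores_per_server': 20, 'cost': 2.0}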

But what about the network?

“Your network is in my way..”


Everyone in the network industry is aware of James Hamilton's observation that network technologies have long been inefficient and overly complex. SDN has driven this conversation to the forefront, challenging foundational principles of the Internet such as decentralization and the end-to-end principle. The current protocol stack has a number of problems known as far back as the initial ARPAnet designs over 40 years ago. The Internet has become more complex due to the distributed nature of application design and the need for location independence.

When it comes to network interference, we have different opportunities to optimize for resource constraints, including the following (path selection is sketched just after the list):

  • Path selection – Optimized to minimize distance (propagation delay)
  • Congestion and Flow Control – Optimized to maximize bandwidth
  • Error Control – Optimized to minimize loss
  • Scheduling – Optimized to maximize queue fairness amongst competing flows
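
As a small example of the first dimension, here is a sketch of path selection that minimizes total propagation delay with a shortest-path computation; the topology and delay values are invented.

    # Illustrative sketch: pick the path with the lowest total propagation delay.
    # The topology and per-link delays (in microseconds) are invented.
    import heapq

    def shortest_delay_path(links, src, dst):
        """Dijkstra's algorithm over per-link propagation delays."""
        graph = {}
        for a, b, delay in links:
            graph.setdefault(a, []).append((b, delay))
            graph.setdefault(b, []).append((a, delay))
        queue = [(0, src, [src])]
        seen = set()
        while queue:
            delay, node, path = heapq.heappop(queue)
            if node == dst:
                return delay, path
            if node in seen:
                continue
            seen.add(node)
            for nxt, d in graph.get(node, []):
                if nxt not in seen:
                    heapq.heappush(queue, (delay + d, nxt, path + [nxt]))
        return None

    links = [("tor1", "spine1", 5), ("tor1", "spine2", 5),
             ("spine1", "tor2", 5), ("spine2", "tor2", 12)]
    print(shortest_delay_path(links, "tor1", "tor2"))   # (10, ['tor1', 'spine1', 'tor2'])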

This would seem to be plenty to deal with network interference, except for the problem that not all flows are equal. For instance, a trading application might need market data to take priority over backup replication, and VoIP traffic needs to be prioritized over streaming downloads.

Unfortunately, as much as we would like a way to map priorities across the network, the current environment makes it difficult to achieve in practice. This usually falls within the purview of Traffic Engineering and incorporates different methods for describing, distributing and acting upon flow policies, whether for admission control or filtering.
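
One common building block is a classification policy that maps application classes to DSCP markings and relative priorities. The sketch below is an illustrative policy choice (the class names and the mapping are assumptions), not a standard mapping.

    # Hedged sketch of a flow-classification policy: map application classes
    # to DSCP markings and a strict-priority ordering (0 = highest priority).
    DSCP_EF = 46        # Expedited Forwarding (e.g. VoIP, market data)
    DSCP_AF21 = 18      # Assured Forwarding class 2 (e.g. transactional traffic)
    DSCP_DEFAULT = 0    # best effort
    DSCP_CS1 = 8        # scavenger class (e.g. backup replication)

    POLICY = {
        "voip":        {"dscp": DSCP_EF,      "priority": 0},
        "market-data": {"dscp": DSCP_EF,      "priority": 0},
        "web":         {"dscp": DSCP_AF21,    "priority": 1},
        "streaming":   {"dscp": DSCP_DEFAULT, "priority": 2},
        "backup":      {"dscp": DSCP_CS1,     "priority": 3},
    }

    def classify(flow_app):
        """Return (dscp, priority) for a flow, defaulting to best effort."""
        entry = POLICY.get(flow_app, {"dscp": DSCP_DEFAULT, "priority": 2})
        return entry["dscp"], entry["priority"]

    print(classify("backup"))   # (8, 3)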

In a recent GIGAOM survey, operators categorized Network Optimization as the leading use case for SDN, NFV and open source, which might be another way of saying that we need a facility to characterize inter-process communication in a way that can be fed back into our orchestration systems to make proper resource allocation and assignment decisions.


As the industry moves through technological change (the S-curve), a rapid innovation cycle will result in many failures until we reach the point of wide adoption (diffusion). Many have speculated on the timelines, but it is still far from proven how well customers will adopt not only the change that comes from technology but also the change in organizational structure, skill sets and policy.