Category Archives: Platform as a Service

Cisco UCS “Cloud In A Box”: Terabyte Processing In RealTime

Now, I hate using the term “Cloud” for anything these days, but in the latest blog entry from Shay Hassidim, Deputy CTO of GigaSpaces, “Terabyte Elastic Cache clusters on Cisco UCS and Amazon EC2”, the Cisco UCS 260 took the place of 16 Amazon High-Memory Quadruple Extra Large Instances. With 16:1 scaling, imagine what you can do with a rack of these. In other words, forget about Hadoop; let’s go real-time, data-grid-enabled networking!

With 40 Intel cores and 1TB of memory available to the GigaSpaces XAP high-performance In-Memory Data Grid, the system achieved an astounding 500,000 ops/sec on 1,024-byte POJOs and could load 1 billion objects in just under 36 minutes.
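As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The only inputs are the numbers quoted above. Treating “just under 36 minutes” as exactly 36 minutes is my assumption, so the computed load rate is a slight underestimate, and the payload figure ignores any per-object overhead in the grid.

```python
# Back-of-the-envelope check of the benchmark figures quoted above.
# Assumption: "just under 36 minutes" is treated as exactly 36 minutes,
# so the load rate below is a slight underestimate.

OBJECTS = 1_000_000_000      # 1 billion objects loaded
OBJECT_SIZE_BYTES = 1024     # 1,024-byte POJO payload
LOAD_SECONDS = 36 * 60       # 36 minutes expressed in seconds


def load_rate(objects, seconds):
    """Average number of objects written per second over the whole load."""
    return objects / seconds


def payload_gib(objects, size_bytes):
    """Raw payload size in GiB, ignoring any per-object overhead."""
    return objects * size_bytes / 2**30


if __name__ == "__main__":
    print(f"load rate: {load_rate(OBJECTS, LOAD_SECONDS):,.0f} objects/sec")
    print(f"raw payload: {payload_gib(OBJECTS, OBJECT_SIZE_BYTES):,.1f} GiB")
```

That works out to roughly 463,000 objects per second, which is in line with the 500,000 ops/sec figure, and about 954 GiB of raw payload.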

Now, this might not sound extraordinary, but consider what it takes to build an application that keeps a 40-core, 1TB system CPU- and memory-bound, deals properly with failures, and has automation and instrumentation. You can’t beat this kind of system. GigaSpaces is also integrated with the Cisco UCS XML API for dynamic scaling of hardware resources.

Eventually people will catch on that memory is critical for dealing with “Big Data” and that it’s no longer an issue of reliability or cost. Without disk rotational latency and poor random access in the way, we can push the limits of our compute assets while leveraging the network for scale. Eventually we might see a fusion of in-memory data grids with the network in a way that allows us to deal with permutation traffic patterns by changing the dynamics of networking, processing and storage.

Emergence of DataGrids to solve scaling problems

There is a great post at BigDataMatters discussing the emergence of Open Source Data Grids and the introduction of Infinispan 4.0.0 Beta 1.

The Infinispan site defines data grids as:

Data grids are highly concurrent distributed data structures. They typically allow you to address a large amount of memory and store data in a way that it is quick to access. They also tend to feature low latency retrieval, and maintain adequate copies across a network to provide resilience to server failure.

In the article, Chris Wilk explains some of the challenges in data grid technologies around dynamic routing.

The reason that GigaSpaces suffers from this limitation is that it has a fixed space routing table at deployment time. The above scenario was described to Manik who said that Infinispan does not suffer from this restriction as it uses dynamic routing tables. Infinispan allows you to add any number of machines without incurring any down-time.

The spreading of data across many hosts is accomplished using different techniques, but the point to take here is that altering the partition routing logic in mid-stream is very destructive to supporting distributed transactions. There are also many system-level aspects that create inconsistencies, including garbage collection and network overhead, which could jeopardize the movement of dynamic objects between partitions.
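To see why changing the routing logic mid-stream is so disruptive, here is a minimal sketch. It assumes the most naive scheme, a stable hash of the key modulo the partition count. This is not the actual algorithm of GigaSpaces or Infinispan, and the function names and the "order-N" keys are made up for illustration.

```python
import zlib


def partition_for(key, partitions):
    """Route a key to a partition: stable hash modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def fraction_moved(keys, old_partitions, new_partitions):
    """Share of keys whose owning partition changes with the partition count."""
    moved = sum(
        1
        for key in keys
        if partition_for(key, old_partitions) != partition_for(key, new_partitions)
    )
    return moved / len(keys)


if __name__ == "__main__":
    # Hypothetical keys, used only to exercise the routing function.
    keys = [f"order-{i}" for i in range(100_000)]
    moved = fraction_moved(keys, 4, 5)
    print(f"{moved:.0%} of keys change partition when going from 4 to 5")
```

With evenly spread hashes, a key keeps its partition only when its hash modulo 20 is 0, 1, 2 or 3, so about four keys in five have to move when a fifth partition is added under this naive scheme. Any transaction touching those keys during the move has to cope with them being in flight.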

Increasing the capacity of a data grid while providing deterministic performance, robustness and consistency should be done by running a fixed number of partitions and “moving” partitions from one JVM to another, newly started JVM. With GigaSpaces you can have 10, 50 or 200 partitions when starting the data grid and run them within a small number of JVMs; later you can increase the number of JVMs when needed (manually or dynamically). You can re-balance the system and spread the partitions across all the existing JVMs. It is up to you to determine how far you want to scale the system, which means you have total control over system behavior.
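The idea of a fixed partition count with partitions moving between JVMs can be sketched as follows. This is only an illustration under my own assumptions: the function names and JVM labels are hypothetical, JVMs are only ever added (never removed), and none of this is the GigaSpaces API.

```python
from collections import defaultdict


def assign_partitions(partition_count, jvms):
    """Spread a fixed set of partitions round-robin over the running JVMs."""
    return {p: jvms[p % len(jvms)] for p in range(partition_count)}


def rebalance(assignment, jvms):
    """Move whole partitions onto less loaded JVMs until the load is even.

    The partition count never changes; only the partition -> JVM mapping does.
    Assumes every JVM already present in `assignment` is also in `jvms`.
    """
    by_jvm = defaultdict(list)
    for partition, jvm in assignment.items():
        by_jvm[jvm].append(partition)

    result = dict(assignment)
    while True:
        donor = max(jvms, key=lambda j: len(by_jvm[j]))
        receiver = min(jvms, key=lambda j: len(by_jvm[j]))
        if len(by_jvm[donor]) - len(by_jvm[receiver]) <= 1:
            return result
        partition = by_jvm[donor].pop()
        by_jvm[receiver].append(partition)
        result[partition] = receiver


if __name__ == "__main__":
    start = assign_partitions(10, ["jvm-1", "jvm-2"])
    scaled = rebalance(start, ["jvm-1", "jvm-2", "jvm-3", "jvm-4", "jvm-5"])
    print(sorted(scaled.items()))
```

Because the key-to-partition routing is untouched, clients keep addressing the same 10 partitions before and after the move; only the location of each partition changes.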

The routing mechanism with GigaSpaces will function without any problems and spread data across all partitions as long as you have more unique keys than partitions. This should not be a problem in 99.99% of cases.
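The condition on unique keys is easy to illustrate. The sketch below assumes a hash-modulo routing function of my own choosing (the real GigaSpaces routing is not shown in the text above) and simply counts how many partitions receive at least one key; the key names are invented.

```python
import zlib


def route(key, partitions):
    """Hypothetical routing function: stable hash modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def partitions_used(keys, partitions):
    """Number of distinct partitions that receive at least one key."""
    return len({route(key, partitions) for key in keys})


if __name__ == "__main__":
    few_keys = ["EU", "US", "APAC"]                   # 3 unique routing keys
    many_keys = [f"user-{i}" for i in range(10_000)]  # 10,000 unique keys
    print(partitions_used(few_keys, 10), "of 10 partitions hold data")
    print(partitions_used(many_keys, 10), "of 10 partitions hold data")
```

With only three unique routing values, at most three of the ten partitions can ever hold data, no matter how many objects are written. Routing on something with high cardinality, such as a user or order ID, avoids that.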

The comparison ignores many other GigaSpaces features, such as Mule integration, event handling and high-level data processing building blocks, web container and dynamic HTTP configuration, service management, system management tools, performance (especially for single-object operations, batch operations and local cache), text search integration, massive client support, large data support (up to several terabytes of data), large object support, a Map-Reduce API, scripting language support (Java, .NET, C, Scala, Groovy…), Cloud API support, schema evolution, etc.

Having new players is great and verifies that there is room for new vendors in this huge market for In-Memory Data Grid technologies on the cloud (private/public), but it is also important to make the right comparison.

See more here:
http://www.gigaspaces.com/wiki/display/SBP/Capacity+Planning
http://www.gigaspaces.com/wiki/display/CCF/CCF4XAP+Documentation+Home

What Should Be VMware’s Next Move

I wanted to point out an interesting article posted here on CIO.com.

Here is an excerpt:

“The most glaring omission [in VMware’s portfolio] is [the] need for Java object distributed caching to provide yet another alternative to scalability,” Ovum analyst Tony Baer said in a post to his personal blog on Tuesday. “If you only rely on spinning out more [virtual machines], you get a highly rigid, one-dimensional cloud that will not provide the economies of scale and flexibility that clouds are supposed to provide. So we wouldn’t be surprised if GigaSpaces or Terracotta might be next in VMware’s acquisition plans.”

Now, I couldn’t be happier that someone besides myself recognizes that, in order for services to be uncoupled from the persistence layer, you must have a distributed caching system. There are several players, not all created equal but all with value in this field. They include GigaSpaces, Terracotta, Oracle (Tangosol) Coherence and GemStone.

Distributed caching is nothing new, and most of the large internet companies like Facebook, Twitter, etc. are utilizing open source tools like memcached to get a very rudimentary distributed cache.
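For readers who have not seen this pattern, here is a minimal sketch of what such a rudimentary distributed cache looks like from the application side. It is a toy under stated assumptions: each cache “node” is just a local dictionary rather than a real memcached server, and `ShardedCache`, `cached_read` and `slow_lookup` are names I made up for illustration.

```python
import zlib


class ShardedCache:
    """Toy stand-in for a memcached-style cache: each 'node' is a local dict."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # The client, not the cache servers, decides which node owns a key.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def get(self, key):
        return self._node_for(key).get(key)

    def set(self, key, value):
        self._node_for(key)[key] = value


def cached_read(cache, key, load_from_store):
    """Cache-aside read: try the cache first, fall back to the persistence layer."""
    value = cache.get(key)
    if value is None:
        value = load_from_store(key)  # slow path, e.g. a database query
        cache.set(key, value)
    return value


if __name__ == "__main__":
    store_hits = []

    def slow_lookup(key):
        store_hits.append(key)        # record every trip to the "database"
        return key.upper()

    cache = ShardedCache(node_count=3)
    cached_read(cache, "profile:42", slow_lookup)
    cached_read(cache, "profile:42", slow_lookup)
    print("trips to the store:", len(store_hits))
```

The second read is served from the cache, so the store is only hit once. What this pattern does not give you is replication, transactions or events, which is where the data grid products mentioned above differ.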

Gartner analyst Massimo Pezzini is right on with his comment: “I think one of the reasons why VMware is buying SpringSource is to be able to move up the food chain and sell cloud-enabled application infrastructure on top of their virtualization infrastructure,” Pezzini said. “It wouldn’t take much to make it possible to deploy Spring on top of the bare VMware, i.e., with no Linux or Windows in the middle.”

If VMware shifts its focus to the Java stack, they can be well on their way to building a complete service virtualization platform.

The Java platform has an opportunity to sit on the bare metal and provide a ubiquitous abstraction layer between the infrastructure and the application stack. If we look at Oracle JRockit, IBM Libra and Sun Maxine, there is already much research into a bare-metal JVM. Sun has also been working on a pure Java OS called Guest VM, which eliminates Windows and Linux from the guest altogether and is written in pure Java.

The realization is that instance scaling (virtual machine proliferation), which requires moving the complete server state from machine to machine, is a very difficult and dirty process. If we have abstracted the underlying operating system as a pure Java runtime, we can migrate our Java applications very simply. In fact, it is the main use case I demonstrated in my multi-part series, which utilizes GigaSpaces as an In-Memory Data Grid.

Part 2: Using Groovy, Grails and Gigaspaces “3G”

Part 2: Utilize a dynamic language, one that really anyone can learn.

I chose to use Groovy and Grails for this project. Why?

Because of Groovy’s natural support for the Java language, anyone with a background in the language can be productive. In fact, since Groovy is a dynamic language that supports first-class functions, closures, etc., it saves a lot of developer time. Groovy does not require static typing, which means you don’t have to declare the “storage” type before you use a variable. You can actually reassign a variable to a different type depending on the problem you are working on, making the language very fluid. Groovy supports about 98% (I think) of native Java.

Groovy benefits (http://groovy.codehaus.org/)

  • Is an agile and dynamic language for the Java Virtual Machine
  • Builds upon the strengths of Java but has additional power features inspired by languages like Python, Ruby and Smalltalk
  • Makes modern programming features available to Java developers with almost-zero learning curve
  • Supports Domain-Specific Languages and other compact syntax so your code becomes easy to read and maintain
  • Makes writing shell and build scripts easy with its powerful processing primitives, OO abilities and an Ant DSL
  • Increases developer productivity by reducing scaffolding code when developing web, GUI, database or console applications
  • Simplifies testing by supporting unit testing and mocking out-of-the-box
  • Seamlessly integrates with all existing Java objects and libraries
  • Compiles straight to Java bytecode so you can use it anywhere you can use Java

Grails (http://www.springsource.com/products/grails) is an advanced and innovative open source web application platform that delivers new levels of developer productivity by applying principles like Convention over Configuration. Grails helps development teams embrace agile methodologies, deliver quality applications in reduced amounts of time, and focus on what really matters: creating high quality, easy to use applications that delight users.

Grails is built around Spring MVC. Groovy and Grails behave much like another dynamic language and web framework, Ruby and Ruby on Rails…