Emergence of Data Grids to solve scaling problems

There is a great post at BigDataMatters discussing the emergence of Open Source Data Grids and the introduction of Infinispan 4.0.0 Beta 1.

The Infinispan site defines data grids as:

Data grids are highly concurrent distributed data structures. They typically allow you to address a large amount of memory and store data in a way that it is quick to access. They also tend to feature low latency retrieval, and maintain adequate copies across a network to provide resilience to server failure.

In the article, Chris Wilk explains some of the challenges that data grid technologies face around dynamic routing:

The reason that GigaSpaces suffers from this limitation is that it has a fixed space routing table at deployment time. The above scenario was described to Manik who said that Infinispan does not suffer from this restriction as it uses dynamic routing tables. Infinispan allows you to add any number of machines without incurring any down-time.

Data can be spread across many hosts using different techniques, but the point to take here is that altering the partition routing logic in mid-stream is very destructive to distributed transactions. Many system-level factors, including garbage collection and network overhead, also create inconsistencies that could jeopardize the movement of dynamic objects between partitions.

To increase the capacity of a data grid while keeping deterministic performance, robustness, and consistency, you should run a fixed number of partitions and “move” partitions from one JVM to another, newly started JVM. With GigaSpaces you can start the data grid with 10, 50, or 200 partitions running within a small number of JVMs, and later increase the number of JVMs when needed (manually or dynamically). You can then re-balance the system and spread the partitions across all the existing JVMs. It is up to you to determine how far you want to scale the system, which means you have total control over system behavior.
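The fixed-partition approach described above can be illustrated with a minimal sketch. This is not GigaSpaces code: the function names (`initial_assignment`, `rebalance`), the JVM labels, and the "move as few partitions as possible" policy are all assumptions made for illustration. The point it shows is that the number of partitions never changes; only the partition-to-JVM placement does.

```python
from collections import Counter


def initial_assignment(num_partitions, jvms):
    """Spread a fixed number of partitions round-robin over the starting JVMs."""
    return {p: jvms[p % len(jvms)] for p in range(num_partitions)}


def rebalance(assignment, jvms):
    """Re-spread the same partitions over a new JVM list, moving as few as possible.

    Hypothetical policy: every JVM gets an even quota, a partition stays where
    it is while its current JVM is under quota, and the rest are relocated.
    """
    base, extra = divmod(len(assignment), len(jvms))
    # The first `extra` JVMs in the list hold one partition more than the others.
    quota = {jvm: base + (1 if i < extra else 0) for i, jvm in enumerate(jvms)}
    load = {jvm: 0 for jvm in jvms}
    new_assignment = {}
    to_move = []

    for partition in sorted(assignment):
        owner = assignment[partition]
        if owner in quota and load[owner] < quota[owner]:
            new_assignment[partition] = owner  # stays in place
            load[owner] += 1
        else:
            to_move.append(partition)  # owner is over quota or no longer present

    for partition in to_move:
        target = next(jvm for jvm in jvms if load[jvm] < quota[jvm])
        new_assignment[partition] = target
        load[target] += 1

    return new_assignment


# Start 10 partitions on two JVMs, then add two more JVMs and re-balance.
jvms = ["jvm-1", "jvm-2"]
before = initial_assignment(10, jvms)
after = rebalance(before, jvms + ["jvm-3", "jvm-4"])
moved = [p for p in before if before[p] != after[p]]

print("partitions moved:", moved)
print("partitions per JVM:", Counter(after.values()))
```

Because the partition count stays at 10 throughout, any key that was routed to a given partition before the re-balance is still routed to that same partition afterwards; only whole partitions change hosts.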

The GigaSpaces routing mechanism functions without any problems and spreads data across all partitions as long as you have more unique keys than partitions. This should not be a problem in 99.99% of cases.
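To see why the number of unique keys matters, here is a small sketch that assumes routing is a hash of the key modulo the fixed number of partitions. The `route` and `partitions_used` helpers are hypothetical, and `zlib.crc32` is only a deterministic stand-in for whatever hash the product actually applies to the routing key.

```python
import zlib


def route(key, num_partitions):
    """Map a routing key to a partition index (assumed scheme: hash modulo)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partitions_used(keys, num_partitions):
    """Return the set of partitions that receive at least one of the keys."""
    return {route(key, num_partitions) for key in keys}


num_partitions = 10

# Only 3 unique keys: at most 3 of the 10 partitions can ever hold data.
few = partitions_used(["EUR", "USD", "GBP"], num_partitions)

# Many unique keys: the data has a chance to reach every partition.
many = partitions_used(range(10_000), num_partitions)

print("partitions used with 3 keys:", len(few))
print("partitions used with 10,000 keys:", len(many))
```

With a routing key of low cardinality (a currency code or a region, for example), the unused partitions stay empty no matter how many JVMs are added, so the practical fix is to choose a routing key with many distinct values.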

The comparison ignores many other GigaSpaces features, such as Mule integration, event handling and high-level building blocks for data processing, a web container with dynamic HTTP configuration, service management, system management tools, performance (especially for single-object operations, batch operations, and the local cache), text search integration, support for a massive number of clients, large data support (up to several terabytes of data), large object support, a Map-Reduce API, scripting and language support (Java, .NET, C, Scala, Groovy…), cloud API support, schema evolution, and more.

Having new players is great and confirms that there is room for new vendors in this huge market for in-memory data grid technologies on the cloud (private or public), but it is also important to make the right comparison.

See more here:
http://www.gigaspaces.com/wiki/display/SBP/Capacity+Planning
http://www.gigaspaces.com/wiki/display/CCF/CCF4XAP+Documentation+Home