Курутин Н.С.

Национальный технический университет Украины «КПИ»

Alternatives to SQL Databases

Traditional SQL databases with "ACID" properties (Atomicity, Consistency, Isolation and Durability) give strong guarantees about what happens when data is stored and retrieved. These guarantees make it easier for application developers, freeing them from thinking about exactly how the data is stored and indexed, or even which database is running. However, these guarantees come with a cost.

Bob Ippolito presented a talk titled "Drop ACID and think about data" at PyCon 2009, which gave an overview of number of non-traditional databases. These alternatives compromise one or more of the ACID properties and expose the particulars of that data store's implementation in exchange for improved performance or scalability. Each also has its own limitations. This article will look at the more mature open source options Ippolito mentioned.

A number of companies have developed their own in-house data stores, including Amazon's Dynamo and Google's Bigtable. While none of the open source options are exactly like Dynamo or Bigtable, there are a number of high-performance, reliable and scalable options available.

Alternative Database Language Support

* means language support is pending

Cassandra:

C++, C#, Java, Perl, Python, PHP, Erlang, Ruby

Memcached:

C/C++, C#, Java, Perl, Python, PHP, Ruby, Lua, OCaml, Common LISP

Tokyo Cabinet:

C/C++, Java, Perl, Ruby, Lua

Redis:

C/C++, Java*, Perl, Python, PHP, Erlang, Ruby, Lua, Tcl

CouchDB:

C#, Java, Perl, Python, PHP, Erlang, Ruby, Haskell, JavaScript, Common LISP

MongoDB:

C++, Java, Python, PHP, Erlang*, Ruby

 

Cassandra is a data store written in Java that was open-sourced by Facebook and is now part of the Apache Incubator. Cassandra was originally designed to solve Facebook's in-box searching problem. Email reverse indexes were growing much faster than their databases could keep up with and they needed a affordable way to continue to grow.

Cassandra is designed to scale inexpensively with commodity networking equipment and servers, possibly in multiple data centers. Scalability and high availability are achieved by automatically "sharding" and replicating data across the servers and data centers.

A single Cassandra instance stores a single table, and each row is accessed with a key string. Every row of this table can have its own structure, storing a huge number of (key, value, time-stamp) tuples and/or nested columns. This makes Cassandra much more flexible than a simple key-value store, but not as general as a document database.

Although in heavy use by Facebook, Cassandra is early in development and still lacks some polish and documentation.

Perhaps the simplest key-value store is Memcached. Memcached is widely used to to speed up web applications by caching dynamic content. Part or all of the web pages are served from the cache instead of generating them at each request. Unlike in-process or shared memory caches, Memcached listens on a network socket and can be shared by many servers. Memcached may also be run on multiple servers and it will spread the keys across those servers and transparently fall back to servers that are still available when one goes down.

Memcached keys and values are always strings. In addition to storing, retrieving and deleting values, it allows atomic appending/prepending string data to stored values and addition to/subtraction from 64-bit integer values stored as decimal strings.

Memcached's data store is a fixed size and resides entirely in-memory. Data may be stored with an expiration time. Memcached will actively throw out data when the cache is full or when the data is set to expire.

For a key-value data store that won't throw out data, Tokyo Cabinet is a good choice. Like Berkeley DB, it uses either a hash table, B+ tree or a array of fixed-length records to store data on disk, but Tokyo Cabinet performs better and is thread safe. Tokyo Cabinet also promises to never corrupt data even in a "catastrophic situation". Tokyo Cabinet is actively maintained, and data stored is not limited by system RAM.

Tokyo Cabinet clients are separated into readers and writers. When a writer is accessing the database all other clients are blocked. This will result in poor performance for write-heavy workloads.

Tokyo Cabinet supports appending data to values stored. When using a B+ tree layout Tokyo Cabinet provides a cursor object to efficiently move forward and backward through the keys. B+ tree pages may also be compressed on disk with zlib or bzip2. Compressing data not only saves disk space, but can also increase performance on I/O-bound systems.

Redis is a disk-backed, in-memory key-value store with a number of additional features. Redis supports master-slave replication for redundancy, but not sharding, so all data must fit in a single system's RAM. Redis values may be binary strings, lists or sets. Redis provides atomic addition to/subtraction from integer values stored as decimal strings and push/pop/replacement of values in lists. The intersection of set values stored may also be calculated.

Redis can asynchronously save the database on request by forking the server process and writing out data in the background. The last successful save time may be queried to check when the changes have made it to disk. This design allows for good performance with the ability to save data when it makes sense for the particular application, but the application is responsible to make sure data is properly saved.

When using any key-value store as a cache care must be taken to invalidate values when the data changes or inconsistency will be introduced. Choosing a memory-only cache will be faster once it is populated, but there is a cost associated with filling an empty cache on restart. Key-Value stores are ideal for storing data that is not deeply nested and does not require complicated locking.

Document databases are designed to store large blocks of semi-structured data. The data is not restricted to a particular schema, so new versions of data can be stored alongside old versions without the need for migrations. Documents can be very large and deeply nested.

The best data store for an application depends in large part on how deeply nested the values stored will be. If the application needs to only store strings and integers then a simple key-value store like Memcached, Tokyo Cabinet or Redis would be best. If the values can be represented as lists and sets of simple values then Tokyo Cabinet, Redis or Cassandra would be good options. If the application needs nested lists and hashes then choose Cassandra, CouchDB or MongoDB. Finally, if the values contain deeply nested data then only a document database like CouchDB or MongoDB will do.

Once a data store has been chosen and and the application optimized for it, switching to a completely different API will not be easy. It is worth investing time evaluating the remaining options by writing code to simulate the application's usage patterns before making a choice.

Citation:

1.     http://us.pycon.org/2009/conference/schedule/event/64/

2.     http://incubator.apache.org/cassandra/

3.     http://www.danga.com/memcached/

4.     http://tokyocabinet.sourceforge.net/index.html

5.     http://code.google.com/p/redis/