Курутин Н.С.
Национальный технический
университет Украины «КПИ»
Alternatives to SQL Databases
Traditional SQL databases with "ACID" properties (Atomicity,
Consistency, Isolation and Durability) give strong guarantees about what
happens when data is stored and retrieved. These guarantees make it easier for
application developers, freeing them from thinking about exactly how the data
is stored and indexed, or even which database is running. However, these
guarantees come with a cost.
Bob Ippolito presented a talk titled "Drop ACID and think about data" at PyCon 2009, which gave an overview of number of
non-traditional databases. These alternatives compromise one or more of the
ACID properties and expose the particulars of that data store's implementation
in exchange for improved performance or scalability. Each also has its own
limitations. This article will look at the more mature open source options
Ippolito mentioned.
A number of companies have developed their own in-house data stores,
including Amazon's Dynamo and Google's Bigtable. While none of the open source
options are exactly like Dynamo or Bigtable, there are a number of
high-performance, reliable and scalable options available.
Alternative Database
Language Support |
|
*
means language support is pending |
|
Cassandra: |
C++,
C#, Java, Perl, Python, PHP, Erlang, Ruby |
Memcached: |
C/C++,
C#, Java, Perl, Python, PHP, Ruby, Lua, OCaml, Common LISP |
Tokyo Cabinet: |
C/C++,
Java, Perl, Ruby, Lua |
Redis: |
C/C++,
Java*, Perl, Python, PHP, Erlang, Ruby, Lua, Tcl |
CouchDB: |
C#,
Java, Perl, Python, PHP, Erlang, Ruby, Haskell, JavaScript, Common LISP |
MongoDB: |
C++,
Java, Python, PHP, Erlang*, Ruby |
Cassandra
is a data store written in Java that was open-sourced by Facebook and is now
part of the Apache Incubator. Cassandra was originally designed to solve
Facebook's in-box searching problem. Email reverse indexes were growing much
faster than their databases could keep up with and they needed a affordable way
to continue to grow.
Cassandra is designed to scale inexpensively with commodity networking
equipment and servers, possibly in multiple data centers. Scalability and high
availability are achieved by automatically "sharding" and replicating
data across the servers and data centers.
A single Cassandra instance stores a single table, and each row is
accessed with a key string. Every row of this table can have its own structure,
storing a huge number of (key, value, time-stamp) tuples and/or nested columns.
This makes Cassandra much more flexible than a simple key-value store, but not
as general as a document database.
Although in heavy use by Facebook, Cassandra is early in development and
still lacks some polish and documentation.
Perhaps the simplest key-value store is Memcached.
Memcached is widely used to to speed up web applications by caching dynamic content.
Part or all of the web pages are served from the cache instead of generating
them at each request. Unlike in-process or shared memory caches, Memcached
listens on a network socket and can be shared by many servers. Memcached may
also be run on multiple servers and it will spread the keys across those
servers and transparently fall back to servers that are still available when
one goes down.
Memcached keys and values are always strings. In addition to storing,
retrieving and deleting values, it allows atomic appending/prepending string
data to stored values and addition to/subtraction from 64-bit integer values
stored as decimal strings.
Memcached's data store is a fixed size and resides entirely in-memory.
Data may be stored with an expiration time. Memcached will actively throw out
data when the cache is full or when the data is set to expire.
For a key-value data store that won't throw out data, Tokyo Cabinet
is a good choice. Like Berkeley DB, it uses either a hash table, B+ tree or a
array of fixed-length records to store data on disk, but Tokyo Cabinet performs
better and is thread safe. Tokyo Cabinet also promises to never corrupt data
even in a "catastrophic situation". Tokyo Cabinet is actively
maintained, and data stored is not limited by system RAM.
Tokyo Cabinet clients are separated into readers and writers. When a
writer is accessing the database all other clients are blocked. This will
result in poor performance for write-heavy workloads.
Tokyo Cabinet supports appending data to values stored. When using a B+
tree layout Tokyo Cabinet provides a cursor object to efficiently move forward
and backward through the keys. B+ tree pages may also be compressed on disk
with zlib or bzip2. Compressing data not only saves disk space, but can also
increase performance on I/O-bound systems.
Redis is a disk-backed, in-memory key-value store with a number of additional
features. Redis supports master-slave replication for redundancy, but not
sharding, so all data must fit in a single system's RAM. Redis values may be
binary strings, lists or sets. Redis provides atomic addition to/subtraction
from integer values stored as decimal strings and push/pop/replacement of
values in lists. The intersection of set values stored may also be calculated.
Redis can asynchronously save the database on request by forking the
server process and writing out data in the background. The last successful save
time may be queried to check when the changes have made it to disk. This design
allows for good performance with the ability to save data when it makes sense
for the particular application, but the application is responsible to make sure
data is properly saved.
When using any key-value store as a cache care must be taken to
invalidate values when the data changes or inconsistency will be introduced.
Choosing a memory-only cache will be faster once it is populated, but there is
a cost associated with filling an empty cache on restart. Key-Value stores are
ideal for storing data that is not deeply nested and does not require
complicated locking.
Document databases are designed to store large blocks of semi-structured
data. The data is not restricted to a particular schema, so new versions of
data can be stored alongside old versions without the need for migrations.
Documents can be very large and deeply nested.
The best data store for an application depends in large part on how
deeply nested the values stored will be. If the application needs to only store
strings and integers then a simple key-value store like Memcached, Tokyo
Cabinet or Redis would be best. If the values can be represented as lists and
sets of simple values then Tokyo Cabinet, Redis or Cassandra would be good
options. If the application needs nested lists and hashes then choose
Cassandra, CouchDB or MongoDB. Finally, if the values contain deeply nested
data then only a document database like CouchDB or MongoDB will do.
Once a data store has been chosen and and the application optimized for
it, switching to a completely different API will not be easy. It is worth
investing time evaluating the remaining options by writing code to simulate the
application's usage patterns before making a choice.
Citation:
1. http://us.pycon.org/2009/conference/schedule/event/64/
2. http://incubator.apache.org/cassandra/
3. http://www.danga.com/memcached/
4. http://tokyocabinet.sourceforge.net/index.html
5. http://code.google.com/p/redis/