Elastic Scaling

For years, database administrators have relied on scale up -- buying bigger servers as database load increases -- rather than scale out -- distributing the database across multiple hosts as load increases. However, as transaction rates and availability requirements increase, and as databases move into the cloud or onto virtualized environments, the economic advantages of scaling out on commodity hardware become irresistible.
RDBMS might not scale out easily on commodity clusters, but the new breed of NoSQL databases is designed to expand transparently to take advantage of new nodes, and these databases are usually designed with low-cost commodity hardware in mind.
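
One common way to let a cluster absorb new nodes with little disruption is consistent hashing. The sketch below is only an illustration of that idea, not the implementation of any particular NoSQL product; the node names, the number of virtual nodes, and the `HashRing` class are all assumptions made for the example. It shows that adding a fourth node relocates only a fraction of the keys, and that every relocated key lands on the new node.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical; names and vnode count are illustrative)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place each physical node at several ring positions to spread load
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def node_for(self, key: str) -> str:
        # A key belongs to the first position at or after its hash, wrapping around
        idx = bisect.bisect_left(self._positions, _hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]


def moved_fraction(keys, before: HashRing, after: HashRing) -> float:
    """Fraction of keys whose owning node differs between two rings."""
    moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
three = HashRing(["node-a", "node-b", "node-c"])
four = HashRing(["node-a", "node-b", "node-c", "node-d"])
fraction = moved_fraction(keys, three, four)
print(f"keys relocated after adding a fourth node: {fraction:.1%}")
```

With a naive `hash(key) % node_count` placement, by contrast, most keys would change owner whenever the node count changes, which is why ring-style schemes are a frequent design choice for scale-out stores.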

Big Data

Just as transaction rates have grown beyond recognition over the last decade, the volumes of data being stored have also increased massively. O'Reilly has cleverly called this the "industrial revolution of data." RDBMS capacity has been growing to match these increases, but as with transaction rates, the constraints on the data volumes that a single RDBMS can practically manage are becoming intolerable for some enterprises. Today, the volumes of "big data" that can be handled by NoSQL systems, such as Hadoop, outstrip what can be handled by the biggest RDBMS.

Goodbye DBAs (not needed)

Despite the many manageability improvements claimed by RDBMS vendors over the years, high-end RDBMS systems can be maintained only with the assistance of expensive, highly trained DBAs. DBAs are intimately involved in the design, installation, and ongoing tuning of high-end RDBMS systems. NoSQL databases are generally designed from the ground up to require less management: automatic repair, data distribution, and simpler data models lead to lower administration and tuning requirements -- in theory. In practice, however, it is likely that rumors of the DBA's demise have been slightly exaggerated. Someone will always be accountable for the performance and availability of any mission-critical data store.

Economics

NoSQL databases typically use clusters of cheap commodity servers to manage the exploding data and transaction volumes, while RDBMS tends to rely on expensive proprietary servers and storage systems. The result is that the cost per gigabyte or per transaction per second for NoSQL can be many times less than the cost for RDBMS, allowing you to store and process more data at a much lower price point.

Flexible Data Models

Change management is a big headache for large production RDBMSs. Even minor changes to the data model of an RDBMS have to be carefully managed and may necessitate downtime or reduced service levels. NoSQL databases have far more relaxed -- or even nonexistent -- data model restrictions. NoSQL key-value stores and document databases allow the application to store virtually any structure it wants in a data element. Even the more rigidly defined BigTable-based NoSQL databases (Cassandra, HBase) typically allow new columns to be created without too much fuss.
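
To make the contrast concrete, here is a minimal sketch of what "store virtually any structure" can look like from the application's side. The `DocumentStore` class is a hypothetical in-memory stand-in written for this example, not the API of any real database, and the sample records are made up.

```python
import json


class DocumentStore:
    """Toy in-memory document store (hypothetical, not a real database API)."""

    def __init__(self):
        self._docs = {}

    def put(self, key: str, document: dict) -> None:
        # Serialise to JSON to mimic storing an opaque value; no schema is checked
        self._docs[key] = json.dumps(document)

    def get(self, key: str) -> dict:
        return json.loads(self._docs[key])

    def find(self, field: str, value):
        # Documents that lack the field are skipped rather than rejected
        return [k for k, raw in self._docs.items()
                if json.loads(raw).get(field) == value]


store = DocumentStore()
store.put("user:1", {"name": "Ada", "email": "ada@example.com"})
# A later application version adds fields without any schema migration step
store.put("user:2", {"name": "Lin", "email": "lin@example.com",
                     "tags": ["beta"], "address": {"city": "Oslo"}})

print(store.find("name", "Lin"))
print(store.get("user:1").get("tags", []))
```

The trade-off is that the application, rather than the database, becomes responsible for coping with records that lack newer fields, as the `.get("tags", [])` default hints.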

Cassandra (2.0), CQL (Cassandra Query Language)

  • CQL3 is very similar to SQL, but with some limitations that arise from its scalability (most notably: no JOINs and no Aggregate Functions)
  • CQL3 is now the official interface. One does not need Thrift, unless one is working on a legacy App. Hence one does not waste time trying to understand ColumnFamilies, SuperColumns, etc.
  • Querying by key, or key range (secondary indices are also available)
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Data can have expiration (set on INSERT).
  • Writes can be much faster than reads (when reads are disk-bound) – extremely useful feature when data-collection is constantly happening.
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Very good and reliable cross-datacenter replication
  • Distributed counter datatype.
  • You can write triggers in Java.
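
The (N, R, W) trade-off mentioned in the list can be illustrated with a little arithmetic. Assuming N is the number of replicas of a piece of data, W the number that must acknowledge a write, and R the number consulted on a read, the usual rule of thumb is that a read is guaranteed to touch at least one up-to-date replica when R + W > N. The sketch below checks that rule by brute force; the function names are invented for this example and nothing here calls Cassandra itself.

```python
from itertools import combinations


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Rule of thumb: read and write sets must overlap when R + W > N."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every set of R replicas share a member
    with every set of W replicas?"""
    replicas = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))


for n, r, w in [(3, 1, 1), (3, 2, 2), (3, 1, 3), (5, 2, 3)]:
    print(f"N={n} R={r} W={w} overlap={quorum_overlaps(n, r, w)}")
```

Lower R or W values make the corresponding operation cheaper and more available at the cost of possibly reading stale data, which is the tuning knob the bullet refers to.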

Best Used: When you need to store data so huge that it may not fit on a single server but still want a friendly, familiar interface to it. When you want to run Map/Reduce on the data.

Some Examples: Web analytics (counting hits by hour, by browser, by IP, etc.), transaction logging, and data collection from huge sensor arrays.
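
The hit-counting use case can be sketched in a few lines. This is a plain Python model of the data layout only, with an in-memory `Counter` standing in for a counter column; in CQL the increment would roughly correspond to an UPDATE of the form `SET hits = hits + 1`. The bucket format, the `record_hit` helper, and the sample events are assumptions made up for the illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# (hour bucket, browser) -> number of hits; plays the role of a counter column
hits = Counter()


def record_hit(timestamp: datetime, browser: str) -> None:
    """Increment the counter for the hour that contains `timestamp`."""
    hour_bucket = timestamp.strftime("%Y-%m-%d %H:00")
    hits[(hour_bucket, browser)] += 1


# Made-up sample events, for illustration only
sample_events = [
    (datetime(2014, 1, 1, 9, 5, tzinfo=timezone.utc), "firefox"),
    (datetime(2014, 1, 1, 9, 47, tzinfo=timezone.utc), "firefox"),
    (datetime(2014, 1, 1, 9, 59, tzinfo=timezone.utc), "chrome"),
    (datetime(2014, 1, 1, 10, 1, tzinfo=timezone.utc), "firefox"),
]
for ts, browser in sample_events:
    record_hit(ts, browser)

for (hour_bucket, browser), count in sorted(hits.items()):
    print(hour_bucket, browser, count)
```

Because each event only increments a counter and never reads existing rows first, this pattern suits a store whose writes are cheaper than its reads.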

MongoDB (2.2):

  • Master/slave replication (auto failover with replica sets)
  • Sharding built-in
  • Queries are JavaScript expressions
  • Run arbitrary JavaScript functions on the server-side
  • Better update-in-place than CouchDB (another DB system not related to MongoDB)
  • Uses memory mapped files for data storage
  • Performance over features
  • Journaling (with --journal) is best turned on
  • On 32-bit systems, limited to ~2.5 GB
  • An empty database takes up 192 MB
  • GridFS to store big data + metadata (not actually an FS)
  • Has Geospatial indexing (comes in handy while building location-dependent Apps)
  • Data center awareness

Best Used: When you need good performance under a high write load (for example, writing millions of transactions), high availability (recovery from a failed node is designed to be fast, safe, and automatic), scalability (built-in sharding), location-dependent data (built-in geospatial functions), or a schema-less design for growing data.
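
To show what a location-dependent query computes, here is a small sketch of a "find everything within a radius" lookup. It uses the haversine great-circle formula and a linear scan in plain Python; it does not use MongoDB, and a real geospatial index exists precisely to avoid scanning every record. The `near` helper and the approximate city coordinates are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near(places, lat, lon, max_km):
    """Return (distance_km, name) pairs within max_km, nearest first (linear scan)."""
    found = []
    for name, (plat, plon) in places.items():
        distance = haversine_km(lat, lon, plat, plon)
        if distance <= max_km:
            found.append((distance, name))
    return sorted(found)


# Approximate city-centre coordinates, used only as sample data
places = {
    "Paris": (48.8566, 2.3522),
    "London": (51.5074, -0.1278),
    "Berlin": (52.5200, 13.4050),
}

for distance, name in near(places, 48.8566, 2.3522, max_km=500):
    print(f"{name}: {distance:.0f} km")
```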

Some Examples: One could use MongoDB for most projects that could be done with MySQL or PostgreSQL, without being limited by predefined columns.