We are heavy users of MySQL (Percona) and MongoDB at my work. Recently I have been researching DynamoDB because of a specific use-case. A side project I run uses Google App Engine with Datastore (which is built on Bigtable) for persistence.
Comparing DynamoDB with MongoDB is like comparing apples and oranges. About the only thing the two have in common is that neither supports SQL (which is why both are called NoSQL databases). Their intended purposes are completely different, which is why I found it strange that the author of the original Slashdot story pitted them against each other the way he did.
If DynamoDB is to be compared against another datastore, the most similar alternative would probably be Google App Engine's Bigtable-backed Datastore.
Similarities between DynamoDB and GAE Datastore
- both use "schema-less" table structures for storing items (i.e. two items in a single table can have different columns)
- both support relatively simple primary keys (GAE only allows a single column PK, Dynamo allows a pseudo-two-column PK)
- both encourage only efficient queries (GAE forces it, Dynamo allows full table scans but they are highly discouraged)
- both support list properties (a column with multiple string values for example)
- both are hosted "in the cloud" and scale horizontally almost infinitely
- both are billed based on reads/writes + total stored data (Dynamo has an extra dimension to cost which is throughput)
- both have very limited support for referential integrity between items (GAE supports "embedded" entities and recently added basic relationships but nothing like many to many)
- GAE supports transactions across entities within the same entity group (i.e. on the same server) and recently added support for cross-group (XG) transactions (tx's across entities in different groups/on different servers). Dynamo does not have transactions but it supports some atomic operations on an individual item, like compare-and-set (a write that only succeeds if the item still has the value you expect).
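To make the single-item atomic operation from the last bullet concrete, here is a minimal sketch of a conditional ("compare-and-set" style) write. It is a toy in-memory model, not the DynamoDB API: the `Table` class, `conditional_update` and `ConditionFailed` are names I made up for illustration, and a real datastore would perform the check and the write atomically on the server.

```python
class ConditionFailed(Exception):
    """Raised when the stored value does not match the expected value."""


class Table:
    """Toy in-memory stand-in for a key-value table (hypothetical, not a real API)."""

    def __init__(self):
        self._items = {}

    def put(self, key, item):
        self._items[key] = dict(item)

    def get(self, key):
        return dict(self._items[key])

    def conditional_update(self, key, attr, expected, new_value):
        """Set item[attr] = new_value only if it currently equals expected."""
        item = self._items[key]
        if item.get(attr) != expected:
            raise ConditionFailed(
                f"{attr!r} is {item.get(attr)!r}, expected {expected!r}"
            )
        item[attr] = new_value


table = Table()
table.put("book-1", {"title": "Example Book", "stock": 3})

# The first writer read stock == 3 and decrements it.
table.conditional_update("book-1", "stock", expected=3, new_value=2)

# A second writer that also read stock == 3 now loses the race.
try:
    table.conditional_update("book-1", "stock", expected=3, new_value=2)
    second_write_applied = True
except ConditionFailed:
    second_write_applied = False
```

The losing writer is expected to re-read the item and retry, which is how you get safe counters and similar updates without multi-item transactions.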
Differences between DynamoDB and GAE Datastore
One major difference between GAE Datastore and DynamoDB is that GAE supports single and multi property indexes while Dynamo does not support indexes at all aside from a table's primary key. GAE datastore supports efficient queries that use the indexes (if you try to run a query that does not use an index it will fail) along with some basic predicates like equality, inequality, greater than and less than expressions, etc. In DynamoDB, if you want an index, you have to build it yourself in a supplementary table.
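As a rough sketch of what "build it yourself in a supplementary table" means, the snippet below keeps a second table keyed by author next to the primary table. Plain dicts stand in for the two tables, and `put_book`, `find_by_author` and the item fields are all hypothetical names chosen for the example.

```python
# Toy in-memory tables; plain dicts stand in for the real service.
books = {}            # primary table: book_id -> item
books_by_author = {}  # supplementary "index" table: author -> set of book_ids


def put_book(book_id, item):
    """Write the item and keep the hand-built author index in sync."""
    old = books.get(book_id)
    if old is not None:
        # Drop the stale index entry in case the author changed.
        books_by_author.get(old["author"], set()).discard(book_id)
    books[book_id] = dict(item)
    books_by_author.setdefault(item["author"], set()).add(book_id)


def find_by_author(author):
    """Key lookups on the index table and primary table, no full scan."""
    ids = sorted(books_by_author.get(author, set()))
    return [books[i] for i in ids]


put_book("b1", {"title": "First Book", "author": "Author A"})
put_book("b2", {"title": "Second Book", "author": "Author B"})
put_book("b3", {"title": "Third Book", "author": "Author B"})
put_book("b1", {"title": "First Book", "author": "Author C"})  # author changed

titles = [b["title"] for b in find_by_author("Author B")]
```

Note that every write now touches two tables, and without transactions those two writes are not atomic, so a real implementation has to tolerate or repair an index that is briefly out of sync.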
GAE Datastore Self-Merge Joins
GAE datastore also supports what they call "self-merge joins" which are super powerful. I don't know if any other schema-less datastore has this.
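As I understand it, the idea is that a query with several equality filters (for example several values of the same list property) can be answered by walking the existing single-property indexes in key order and keeping only the keys that appear in all of them. The sketch below shows that intersection step on sorted key lists; `merge_join` and `tag_index` are illustrative names, and this is my approximation of the technique, not Datastore's actual implementation.

```python
from bisect import bisect_left


def merge_join(*sorted_key_lists):
    """Intersect sorted lists of entity keys by leapfrogging their cursors."""
    if not sorted_key_lists:
        return []
    cursors = [0] * len(sorted_key_lists)
    result = []
    while all(c < len(lst) for c, lst in zip(cursors, sorted_key_lists)):
        current = [lst[c] for c, lst in zip(cursors, sorted_key_lists)]
        target = max(current)
        if all(key == target for key in current):
            # Every index agrees on this key, so it matches all filters.
            result.append(target)
            cursors = [c + 1 for c in cursors]
        else:
            # Jump each lagging cursor forward to the first key >= target.
            for i, lst in enumerate(sorted_key_lists):
                cursors[i] = bisect_left(lst, target, cursors[i])
    return result


# Hypothetical index on a "tags" list property: tag value -> sorted entity keys.
tag_index = {
    "scifi": ["b1", "b4", "b7"],
    "classic": ["b1", "b2", "b7", "b9"],
    "desert": ["b1", "b3", "b7"],
}

matches = merge_join(tag_index["scifi"], tag_index["classic"], tag_index["desert"])
```

The nice property is that no dedicated multi-property index is needed for this kind of query; the per-value index entries are enough.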
DynamoDB Purpose
The main reason to use DynamoDB is scalable throughput; in other words, when your read and/or write rates fluctuate drastically and you know you will occasionally spike to extremely high throughput requirements. For the periods when you expect huge write throughput, you can pay to scale up for just that window and then cut your costs by throttling back down to a saner limit. You can run MapReduce jobs over DynamoDB tables using Amazon Elastic MapReduce. You can also copy a DynamoDB table into an Amazon Redshift "warehouse"; once the data is in Redshift you can run efficient SQL queries over it, and Redshift can do that over petabytes of data.
MongoDB
MongoDB, on the other hand, is a "schema-less," document-oriented database that is good for organizing clumps of information as a single "item" in the datastore. So for example, you can have a single book document which contains nested information about its authors, keywords, reader reviews, and statistics about word usage in the book, all in a single MongoDB "record." This is essentially impossible in DynamoDB (unless you do what the previous article's author did and store an object graph as JSON within a single column, but that is kind of a misuse). Mongo also provides indexes on properties in those documents (even nested properties), similar to traditional RDBMS indexes on table columns. It has very good support for clustering and it's very easy to set up. MongoDB is very fast, so it is a good fit for write-intensive applications. I believe this is one of the main reasons that people compare it to Dynamo. However, it is easier to scale Dynamo since you don't need to stand up any additional servers yourself. MongoDB does not support transactions; operations on a single document are atomic, but the database does not provide any native mechanism for atomically modifying two or more documents (you would basically have to implement two-phase commit yourself). MongoDB has a powerful query language based on JavaScript/JSON but personally, I find it extremely painful to use compared to SQL.
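To picture the book example, here is roughly what such a document could look like, written as a plain Python dict, plus a small helper that resolves dotted paths into nested properties (the kind of path you would query or index on). The field names and values are made up, and `lookup` is a hypothetical helper that only approximates dotted-path behaviour; it is not part of any MongoDB driver.

```python
# A made-up book document; every field name and value here is illustrative.
book = {
    "title": "Example Book",
    "authors": [{"name": "Author A"}, {"name": "Author B"}],
    "keywords": ["databases", "nosql"],
    "reviews": [
        {"reader": "alice", "stars": 5},
        {"reader": "bob", "stars": 4},
    ],
    "stats": {"word_count": 1000},
}


def lookup(doc, dotted_path):
    """Resolve a dotted path, fanning out over lists along the way."""
    values = [doc]
    for part in dotted_path.split("."):
        next_values = []
        for value in values:
            # A list contributes each of its elements as a candidate.
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_values.append(candidate[part])
        values = next_values
    return values


word_counts = lookup(book, "stats.word_count")
review_stars = lookup(book, "reviews.stars")
author_names = lookup(book, "authors.name")
```

Everything about the book lives in one record, so reading or atomically updating it is a single-document operation.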
For the TLDR-crowd:
- GAE datastore is a great mix of schema-less design, denormalized datasets + self-merge-join + some RDBMS functionality like indexes and SQL style queries with predicates
- DynamoDB is for systems that need the ability to scale reading/writing throughput to very high levels, on demand. It is pretty low-level in terms of features and datatypes, and creating indexes is up to the user. Great integration with other AWS tools.
- MongoDB is a great way to easily store and retrieve object graphs (represented as JSON) with great read/write performance and some RDBMS functionality like indexes and queries with predicates
I am not familiar with CouchDB but I think it would belong in the MongoDB family.