
NoSQL databases sometimes feature a concept called document storage, a way of storing data that differs in radical ways from the means available to traditional relational SQL databases. But what does “document storage” actually mean, and what are its implications for developers and other IT pros?
This article will focus primarily on MongoDB; the techniques utilized here are similar in other document-based databases. I’m assuming you’re already familiar with the basics of SQL databases and how each table has a fixed schema. That’s the first place where databases such as MongoDB are different: in Mongo, “tables” are referred to as “collections,” and records within a single collection can have different structures.
With Mongo, your records are stored as a binary form of JSON (called BSON). JSON stands for JavaScript Object Notation, and its syntax is basically the same as the object notation in JavaScript. For example, a single record might look like this:
{
  "_id" : ObjectId("4fccbf281168a6aa3c215443"),
  "first_name" : "Thomas",
  "last_name" : "Jefferson",
  "address" : {
    "street" : "1600 Pennsylvania Ave NW",
    "city" : "Washington",
    "state" : "DC"
  }
}
This single record consists of a first name, a last name, and then an address that is itself an object with further data inside it. In MongoDB, each record is given a unique ID, a special type known as an ObjectId (although you can use other types for this id, such as strings). Mongo generates these unique IDs automatically, or you can create them along with the record.
These records are called documents. They’re not necessarily documents in the sense of a word processing document, although you can store binary data (such as a word processing document) in any of the fields in the document. You can also modify the structure of any document on the fly by adding and removing members from the document, either by reading the document into your program, modifying it and re-saving it, or by using various update commands.
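As a rough illustration of changing a document's structure on the fly, the sketch below uses plain Python dictionaries as a stand-in for documents; it does not talk to a real MongoDB server. The helper names `set_fields` and `unset_fields` and the phone number are hypothetical, loosely modeled on what update commands such as Mongo's `$set` and `$unset` operators do.

```python
import copy


def set_fields(document, new_fields):
    """Return a copy of the document with the given members added or replaced."""
    updated = copy.deepcopy(document)
    updated.update(new_fields)
    return updated


def unset_fields(document, names):
    """Return a copy of the document without the named top-level members."""
    return {
        key: copy.deepcopy(value)
        for key, value in document.items()
        if key not in names
    }


person = {
    "first_name": "Thomas",
    "last_name": "Jefferson",
    "address": {
        "street": "1600 Pennsylvania Ave NW",
        "city": "Washington",
        "state": "DC",
    },
}

# Add a member that the original document never had (hypothetical value).
with_phone = set_fields(person, {"phone": "555-0100"})

# Remove a member entirely; other documents in a collection are unaffected.
without_address = unset_fields(with_phone, {"address"})
```

Both helpers return new dictionaries rather than mutating their input, which mirrors the read, modify, and re-save workflow; a real driver would send the change to the server instead.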
This schema-less approach can be both a blessing and a curse. As a developer, I love that I can easily store complex structures in a single database record. If I were to take the example above and put it into a SQL table, I would either need to “flatten” it (by pulling the street, city, and state fields out of the inner address object and making them columns of the main record), or else put the inner objects in a separate table and include a foreign key to that other table. And from a programming perspective, these documents map beautifully to complex objects in my code.
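To make the “flatten” option concrete, here is a minimal sketch, again using a plain Python dictionary in place of a stored document. The `flatten` function and the underscore-joined column names are assumptions chosen for illustration, not something Mongo or any SQL database prescribes.

```python
def flatten(document, prefix=""):
    """Flatten nested dicts into one level, joining key names with underscores."""
    row = {}
    for key, value in document.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Pull the inner object's members up into the outer row.
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


person = {
    "first_name": "Thomas",
    "last_name": "Jefferson",
    "address": {
        "street": "1600 Pennsylvania Ave NW",
        "city": "Washington",
        "state": "DC",
    },
}

# One flat row, suitable for a single table with five columns.
flat_row = flatten(person)
```

The resulting row has the columns first_name, last_name, address_street, address_city, and address_state. The other option, a separate address table with a foreign key, keeps the nesting but requires a join to reassemble the record.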
But that can also cause problems. With a traditional SQL database, the database administrators and analysts can carefully design the schema for the table; once the schema is in place, programs can only add records that match that schema. That puts constraints on the programmers so they don’t accidentally (or intentionally) put mismatched data into a table. But with Mongo, the programmer can easily drop any type of data into any collection, raising the potential for accidents.
With the right tools, you can find a compromise. For example, the different language drivers for Mongo allow you to read documents into an object with a specific structure, and write documents from objects with a specific structure. In strongly-typed languages, this means you create an instance of a class, and then save that instance right to the collection. And if you do want to allow some leniency, you can use a Mongo-specific class in your code that works like a map, letting you add members on the fly. (The name of this class varies between languages.) Or you can create a strongly-typed class and include a member whose type is that Mongo-specific class; that member serves as a “catch-all” for data that doesn’t match the class’s schema.
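The “catch-all” idea can be sketched roughly as follows. This is not any particular driver's API: the `Person` class, its `extra` member, and the `from_document` and `to_document` methods are hypothetical, and a plain Python dict stands in for the map-like Mongo-specific class.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class Person:
    # Declared members form the schema the program expects.
    first_name: str
    last_name: str
    # Catch-all for any members the class does not declare.
    extra: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_document(cls, document):
        """Build a Person, routing undeclared members into `extra`."""
        known = {f.name for f in fields(cls)} - {"extra"}
        declared = {k: v for k, v in document.items() if k in known}
        leftover = {k: v for k, v in document.items() if k not in known}
        # Raises TypeError if a declared member is missing from the document.
        return cls(**declared, extra=leftover)

    def to_document(self):
        """Merge the declared members and the catch-all back into one dict."""
        document = dict(self.extra)
        document["first_name"] = self.first_name
        document["last_name"] = self.last_name
        return document


stored = {"first_name": "Thomas", "last_name": "Jefferson", "nickname": "Tom"}
person = Person.from_document(stored)
round_trip = person.to_document()
```

Here the undeclared `nickname` member survives a load and save cycle inside `extra`, while a document missing `first_name` or `last_name` is rejected when the object is constructed.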
(As for weakly typed languages, such as JavaScript in Node.js, it’s harder to force the programmer into a schema, but there are libraries that add class-like schema support.)
In the end, like so many tools, document storage in a NoSQL database can be easy to abuse; but when handled with care, it can become a powerful feature.
Image: Elnur/Shutterstock.com



Come on /. Your readers are technical not stupid. This is less than interesting. Find someone who actually knows about the subject then let them write a real article about the subject. A few paragraphs of high level description should be saved for marketing material.
Seriously, who wrote this? Captain Obvious? This stuff has been around for like 5 years. I was hoping for some benchmarks, case studies or application notes that would inform a decision I might make. This reads like a 4th grade book report.
Right, so, object store by any other name. Or what is it that you can usefully do with it beyond create/read/update/destroy? If you're going to compare to SQL, do explain what operations you still can do, and which you cannot. Or do "developers" really have that little understanding of just what it is that SQL offers beyond simple CRUD operations?
You don't do decomposition just to fit your data into the relations regime. You do that because having well-defined relations allows you to create new ones from them, linking up the data in new ways. This is massive complexity offload that this article writer apparently doesn't even understand. "NoSQL" in the form of mongodb unwittingly just got parked in a very special hopelessly bloated fancy key/value store with built-in de/serialisation category. Is that all there is to it? Well?
"With the right tools, you can find a compromise." Tautologies 'R' Us.
Saanvik posts for me -- this is pathetic. Cogswell should be embarrassed to have this indexed under his name.
This article isn't the biggest waste of my time in the last year, but it's close. NoSQL databases are a well known entity in the community that reads Slashdot.
What's the difference between using NoSQL and just storing a file on the filesystem? With both, you end up with a string that can be used to fetch the document. And with filesystem storage it's faster.
anonymous: Because MongoDB lets you index and query documents. It's also significantly more space-efficient, and is designed to support many clients.
In general, a database is intended for large multi-user applications. If you're storing something directly on to the file system, you're most likely not writing an application that's going to handle thousands or millions of users.
anonymous: For your example you would use the collection's 'count' function (or similar), something along the lines of db.collection.count( { "address.city" : "Washington", "address.state" : "DC" } );
Thanks for including that stock photo. I didn't realize what no sql is, but apparently it involves typing shit.
So how does one use a NoSQL database? I assume you generally use a single document as a context and don't necessarily relate documents to other documents? Say, in this example, I wanted to get the total number of people who lived in Washington, DC. How does that get accomplished in NoSQL?
anonymous: You retrieve every record and check the address. You can use map-reduce to split the records up, so it's not as slow as you might think. It's very web-scale and that's what matters.
You've told me nothing besides what I could extrapolate from the first line of this page:
http://en.wikipedia.org/wiki/NoSQL