(FYI: this link to the New York Times article bypasses any need to log in or register with the nytimes.com website.)
I'm the Cleversafe Dispersed Storage software-development project leader. I work with Chris Gladwin (mentioned in the New York Times article) as a fellow manager at Cleversafe.
I offer some comments below to help outline some of the unique aspects of the Cleversafe technology.
Encryption is not dispersal. Cleversafe provides both, and then some. The Cleversafe Dispersed Storage software disperses any "datasource" (typically a file) into several slices. Our current software uses 11 slices in an 11-lose-any-5 scheme, so any 6 of the 11 slices are enough to rebuild the datasource; future versions may use additional schemes with "wider" slice sets. Our software also encrypts, compresses, scrambles, and signs the datasource content, but we are not trying to reinvent the wheel: other software technologies already do these things, and we leverage them extensively.
We found that a bigger challenge than creating or managing dispersal algorithms was making the entire storage system work regardless of the dispersal algorithm used (we design the system to be dispersal-scheme agnostic). The metadata management system, among many other things, took us far longer to implement than the Cleversafe IDA. It's not hard to apply Reed-Solomon or some other algorithm to a single file or a small set of files and disperse the slices by hand onto several different systems (or to use variants of this, like the 3-piece secret story with Amy, Bob, and Charlie mentioned above). It's much harder to manage this across an entire file system (with hundreds of thousands of files, or many more depending on the file system), for an unlimited number of file systems from all the various users, stored on a heterogeneous set of an unlimited number of geographically dispersed commodity storage nodes, in a completely decentralized way with no dependence on the original source of the data (e.g., you could take a sledgehammer to your laptop and not lose any data that's stored on our grid/storage service). (I apologize for that run-on sentence.)
Further, dispersed-storage systems do not require replication. (Dispersal systems may replicate data for performance purposes, if at all, depending on the application/configuration/installation/context.) If a system replicates entire copies of the data (be they encrypted or not), then it is, by (our) definition, not a dispersed-storage system. So a question I keep asking when evaluating other systems is: do they replicate the data in whole or not? Most systems replicate.
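The storage cost of that distinction can be shown with a little arithmetic. The Python sketch below compares whole-copy replication with an ideal k-of-n dispersal when both must survive the loss of any 5 storage nodes. It assumes each slice is exactly 1/k the size of the datasource and ignores metadata, padding, and encryption overhead; the function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(failures_tolerated: int) -> int:
    """Whole-copy replication: surviving f lost copies requires f + 1 copies."""
    return failures_tolerated + 1


def dispersal_overhead(n: int, k: int) -> Fraction:
    """Ideal k-of-n dispersal: n slices, each 1/k the size of the datasource."""
    return Fraction(n, k)


# Both layouts below survive the loss of any 5 storage nodes.
print(replication_overhead(5))                 # 6 full copies, i.e. 6x raw storage
print(round(float(dispersal_overhead(11, 6)), 2))  # 11 slices of 1/6 size, about 1.83x
```

Under these assumptions, an 11-lose-any-5 layout tolerates the same number of node losses as six full replicas while using less than a third of the raw storage. No node holds a complete copy, either.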
Cleversafe is not the first to present a dispersal system, but we like to think we are the first to make it broadly usable by people and interoperable with other systems. See our cmdline client (which will soon have continuous backup and XML-programmable policy management), our Dispersed Storage API, our dsgfs file system, a soon-to-be-released GUI client, and future "connectors" (what we call the applications that leverage our technology), all available at http://www.cleversafe.org.
A side note: "revision management" is built into the Cleversafe system to address what I call "soft" failures (accidental deletes, application failures, etc.), as opposed to "hard" failures (hard disk crashes), as well as archival requirements.
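The post doesn't say how Cleversafe implements revision management, so what follows is only a minimal Python sketch of the general idea. It uses a hypothetical `RevisionedStore` class. Writes and deletes append revisions rather than destroying data, so a "soft" failure such as an accidental delete can be undone, while "hard" failures are left to the dispersal layer.

```python
from typing import Dict, List, Optional


class RevisionedStore:
    """Hypothetical sketch: each write appends a revision; nothing is overwritten."""

    def __init__(self) -> None:
        # name -> list of revisions; None marks a (soft) delete
        self._history: Dict[str, List[Optional[bytes]]] = {}

    def put(self, name: str, content: bytes) -> int:
        revs = self._history.setdefault(name, [])
        revs.append(content)
        return len(revs) - 1  # revision number of this write

    def delete(self, name: str) -> int:
        if name not in self._history:
            raise KeyError(name)
        # record a tombstone instead of discarding earlier revisions
        self._history[name].append(None)
        return len(self._history[name]) - 1

    def get(self, name: str, revision: Optional[int] = None) -> bytes:
        revs = self._history[name]
        content = revs[-1] if revision is None else revs[revision]
        if content is None:
            raise KeyError(f"{name!r} is deleted at that revision")
        return content

    def restore(self, name: str, revision: int) -> int:
        # recover from a "soft" failure by re-publishing an old revision
        return self.put(name, self.get(name, revision))


store = RevisionedStore()
store.put("report.txt", b"draft 1")
store.put("report.txt", b"draft 2")
store.delete("report.txt")      # accidental delete
store.restore("report.txt", 1)  # bring back "draft 2" as a new revision
```

Restoring appends a new revision instead of rewinding history. That keeps the record of the accidental delete, which also serves archival needs.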
I believe that the concept of "dispersed storage" will eventually change how the world thinks about storage systems, regardless of whether or not these are Cleversafe-based systems (I think Cleversafe presents the best such system, but of course I am biased). I simply see too many benefits in the technology.
For what it's worth, here's a long-term perspective:
Our culture typically thinks of storage as being resident in a place. I'd like to one day be able to think about storage (at least for "data at rest") as a "persistent entity" that does not live in any one particular place. I'd like to see a facility where a storage system/service is both ubiquitously available and extremely private (and, I would argue, the *most* private). Users or administrators would not need to worry about making copies of the content for availability purposes (built-in content "revisioning" helps with the "soft" problems). As network bandwidth becomes less expensive and more available (particularly wirelessly), I like to think that this vision could become a reality and that dispersed-storage mechanisms can enable it.