Open Source Moving in on the Data Storage World
pararox writes "The data storage and backup world is one of stagnant technologies and cronyism. A neat little open source project, called Cleversafe, is trying to dispel that notion. Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines."
I don't think you know what that word means. . . . (Score:2, Interesting)
Backup for Backuper? (Score:3, Interesting)
If there is a creator/seeder, then we are still burdened by having to keep this seeder safe so that we can retrieve the distributed slices.
If there is no creator/seeder, is this safe enough so that people cannot patch slices together by way of trial-and-error?
Think RAID5, only way better (Score:5, Interesting)
Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data.
It seems like this can be tuned to provide varying levels of fault tolerance. According to the abstract (I don't have an ACM web account, and I couldn't find the full text), it seems like I can take a file and make it so that any four chunks can be used to rebuild the file. I can then generate eight such chunks and distribute them to eight different machines. Thus, five of the eight machines would have to be rendered inoperable before I was unable to retrieve my data.
If I understand it correctly, then this is really slick.
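The "any four of eight" idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not Cleversafe's actual code: it treats each group of k bytes as polynomial coefficients over the prime field GF(257) and stores one evaluation per slice, in the style of a Reed-Solomon or Rabin-type dispersal. The names `split`, `solve`, and `join` are made up for this example.

```python
from itertools import combinations

P = 257  # smallest prime above 255, so every byte value is a field element

def split(data: bytes, k: int, n: int) -> list[list[int]]:
    """Return n slices; slice i holds polynomial evaluations at x = i + 1."""
    padded = list(data) + [0] * (-len(data) % k)
    slices = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        block = padded[start:start + k]  # k bytes act as polynomial coefficients
        for i in range(n):
            x = i + 1
            slices[i].append(sum(c * pow(x, j, P) for j, c in enumerate(block)) % P)
    return slices

def solve(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Gauss-Jordan elimination mod P for an invertible k x k system."""
    k = len(rhs)
    rows = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k] for row in rows]

def join(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the data from any k slices, given as {slice_index: values}."""
    idx = sorted(available)[:k]
    matrix = [[pow(i + 1, j, P) for j in range(k)] for i in idx]
    out = []
    for pos in range(len(available[idx[0]])):
        out.extend(solve(matrix, [available[i][pos] for i in idx]))
    return bytes(out[:length])

data = b"any four of eight"
slices = split(data, k=4, n=8)
# Try every possible set of four surviving machines out of eight.
ok = all(join({i: slices[i] for i in combo}, 4, len(data)) == data
         for combo in combinations(range(8), 4))
print(ok)
```

Each slice holds one value per four input bytes, so eight slices cost roughly twice the original size, while any four machines can fail without losing data.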
Rar + Par + BitTorrent? (Score:5, Interesting)
Par files (for use with QuickPar, etc.) are great, saving all sorts of extra posting on binary newsgroups.
You mean Shamir, not Rabin (Score:5, Interesting)
Even more amazingly, Shamir's secret sharing scheme allows computing math functions, such as digital signatures, without ever recovering the secret keys. This is called threshold cryptography; some of you may be interested to learn about its many wonders. Shamir rocks, and so does threshold crypto!
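For anyone who has not seen it, here is a minimal sketch of Shamir's scheme: the secret is the constant term of a random polynomial of degree k - 1 over a prime field, each share is one point on that polynomial, and any k points recover the constant term by Lagrange interpolation at x = 0. The field modulus and the function names `make_shares` and `recover` are choices made for this example only.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def make_shares(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split secret into n shares so that any k of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 from k shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, k=3, n=5)
print(recover(shares[:3]) == 123456789)
```

A small taste of the threshold idea: adding two sets of shares point by point gives valid shares of the sum of the two secrets, so the parties can compute on the secrets without anyone ever reassembling them. Real threshold signature schemes are considerably more involved than this.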
innovation (Score:2, Interesting)
Maybe one day vendors will stop pushing overly expensive and utterly bland storage solutions. For example, the last time I had a meeting about storage, the product was 2x servers and 2x disk arrays with a possible storage of a little under 2TB (using 24 80GB SCSI HDDs) with RAID 5. Oh, and the storage was presented to the OS as 4 x 500GB drives (some proprietary thing). All in at a cool £27,000 (and that was before the license for CIFS). Guess how it was billed: innovative... It's a joke. So the solution? In the meantime, lots of SATA drives and file replication. Eventually? Maybe we can make use of all that storage that sits on every machine on the LAN and is never used...
Virtual file server -- was a program for old Macs (Score:5, Interesting)
By chance, does anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if it were done today and made platform independent. If done right, one could create data brokerage houses where people could buy and sell storage space, and also reliability, with space on a RAID or server array being of higher value than space on a laptop that is rarely on the Internet.
Re:Editors, please note! (Score:3, Interesting)
Re:Think RAID5, only way better (Score:4, Interesting)
Rabin has shown how to come up with l vectors, any k of which are linearly independent.
Sounds familiar. Like my master's thesis. (Score:5, Interesting)
In fact, I wrote an RSRaid driver for Linux for my thesis and did some performance testing on it. I'll save you the 30 pages and just tell you that the algorithm is far too CPU intensive to scale up very well for fileserver use (my original intent), but I did conclude it could be used as a backup alternative to tape. Hmmmm.
Direct Link [dyndns.org]
Google Cache [72.14.203.104]
Please forgive the double brackets, I fought with Word and lost.
Contact me if you'd like to play with the code. I never did any reconstruction code, but the system did work in a degraded state, and was written for the Linux 2.6 kernel.
Byzantine for Beginners (Score:3, Interesting)
Publius (Score:3, Interesting)
It's nice to see another attempt that's free. Free speech requires anonymity.
RAID 5 at the File Level (Score:3, Interesting)
From the summary: "the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data."
So, basically it is like RAID 5 striping and parity [wikipedia.org] applied to the file level.
Neat concept.
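To make the RAID 5 comparison concrete, here is a rough sketch of single XOR parity applied to one file. It is an illustration only, with hypothetical function names, and it is not how Cleversafe is implemented. Note that plain XOR parity survives the loss of just one stripe, whereas the scheme described in the summary is said to tolerate more.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def make_stripes(data: bytes, n_data: int) -> list[bytes]:
    """Cut data into n_data zero-padded chunks and append one parity chunk."""
    size = -(-len(data) // n_data)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(n_data)]
    return chunks + [xor_blocks(chunks)]

def rebuild(stripes: list[bytes], missing: int) -> bytes:
    """Recompute the stripe at index `missing` from all the others."""
    survivors = [s for i, s in enumerate(stripes) if i != missing]
    return xor_blocks(survivors)

stripes = make_stripes(b"open source storage", n_data=3)
# Pretend stripe 1 was lost and rebuild it from the remaining three.
print(rebuild(stripes, 1) == stripes[1])
```

Because every stripe is the XOR of all the others, the same `rebuild` call works whether the lost piece is a data chunk or the parity chunk.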
Re:"any majority of which" (Score:2, Interesting)
Re:I think this is wrong again (Score:3, Interesting)
It is a classic example of a bad patent. There was prior art (though admittedly this was kept top secret until 1997), and it also failed the obviousness test: if someone else came up with the same algorithm four years earlier, it was clearly obvious to someone skilled in the art of cryptography. In fact, Cocks invented the algorithm "overnight" after being told about the concept of non-secret encryption devised by James H. Ellis (another GCHQ worker), which occurred to Ellis after reading a World War II paper by someone at Bell Labs describing a way to protect voice communications by having the receiver add (and later subtract) random noise.