A Move to Secure Data by Scattering the Pieces 141
uler writes "The NY Times has an article about an interesting new open source storage project. Unlike data storage mechanisms today that work 'by making multiple copies of data,' the Cleversafe software takes an 'approach based on dispersing data in encrypted slices.' It's an elegant solution and one that's been a long time coming: the software uses algorithmic techniques known by mathematicians since the 70's. Adi Shamir (of RSA) first wrote of information dispersal in his 1979 paper 'How to Share a Secret (pdf).'"
Hmmm.... (Score:4, Insightful)
Re:Hmmm.... (Score:5, Informative)
Re: (Score:1, Insightful)
(wow, unintentional FP even...)
Doesn't FreeNet do this? (Score:2, Interesting)
I've been out of the freenet loop for a long time, but I thought I remembered reading in its documentation a few years ago that it did this same kind of encrypting and dispersing chunks of data.
Wasn't this Al Gore's idea? (Score:5, Insightful)
With all of this encryption technology, people still need to remember basic security tips. Use good passwords ("password" could be cracked very quickly even with 128 bit AES), maintain physical security (hardware keyloggers can find out about the manifesto you're writing before you even save the file) and use common sense.
Before you all ask, yes it does run Linux. The company was actually at Linuxworld.
Re: (Score:3, Insightful)
Cite? From what I've read about the original Arpanet, it was designed to allow the sharing of computer resources and data among DoD researchers. It wasn't designed to be a failure-tolerant network, although DARPA funded quite a bit of research in that area.
Re: (Score:1)
Since you demand a citation -- from the textbook Understanding Computers: Today & Tomorrow, 10th edition [google.com], by Charles S. Parker, page 365 (under Evolution of the Internet), emphasis mine:
Re: (Score:2)
You need a better textbook. The idea that the Arpanet was designed to be a survivable network is a particularly persistent myth.
Re: (Score:2)
To this day most military/government information is s
Crypto system for human rights watchers (Score:1)
The concept was that the watcher's laptop was likely to be inspected when they left the country. The inspectors wouldn't find anything s
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
That would be Rubberhose [wiretapped.net].
It scares me that Bruce said he didn't know about it. That means he doesn't want anyone to know. Please tell my kids to be good to their mom and that I love them.
Do not mod up (Score:2, Informative)
Re: (Score:2)
An important detail (Score:1)
Windows ME: Most Secure OS Ever? (Score:5, Funny)
Clearly Windows ME's memory -l-e-a-k-s- management made it the most secure OS ever. If only they had some way of reconstructing that data when you wanted it back again.
...like network RAID? (Score:3, Informative)
Don't understand the "IDA" trademark either... (Score:2)
Re: (Score:3, Funny)
"It's one louder."
Re: (Score:2)
Re: (Score:2)
site where the spelling in "incorrect" in the complementary ways which result in
conventually correct English.
Re: (Score:1)
Number 1 of 4 (Score:5, Funny)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Freenet? (Score:5, Interesting)
I was working on a p2p system that worked in a similar manner. I was even thinking of repurposing it for the sake of doing online backups - but frankly the bandwidth just doesn't seem to be there yet to do that sort of thing in a practical manner. That, and I got bored with the project... (but nevermind that).
Re:Freenet? (Score:5, Informative)
Shamir? No (Score:4, Informative)
Shamir's secret-sharing algorithm uses a similar idea (it's
essentially the same as Rabin's algorithm, except that the
data is padded with random gibberish).
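For anyone who wants to see the k-of-n idea concretely, here is a rough sketch of Shamir's scheme over a prime field. It is only an illustration: the prime, the function names, and the choice to treat the secret as a single integer are my own assumptions, not anything taken from the paper's text or from Cleversafe's code.

```python
import secrets

# Illustrative modulus: 2**127 - 1 is a Mersenne prime. The secret must be smaller.
PRIME = 2**127 - 1

def split_secret(secret, k, n):
    """Split an integer secret into n shares; any k of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for this sketch")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, k=3, n=5)
print(recover_secret(shares[:3]))  # any 3 of the 5 shares give the secret back
```

Note that every share here is as large as the secret itself, which is the price of the "k - 1 shares reveal nothing" property.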
Re: (Score:1)
Re: (Score:2)
For me, a system that allows me and a buddy to back up each other's data without getting access to it would be ideal. Too much work to do myself, though, and apparently you weren't going to do it for me, so I got myself a colo machine instead and use it as an rsnapshot server as well as an openv
Re: (Score:2)
Funny, I've been thinking of doing the same. P2P encrypted backups.
I give you... All My Data [allmydata.com]. Its distant cousin (Mnet [mnetproject.org]) is still around, but sorta moribund.
Re: (Score:2)
Re: (Score:2)
There's a lot of issues besides just "use openssl." Granted - that's a GREAT way to get started but there are a lot of issues to take care of like secure buffer management and usage, protocols, key management to name a few.
Re: (Score:2)
That strategy has worked with me before, but sorry, not this time. I have too many jobs already.
X.
Re: (Score:2)
X.
Re: (Score:2)
Again...the only thing OpenSSL has in common with OpenBSD is the word Open. That is it. OpenSSL is not developed by the OpenBSD people at all. I wish people would stop saying this...
Re: (Score:2)
http://www.openssh.org/ [openssh.org]
That's what makes programming such a hassle
X.
aaaaaaaaaarrrrrrrgggggggggghhhhhhh! (Score:2, Funny)
Re: (Score:2, Informative)
Is it, though?
According to Lynne Truss's Eats, Shoots & Leaves:
Having said which, 1980s clearly makes more sense. It's a plural, not a
Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! (Score:5, Interesting)
This gets messy, however, since the word 'years' is implied, and to say during the '70s' will make people wonder which 70 seconds you're talking about, and why it needs to be encapsulated with apostrophes -- is it an idiomatical 70 seconds? Kinda like the Biblical '40 days'?
For that matter, if you really want to get pedantic, what's the use of referencing the 70s at all if you're not going to bother denoting the scale? I mean, surely not mentioning that it's AD (or CE) is going to confuse people using other calendars... more so than misusing an apostrophe, right?
Along the same lines, it's just horrific that they'd abbreviate the decade anyway, how are we to know that the writer didn't intend the 1870s, or the 2070s even, if he happens to be living backwards in time?
Bah, there are grammatical rules, and it's great if everyone follows them, but really, it makes no difference if he spelled it 70's, '70s, or seventies (which is the proper spelling, btw).
Re: (Score:2)
Re: (Score:2)
6 of 11 (Score:5, Informative)
In a business example where you know that you can ultimately control the sites where you're storing your partial data, this would be a very good thing.
For the single user attempting to secure his information by using the existing network, there are some downsides. 6 of 11 slices of the data are needed to reconstruct the whole. Therefore, if a party intent on obtaining secret data obtains the majority of the servers, he has the data.
Also, if a disaster wipes out the majority of the servers, leaving five or fewer of the eleven, the data is gone.
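To put rough numbers on the 6-of-11 trade-off, here is a small back-of-the-envelope sketch. It assumes each slice is about 1/k the size of the original data (as in Rabin-style dispersal) and that slices fail independently; both are my assumptions, and the function names are made up for illustration.

```python
from math import comb

def dispersal_stats(n, k):
    """Basic numbers for a scheme that needs any k of n slices."""
    return {
        # Bytes stored per byte of data, ASSUMING each slice is 1/k of the data.
        "storage_overhead": n / k,
        "slices_that_can_be_lost": n - k,
        "slices_attacker_needs": k,
        "distinct_recovery_sets": comb(n, k),
    }

def survival_probability(n, k, p_fail):
    """Chance that at least k of n slices survive, ASSUMING independent failures."""
    return sum(
        comb(n, alive) * (1 - p_fail) ** alive * p_fail ** (n - alive)
        for alive in range(k, n + 1)
    )

print(dispersal_stats(11, 6))
print(survival_probability(11, 6, 0.05))
```

Under those assumptions the overhead is 11/6, a bit under 2x, which is less than keeping two full copies while tolerating the loss of five servers instead of one.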
This is a very, very important concept for business storage, but I have to wonder if it scratches any geek itches not already soothed by Truecrypt and Par2.
Re: (Score:1)
Number 2 of 4 (Score:5, Funny)
Of complexity, but also adds
Re: (Score:2)
a great way
As it applies to my sex life (Score:2)
from any k pieces, but even complete knowledge of k - 1 pieces reveals absolutely no information about D
I use this approach in my sex life, however, rather than obscuring information about D, even knowing one "piece" p reveals way more information than I'd like to have out there. Hell, ever since k-1 got a page on myspace, every potential n+1 knows about me before we even get started.
dispersion, section one (Score:2)
Distributed Pointers Too? (Score:2)
Number 3 of 4 (Score:5, Funny)
Re: (Score:2)
to get lots
Like mnet? (Score:5, Insightful)
From what I remember they split up data into multiple pieces, encrypted it and distributed it over a number of nodes, with some redundancy in it. If you know Python and are interested in p2p I'm sure there's a lot to be learned from that project.
Re: (Score:2)
The current incarnation of these ideas can be seen in the Allmydata [allmydata.com] service, which uses Tornado/Raptor codes (very
dispersion, section two (Score:2)
but. but, but... (Score:2, Funny)
You mean to tell me that all those hours of defragging my HD's on Windows 98 were actually a waste of time??
Pick up the Pieces (Score:2)
-Peter
Number 4 of 4 (Score:5, Funny)
an increased risk of loss of data.
Burma Shave.
Re: (Score:2)
of funny mods.
Re: (Score:2)
heh!
dispersion, section three (Score:2)
dispersion, section four (Score:3, Funny)
I thought of this a few years ago (Score:3, Interesting)
This system could be used for high profile secrets, like government whistle-blower data and the like. Storage would be secret and nearly undetectable because of all the other virus noise. Retrieval would be highly public by necessity, both to make retrieval possible and to publicize the contents of the data.
Re: (Score:2)
dispersion, section five (Score:3, Funny)
Transposition ciphers (Score:2)
This only works if the distance between the moved elements is greater than the attacker can cross. Not much different than sending reset passwds unencrypted through emails.
new implementation of an old idea (Score:3, Informative)
I don't see what's so new (Score:2, Interesting)
First I would encrypt the original file, split it up into 10-100 pieces, encrypt those, hide them in other files, encrypt those, then store them in random locations around the internet either by emailing a piece to a webmail or uploading to a server somewhere, posting the binary or hex sequence to a forum, things like that.
Heck, sometimes I'd repeat the encrypt/split/hide process several times, or even put the last step as hidden. Yes I realize anyone
Re: (Score:2)
You mean like the details of your encryption/splitting/hiding algorithm?
Re: (Score:1)
While, for one, the individual steps may not be perfectly secure they are certainly far more complex and involve several expert and natural language systems.
But besides that, I figure if you can find the pieces, put them together in the right order (several times) and decrypt them, then my hat's off to you and I deserve whatever I get for my arrogance in my security.
Re: (Score:2, Funny)
Re: (Score:1)
The problem... (Score:3, Interesting)
The problem with this idea is bandwidth and speed. You think your broadband is fast, but if you have to download the 27 gigabytes of photos, music and stuff, it won't be exactly fast on an 8 Mbps DSL line, never mind 1 Mbps or less. You might wait a couple of hours, but you won't wait a couple of days.
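The arithmetic behind that is quick to check. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and a link that runs at its full rated speed the whole time, which real DSL rarely does:

```python
def transfer_hours(gigabytes, megabits_per_second):
    """Hours to move the data, assuming decimal units and a fully used link."""
    bits = gigabytes * 1e9 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600

for mbps in (8, 1):
    print(f"27 GB at {mbps} Mbps: {transfer_hours(27, mbps):.1f} hours")
```

That works out to 7.5 hours at 8 Mbps and 60 hours (two and a half days) at 1 Mbps, and that is before protocol overhead or a slower upstream direction.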
Okay. So you tell me that amount of available bandwidth will increase? But so will the amount of data that needs to be backed up. And it will grow faster than the bandwidth. Think of homemade movies. You can already fill up your average drive in no-time. What do you then do, when you get a HD camera?
Although the idea isn't a new one, I think it is still neat. It might work for some stuff, but I don't see this becoming mainstream with technologies like Time Machine [apple.com] coming to the end-users.
Re: (Score:2)
Plus you have to upload it more than once (a LOT more than once if you want to be sure) to avoid embarrassing "the important last piece of my backup was on the old 486 of a hobo that got thrown away" situations.
Learning from normal P2P, if you want to get it back after a year, there should be at least a 10-20 factor of redundancy.
Which leads to another point: that redundancy of course is incredibly wasteful. Just i
Re: (Score:2)
True. Uploading will be very slow and you would have to consider the fact that depending on the system you might need to upload the same data more than once. However, uploading backups would not be as big a priority to users as restoring them. It could happen all the time slowly in the background. Once all the data, say 80 GB, is uploaded you would only need to update the changes. Say you changed an average of 1
Sorry, old news (Score:1)
Oh come on, a paper?
Everyone knows that if you want to share a secret, you just tell it to a -- eh, never mind. :P
That's how CDs work - distributed data (Score:3, Informative)
Not quite, but the coding scheme that makes CDs and DVDs resistant to dust and scratches works much like that. Big blocks have an error correcting code appended, and then the bits of the data plus error correcting code are rearranged and spread widely across the block. So when you lose a contiguous set of bits, you can replace it by using data distributed across the block.
It's a good error correction scheme, but it's not exactly new. Every CD player in the world has this. CDs aren't encrypted (there's no key, just a well-known algorithm), but you could mix encryption in if you wanted. This wouldn't help the error recovery.
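Here is a toy illustration of why the spreading helps. Real CDs use cross-interleaved Reed-Solomon coding (CIRC), which is considerably more elaborate; the plain block interleaver below is a stand-in I picked for clarity, and the function names are hypothetical.

```python
def interleave(data, rows, cols):
    """Write row by row into a rows x cols grid, read out column by column."""
    assert len(data) == rows * cols
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(data, rows, cols):
    """Inverse of interleave: put each symbol back in its original position."""
    assert len(data) == rows * cols
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = data[i]
            i += 1
    return out

def worst_damage_per_codeword(rows, cols, burst_start, burst_len):
    """Treat each row as one codeword; return the most symbols any single
    codeword loses when a contiguous burst hits the interleaved stream."""
    symbols = list(range(rows * cols))
    sent = interleave(symbols, rows, cols)
    lost = set(sent[burst_start:burst_start + burst_len])
    return max(
        sum(1 for s in range(r * cols, (r + 1) * cols) if s in lost)
        for r in range(rows)
    )

# A scratch wiping out 8 consecutive symbols costs each of the 8 codewords
# only one symbol, which a modest error-correcting code can repair.
print(worst_damage_per_codeword(rows=8, cols=16, burst_start=5, burst_len=8))
```

Without the interleaving, the same 8-symbol burst would land almost entirely inside one codeword and could exceed what its error-correcting code can fix.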
Ancient (Score:3, Informative)
Re: (Score:1)
PASIS is not based on IDA. PASIS is not a mechanism for data dispersal. It is a family of storage protocols that make efficient use of data dispersal mechanisms. Any mechanism that satisfies the m-of-n condition (where of n data fragments, m are necessary to reconstruct the original data item) can be used. This can be mirroring, striping, erasure coding, IDA, and anyth
Re: (Score:2)
A clever, efficient approach (Score:2)
The Judge (Score:2)
Sharing a secret in the offline world (Score:3, Interesting)
You take the secret and divide it into 3 pieces. You have a team of 3 people to each carry or memorize two of the 3 pieces.
Amy carries pieces 1 and 2
Bob carries pieces 2 and 3
Charlie carries pieces 3 and 1
If any one of them is compromised by bribery or other means, 1) the information is not lost and 2) the enemy has only an incomplete picture of what is going on.
This can be extended to more people to achieve greater redundancy or less exposure:
More redundancy: 4 people with 4 pieces, each person knows 3 elements. Any 2 of 4 people needed to put the pieces together.
Less exposure: 4 people with 4 pieces, each knows 2 elements. Any 3 of 4 people needed to put the pieces together. Loss of 1 person exposes 1/2 of the total secret.
There's no reason to stop with 4 people and 4 pieces.
Think of this as RAID for human-knowledge.
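These layouts are easy to check by brute force. The sketch below is just an illustration: the Amy/Bob/Charlie layout is the one given above, but the people A to D and the "cyclic" piece assignments for the 4-person cases are my own assumption about how the pieces would be handed out.

```python
from itertools import combinations

def min_group_sizes(holdings, total_pieces):
    """Return (smallest size at which SOME group holds every piece,
               smallest size at which EVERY group holds every piece)."""
    everything = set(range(1, total_pieces + 1))
    people = list(holdings)
    some, every = None, None
    for size in range(1, len(people) + 1):
        covers = [
            set().union(*(holdings[p] for p in group)) == everything
            for group in combinations(people, size)
        ]
        if some is None and any(covers):
            some = size
        if every is None and all(covers):
            every = size
    return some, every

three = {"Amy": {1, 2}, "Bob": {2, 3}, "Charlie": {3, 1}}
# Hypothetical cyclic assignments for the 4-person variants:
more_redundancy = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 1}, "D": {4, 1, 2}}
less_exposure = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {4, 1}}

print(min_group_sizes(three, 3))
print(min_group_sizes(more_redundancy, 4))
print(min_group_sizes(less_exposure, 4))
```

With these particular assignments, the first two cases need any 2 people. In the "less exposure" case any 3 people always suffice, but some pairs (A and C here, whose pieces don't overlap) can also reconstruct everything between them, so the exposure is a bit higher than "3 of 4" suggests.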
Pfft security schmerity (Score:1)
~Teh Def1c4t05S~
I...welcome...encrypting (Score:1)
Notes from the Cleversafe lead developer (Score:5, Informative)
I'm the Cleversafe Dispersed Storage software-development project leader. I work with Chris Gladwin (mentioned in the New York Times article) as a fellow manager at Cleversafe.
I offer some comments below to help outline some of the unique aspects of the Cleversafe technology.
Encryption is not dispersal. Cleversafe provides both, and then some. The Cleversafe Dispersed Storage software disperses any "datasource" (typically a file) into several slices (our software currently uses 11 slices in an 11-lose-any-5 scheme; future versions may use additional schemes with "wider" slice sets). Additionally, our software also encrypts, compresses, scrambles, and signs the datasource content, but we are not trying to reinvent the wheel: other software technologies exist to do these things, and we leverage them extensively.
We found that a bigger challenge than creating or managing dispersal algorithms was building the entire storage system [cleversafe.org] regardless of the dispersal algorithm used (we design the system to be dispersal-scheme agnostic). The metadata management system and many other things took us far longer to implement than the Cleversafe IDA [cleversafe.org]. It's not hard to use Reed-Solomon, or some other algorithm, on a single file or a small set of files and disperse the slices by hand onto several different systems (or use variants of this, like the 3-piece secret story with Amy, Bob, and Charlie mentioned above). It's much harder to manage this across an entire file system (with hundreds of thousands of files, or many more depending on the file system), for an unlimited number of file systems from all the various users, stored on a heterogeneous set of an unlimited number of geographically dispersed commodity storage nodes, in a completely decentralized way with no dependence on the original source of the data (e.g., you could sledgehammer your laptop and not lose any data that's stored on our grid/storage service). (I apologize for that run-on sentence.)
Further, dispersed-storage systems do not require replication. (Dispersal systems may replicate data for performance purposes, if at all, depending on the application/configuration/installation/context.) If a system replicates entire copies of the data (be they encrypted or not) then it is, by (our) definition, not a dispersed-storage system. So a continual question I have when I evaluate other systems: do they replicate the data in whole or not? Most systems replicate.
Cleversafe is not the first to present a dispersal system, but we like to think we are the first to make it broadly usable by people and inter-operable with other systems. See our cmdline client [cleversafe.org] (which will soon have continuous-backup and XML-programmable policy management), our Dispersed Storage API [cleversafe.org], our dsgfs file system [cleversafe.org], a soon-to-be released GUI client, and future "connectors" (what we call the applications that leverage our technology) to come, all available at http://www.cleversafe.org [cleversafe.org].
A side note: "revision management" is built into the Cleversafe system to address what I call "soft" failures (accidental deletes, application failures, etc) vs. "hard" failures (hard disk crashes) as well as archival requirements.
I believe that the concept of "dispersed storage" will eventually change how the world thinks about storage systems--regardless of whether or not these are Cleversafe-based systems (I think Cleversafe presents the best such system, but I of course am biased).
Shared secret (Score:2)
"Blondie, what did he tell you? I know which graveyard the money is buried in. Don't die on me Blondie. What did he tell you?"
"A name... a name on a gravestone..."
"Ah! We are partners! I know the graveyard, you know the name! Partners just like good old times, eh?!"
Use more than one pad? (Score:2)
Why not just get K random sequences and XOR them together to get a 1 time pad. Then encrypt the data and store it in public view. You will need ALL the pads to unlock it.
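A minimal sketch of that idea, with made-up function names and Python's `secrets` module standing in for a source of truly random pads:

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_with_pads(message, k):
    """Make k random pads as long as the message; return (pads, ciphertext)."""
    if k < 1:
        raise ValueError("need at least one pad")
    pads = [secrets.token_bytes(len(message)) for _ in range(k)]
    combined = reduce(xor_bytes, pads)  # XOR of all k pads is the one-time pad
    return pads, xor_bytes(message, combined)

def decrypt_with_pads(ciphertext, pads):
    """Needs every pad: XOR them together and strip the result off."""
    return xor_bytes(ciphertext, reduce(xor_bytes, pads))

pads, ciphertext = encrypt_with_pads(b"attack at dawn", k=4)
print(decrypt_with_pads(ciphertext, pads))
```

The catch is that this is k-of-k rather than k-of-n: there is no redundancy, so losing a single pad loses the data, and each pad is as long as the message.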
It's an RPG quest! (Score:2)
Elder: "We need the sacred information of Pr0n!"
Elder: "Unfortunately, the dastardly Cleversafe has scattered this information into 12 parts."
Elder: "You must go to each of the 12 ancient ruins and collect the sacred information for us!"
Player: "This quest sucks."
Makes sense now....
Potentially great for internal use... (Score:3, Interesting)
You invented Lotus Notes (Score:3, Informative)
Re: (Score:1)
Very informative.
Re: (Score:2)
Ignorance is no excuse (Score:2)
Now consider what happens when RIAA figures out that every linux user may store copyrighted tunes in their
(Put a million computers to cat
Homework: test how long it takes for your
Re: (Score:1)
You jest!
I'm actually going to go do this. I'll do Shakespeare quotes, music tunes, Futurama quotes. Any others?
Re: (Score:1)
With English prose, a good assumption is 2 bits per letter.
Re: (Score:3, Informative)
(emphasis mine) That legal principle only means that you can't murder someone, and then claim “...but I didn’t know that was against the law” and hope to be let off. In order to commit a crime, you have to have mens rea (to use yet more Latin)—that is, a “guilty mind”—so storing data that you genuinely didn't know was illegal isn’t a crime. And in this case, there a
Re: (Score:1)
GPP was also a joke, by the way.