A Different Idea For Distributed Storage
hojo writes: "A really cool idea for an anonymous, distributed storage system is actively being worked on. Talk about a way around censorship and control--check out this article at Forbes for more." The article talks about a system dubbed "OceanStore," a high-concept application of the same massively distributed and replicated data idea behind FreeNet and some other projects. The availability of ever-cheaper mass storage will start to change exactly what we think is worth saving and where it makes sense to store it. (Do we want a data cloud full of the digital pictures millions of people couldn't bring themselves to delete?)
military (Score:1)
eudas
RAID-2001? (Score:1)
While the ideas behind it are great (preventing censorship and general control of free speech by authorities), it does raise the question: where does it end? And how many people are really going to be willing to dedicate equipment and bandwidth to this sort of thing?
Personally, I don't know what people are thinking when they say that storage is so friggin cheap... it's certainly not for the average joe. I'm still stuck on a 20GB 5400RPM ATA/66 drive when I could make good use of a 60GB 7200RPM drive. I'm certainly not going to dedicate any significant portion of my precious drive space to storing bits and pieces of files belonging to other people, 90% of which probably don't need to be on a network of this sort anyway.
The pros and the cons (Score:4)
Ummmm.... (Score:1)
A better link (Score:5)
Re:RAID-2001? (Score:2)
And there's no reason to believe that price drop won't be repeated over the next 12 years.
Also, you don't seem to get the point:
First of all, the idea was that ISPs could offer you access to a "virtual drive" distributed over multiple physical locations, with multiple copies of the files, for security. Thus, instead of buying that 60GB drive, you could buy 60GB of redundant distributed storage, that you would be able to access from anywhere.
Alternatively, you could band together with some friends: you give up, say, 5GB of your drive, and get the same in return distributed over your friends' systems, to use for backing up important files.
It's not about giving up your storage for nothing. It's either to provide it as a service (and get paid), or to give it up as an "exchange" - you give up some space to get access to the network, either to store your own files, or to get access to files other people have stored there.
Re:The pros and the cons (Score:1)
You don't. This is a long-term project; it's aimed at a time when everyone has an always-online PDA.
what happens when some secret service first succeeds in quantum computers...
You're fucked. Let's face it; once quantum computing comes online, all cryptography is defunct. Distributed storage is only one tiny area that's got to worry.
interesting but... (Score:1)
2. How do you index this thing? Centralised or distributed? Who controls it if central?
3. How do you clean up old stuff no one wants? Once your file(s) are copied numerous times, you are going to have extra overhead everywhere. Or can you send a command to all computers connected to delete said file?
There are so many questions that need to be addressed I don't know where to start.
data storing and new problems arising ? (Score:2)
> pictures millions of people couldn't bring
> themselves to delete?
Of course we do!
Why?
Because I'd rather keep a trace of everything than forbid more and more things to people (e.g. hard disc encryption, CSS, etc.).
I like the idea of a central server to which a bunch of network computers are connected. The difference from the idea that was once made popular by Larry Ellison?
It is that the central supercomputer in question will be distributed and thus virtual.
It will become something like a reticular subconscious in which people will have to dig very hard (because it'll have to be quite secure so that each user's privacy is respected).
And, as we are also dealing with A.I., it is possible that some unknown problems will appear that require some e-psychoanalysis.
--
Re:interesting but... (Score:2)
I think the real problem is ensuring its reliability. Look at the current "big" distributed programs running now... SETI, Distributed.net, Processtree (the free ones), et al. There are some very dedicated users who do it for no reason other than to help a cause; you will find this in nearly every project. However, this does not make for a *very* stable and reliable "backbone", as it were.
2. How do you index this thing? Centralised or distributed? Who controls it if central?
Distributed with a series of "central" directory servers would probably be the best bet. That is not really a great problem - all it needs is a few hours (days) of good thinking, and a little testing and a good system would be worked out quickly.
3. How do you clean up old stuff no one wants? Once your file(s) are copied numerous times, you are going to have extra overhead everywhere. Or can you send a command to all computers connected to delete said file?
Files that no one wants? I don't think they exist. People (read: some people) will generally find uses for the most obsolete things (e.g. look at all the interest in old (read: vintage) games that companies won't release to the public). There is demand, just not huge demand. That does not nullify the fact that the demand exists. Also, several locations for the same file will not be a bad thing, per se; as with Napster, it will make it easier to get the file quickly, without worrying about downloading it from some server in the middle of Russia. The more the merrier.
Cheers,
Fran
what about the privacy issues? (Score:2)
The usual argument is, "we've got strong encryption, so everything's okay". However, this ignores that when data is stored in far-flung places, it can be intercepted by domestic and international entities, friendly and hostile, some of which have the ability to break strong encryption. Think of the industrial espionage possibilities, or even just invasion of personal privacy.
Quantum Computing does not break all crypto. (Score:2)
Simply false. With what's currently known, public-key algorithms might be in trouble, but the secret-key stuff works just fine if we double the key lengths. 256-bit AES should be fine.
And even for public key stuff, "defunct" is massively overstating it.
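The "double the key lengths" rule of thumb comes from Grover's algorithm, which can search an unstructured space of 2^n keys in on the order of 2^(n/2) quantum steps. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and constant factors and hardware costs are ignored.

```python
def effective_bits_under_grover(key_bits: int) -> float:
    """Rough security level of a secret key against Grover search.

    Grover's algorithm needs on the order of 2**(key_bits / 2) steps,
    so the effective strength is about half the nominal key length.
    """
    return key_bits / 2


for bits in (128, 256):
    steps = effective_bits_under_grover(bits)
    print(f"{bits}-bit key -> about 2**{steps:.0f} quantum search steps")
```

Read this way, a 256-bit key against a quantum search is roughly where a 128-bit key stands against a classical one.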
--
0-day warez!@ (Score:1)
Few differences from Freenet (Score:2)
But I think Freenet can pretty easily be used as a basis for a system like OceanStore. A subnetwork of Freenet servers could agree to request your document for you periodically, ensuring that it stays available in their caches.
And, of course, this doesn't have the privacy and free speech elements that Freenet has. Unfortunately, I think those will make it hard for Freenet to truly prosper. ISPs will be afraid to run Freenet nodes.
For ages, I've wanted to be able to request data based on what it is, rather than where it is. For example, I wanted to write an installer that would ask for a DLL by key, and the system would figure out where best to get it from. If the product CD-ROM is in the drive, get it from there. If you installed the product on another machine on the LAN earlier, get it from a local cache. If another user of your ISP installed it recently, get it from the ISP's cache. Otherwise, find it on the Internet, either on my company's site or a closer mirror (maybe one that's been created automatically based on download patterns).
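The "ask by key, let the system pick the source" idea can be sketched in a few lines. This is only an illustration under assumptions: the key is taken to be a SHA-1 digest of the content, the sources are plain in-memory dictionaries standing in for the CD-ROM, LAN cache, ISP cache and mirror, and every name here is hypothetical rather than anything from Freenet or OceanStore.

```python
import hashlib
from typing import Mapping, Optional, Sequence, Tuple


def content_key(data: bytes) -> str:
    """Name a blob by what it is: the SHA-1 hex digest of its bytes."""
    return hashlib.sha1(data).hexdigest()


def fetch_by_key(
    key: str, sources: Sequence[Tuple[str, Mapping[str, bytes]]]
) -> Optional[Tuple[str, bytes]]:
    """Try each source in order (nearest first) and return the first valid hit.

    Each source is a (label, mapping) pair; the mappings are stand-ins
    for real caches and are purely illustrative.
    """
    for label, store in sources:
        data = store.get(key)
        # Re-hash the bytes so a stale or tampered copy is skipped.
        if data is not None and content_key(data) == key:
            return label, data
    return None


dll = b"pretend this is the contents of a DLL"
key = content_key(dll)
sources = [
    ("cdrom", {}),                      # disc not in the drive
    ("lan-cache", {key: b"corrupt"}),   # bad copy, fails the hash check
    ("isp-cache", {key: dll}),          # first good copy
    ("internet-mirror", {key: dll}),
]
print(fetch_by_key(key, sources))
```

Because the key is derived from the content, the installer does not have to trust whichever cache answers; it only has to check the hash.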
If Freenet doesn't grow to provide this, maybe OceanStore will. I'll be happy if either one does.
--
Re:Quantum Computing does not break all crypto. (Score:1)
Okay, I was a little over-zealous; my apologies for over-stating. However, I'm not willing to make a full retraction; full quantum computing is still decades away, and with the advances being made (be it distributed/brute force, whatever), by the time quantum computers are here keys will have at least quadrupled in length, and still be insecure. Probably.
Don't use it then (Score:1)
Ever tried to ski on the backs of the opressed? (Score:2)
Re:Russia (Score:1)
Killing criminals does not bring their victims back nor does it give their relatives any relief or consolation. Taking human life is always wrong.
You say that you'd start with criminals, but who would be next? People with unpopular political opinions (like me), artists whose works you might find offensive (like me) or your personal enemies?
Where would it stop?
Some problems with distributed filesystems.. (Score:2)
Making fully distributed untrusted systems is hard. Making hybrid distributed systems is harder.
Reading section 4.4.3 makes this apparent - they don't have a fully distributed system. If you want the most hardcore possible file-system semantics, a small number of servers with huge bandwidth vote on what to do. So it's sort of centralized some of the time. Later in 4.4.3 they say that if you don't need such strong semantics, you don't have to use these tier 1 machines to do the update ordering.
That said, We(tm) need a good anonymous ubiquitous data store. Incidentally, for all you
Maybe I missed the idea... (Score:1)
Re:Quantum Computing does not break all crypto. (Score:1)
Re:Russia (Score:1)
Boy-girl bands.
Re:interesting but... (Score:2)
2. How do you index this thing? Centralised or distributed? Who controls it if central?
Distributed with a series of "central" directory servers would probably be the best bet. That is not really a great problem - all it needs is a few hours (days) of good thinking, and a little testing and a good system would be worked out quickly.
Whoa there. Have you tried to work through this before ? :)
People have been trying to do this for _Years_. I spent more than a few days doing a thesis on just this problem - distributed algorithms are not trivial, especially when they need to be reliable and arbitrarily scalable. This system seems to have good reliability hooks, but as you alluded to, they chose a compromise for decentralization... their update policy has fuzzy semantics and requires centralized control for strict atomicity for single-client updates. They say "it should scale to a few million servers". That would be pretty cool, but they might be wrong, and that means it will wear out before it's even fully realized :)
They have a pretty kick ass solution to the "delete" bit though. In short, they don't seem to do it. Also, they keep around old _versions_ of files. For once, you can actually get persistent storage. Updates to a file increment its version number, but they seem to indicate that old versions will stick around and can even be requested (this is not stated; it would be similar to CVS/RCS). So you can conceivably make something analogous to
http://site.made.in.1987.com/index.html;version=1.1
Other filesystems (VMS did this?) had versioning right in the file-system. I think it's a good idea. FS support for solutions to things like DLL hell? Count me in!
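To make the "never delete, just add versions" idea concrete, here is a toy sketch. It is not OceanStore's design (the comment itself notes the details are not stated); the class and method names are invented, and versions are simply counted from 1 in memory.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Toy append-only store: updates add a version, nothing is deleted."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}

    def put(self, name: str, data: bytes) -> int:
        """Store a new version of `name` and return its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(data)
        return len(history)

    def get(self, name: str, version: Optional[int] = None) -> bytes:
        """Return the requested version, or the latest one if none is given."""
        history = self._versions[name]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]


store = VersionedStore()
store.put("index.html", b"<h1>made in 1987</h1>")
store.put("index.html", b"<h1>redesigned</h1>")
print(store.get("index.html"))             # latest version
print(store.get("index.html", version=1))  # the old version is still there
```

The cost is visible even in the toy: storage only ever grows, which is the overhead question raised earlier in the thread.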
Re:RAID-2001? (Score:1)
This, and many other previous projects (and even a few products) are attempts at that holy grail: the file-system for loosely coupled clusters that gives people useful and familiar semantics, and scales well.
Personally, I know I'd love to just be able to add capacity to my entire enterprise by plugging in a few more boxes with 30 gig IDE disks in them, and have everything just figure itself out. Bam, I get more CPU power, more disk space, and more app serving capacity. Oh, and better fault tolerance.
Oh, and it all manages itself without ridiculous amounts of sysadmin intervention.
That's what these groups are going for. RAID doesn't come close. Not even the same topic.
My favorite quote (Score:2)
Sure! That is why EMC and IBM are major players in the project.
Re:Russia (Score:1)
Think about it.
Re:Russia (Score:1)
Maybe, but that's what makes us better people.
forget distributed (Score:1)
Everyone who wants in chips in some money and gets a server with many terabytes of data...and puts it in the middle of the ocean
Why? Because you're in international waters, and pretty much can't be charged with shit. Do any illegal thing you want. Store MP3s, warez, MS Kerberos, whatever.
Re:My favorite quote (Score:5)
You should know that if you've heard of EMC. EMCs are practically self-contained black boxes of "poop data here and don't worry about it, ever". Some (all?) EMC systems phone home when they think there will be a problem. It is not uncommon for the first sign of disk failure in an EMC to be the new disk arriving in the mail on the sysadmin's desk!
It's not tricky to slap a bunch of drives together and get an assload of capacity. It is tricky to figure out how to keep 23523 18GB disks running if you just have an Excel spreadsheet telling you which cabinet each disk sits in.
Ditto with IBM. The coolest things I've ever seen are the multiple-arm tape storage libraries with the ADSM interfaces in front of them that make data archival and retrieval pretty painless.
The key is managing data distribution when you've got an assload of data. This is one project that addresses that, among other things.
Re:Russia (Score:1)
If you disagree with me, please do not resort to childish ad hominem attacks but post some rational arguments and we can discuss this. Maybe you can convince me that my arguments are not valid.
Re:Russia (Score:1)
As far as the guns go, they will not do much good to you when you're facing a trained army -- the killing machine that is the power behind the centralised government.
Re:Russia (Score:1)
An Anonymous Coward who likes to insult people. Grow up.
Wacky new idea for these schemes... (Score:1)
Not only would you not know what machine a file was on, but you also wouldn't even know what network it was on, or what protocol was managing it!
FatPhil
-- Real Men Don't Use Porn. -- Morality In Media Billboards
Where's the story? (Score:2)
How about Synchronisation? (Score:1)
So that'll mean that our handheld device will have to synchronise with a server that will distribute it to many other servers. What if one of those servers was unavailable at the moment of the synchronisation? Also wouldn't it take time to send data back and forth between all those servers and how can we be sure that no one will be able to crack this encryption method?
File Security options (Score:1)
Everything old will be new again... (Score:1)
So isn't this just the digital equivalent of having a box with all of your old negatives in it? :) You have so many pictures you don't know what to do with them and no really good way of organizing them.
Speaking of which, my wife is a librarian and she was amazed at how few digital photographs are really kept. Places like your state's historical society (if you're in the US, anyway) have tons and tons of negatives that the people who took them probably would have deleted if they'd had the ability to do so instantly. You never know how important something will be 10, 20 or even 100 years down the road. So perhaps having limitless storage isn't a bad thing from this perspective.
Re:military (Score:1)
Re:Few differences from Freenet (Score:1)
Well said. While I have no experience using Freenet, I picture it to be a repository of all sorts of positively disgusting porn with children, animals, shit and maybe a few copies of the Communist Manifesto or ASCII art of Che Guevara. But OceanStore looks like it could possibly get more mainstream and possibly corporate support, as it focuses more on long-term technological advantages.
One of the things that I find so interesting about massive distributed storage is that it makes the notion of a global consciousness/memory more concrete. One can look at the internet in the macrocosm as a human brain, and its contents are things that it is thinking about. When individuals have PDAs and unlimited storage anywhere they go, the internet becomes a more accurate representation of what is going on in the minds of people everywhere. I heard (maybe several years ago, and no source) that the majority of traffic on the internet is pr0n -- a sort of global puberty?
Re:forget distributed (Score:1)
International court systems won't come to your aid because first they would charge you with infringing on international copyright issues or other similar laws before dealing with your problem.
Good luck!
Re:Ummmm.... (Score:1)
At least when they do knock down a wall to lift you into the outside world with a crane, they will all hear, through the laboured breath, "woo-hoo, I'm the master". Make a nice epitaph as well. "Anonymous Cowerd c1985-2001 He got first post".
Re:what about the privacy issues? (Score:1)
*any* strong encryption scheme has been broken
by someone? Or are you just talking out of your
ass?
Re:File Security options (Score:1)
every version (Score:1)
Jeez, does no one ever learn? (Score:4)
The problem with a system like this is that it is designed with adults (and the self-restraint that maturity brings) in mind. I'd reckon it's going to be next to impossible to regulate once the kids find out they have almost unlimited storage capacity; it will last a week or so until the system collapses under the weight of the kids' vast warez collections. If you think they are going to assemble their collections efficiently, then you need treatment. How likely is it that the kids will:
Search the public areas archive extensively to determine which parts of their collection are already stored
Identify the set of files in the current collection of thousands that are not already in the store
Segregate their collection and upload all these missing files
Create an index of their archive for distribution?
Unlikely. They are all just going to upload their entire collections en masse. Cognitive simplicity is a powerful decision maker.
Everybody is coming up with neat solutions for this and that, and saying how great it would be if we all had cryptography and online secure storage and stuff. How come no one ever thinks: what are the bastards likely to get up to with this neat new stuff, and how can I prevent them from doing it in the first place?
The world (and the net particularly) is not full of decent, unselfish, philanthropic people. It is full of slash-and-burn arseholes who will happily spoil everything for everybody (themselves included) as long as their short-term desires are met.
As I see it, the difference between me and some ivory-tower do-gooder is that they have faith in humanity: people will be diligent, noble, unselfish and charitable. I have faith in people: faith that they'll be lazy, screw up, not give a shit about the next guy, and do all this while complaining about how they are being shafted and how they are really the victim in all this.
You know I'm right...
Gary
Re:Jeez, does no one ever learn? (Score:1)
Isn't America great!
--
Re:The pros and the cons (Score:1)
As we move further into the internet-world(tm) of always-on and mobile devices, this is analogous to saying "If my hard disk fails, how do I work on my data?"
Re:Jeez, does no one ever learn? (Score:2)
Actually, you would assume that to partake in this system, you would have to either contribute drive space, or pay for access. Therefore, your "ocean" space is limited by your contribution.
You know I'm right...
Re:File Security options (Score:1)
Re:Quantum Computing does not break all crypto. (Score:2)
--
You need a better source for such speculations. (Score:2)
If you've any basis for that belief at all, I'd love to hear it...
--
Re:File Security options (Score:1)
Hasn't anyone figured out that all mathematically based crypto is being cracked at exponentially faster rates? So this all boils down to security by obscurity, which is BOGUS! Even the "split the files so only the owner can figure out where they are" trick doesn't cut it; how hard will it be to write drunken neural-net spiders that stagger around and fit stuff together?
Feh!
Secure? How? (Score:1)
My second point is that this is supposed to be available in 10 years. According to Moore's Law (which we all know and love), computing power (presumably meaning what any consumer has access to) doubles every 18 months. Well, 10 * 12 = 120 months, and 120 / 18 = 6.67 doublings. Today's technology allows 1.2 GHz computers for the masses. Multiply 1.2 GHz * 2^6.67 ~= 122 GHz computers. Let's be conservative and say that the 1 MB of RAM available on Intel processors scales just as ambitiously, meaning we have about 102 MB of L2 cache on board.

What levels of encryption are we talking about here? Given the distributed.net statistics, fewer people are cracking rc5 / des / csc / ogr blocks daily, but the keyrate keeps going up. Why is this? Because the microcode in newer processors speeds up the small bitwise rotations, XORs and other little instructions (quite confusing to non-geeks) that make up the basic functions used to crack encryption, and these are accelerated further by caching code in L2 cache and by raw processor speed. Now I'm not saying that we'll still be using this technology, as it's kludgy at higher speeds, but we'll have something roughly equivalent. (As an aside - I just thought of the possibility of playing Quake XII or whatever! Imagine the frags at that speed!!!)

The point is that the average user will have access to the power of today's supercomputers. For the encryption to be strong, even with something like PGP keys, a brute force attack isn't entirely out of the question. Granted, encryption will most likely increase to some insane level like 2^16304 or whatever, but still, given enough time, eventually it will fall.
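The back-of-the-envelope numbers above are easy to reproduce. The sketch below only restates the comment's own assumptions (a doubling every 18 months, 1.2 GHz and 1 MB as the starting points); the variable names are invented, and the last step adds one standard observation, that each extra key bit doubles the cost of a brute-force search.

```python
YEARS = 10
MONTHS_PER_DOUBLING = 18      # the comment's reading of Moore's Law
CLOCK_TODAY_GHZ = 1.2
CACHE_TODAY_MB = 1.0

doublings = YEARS * 12 / MONTHS_PER_DOUBLING
growth = 2 ** doublings

clock_future_ghz = CLOCK_TODAY_GHZ * growth
cache_future_mb = CACHE_TODAY_MB * growth

# Brute force doubles in cost with every extra key bit, so a machine that is
# `growth` times faster only cancels out log2(growth) == doublings key bits.
key_bits_offset = doublings

print(f"doublings in {YEARS} years: {doublings:.2f}")
print(f"clock speed: about {clock_future_ghz:.0f} GHz")
print(f"cache size: about {cache_future_mb:.0f} MB")
print(f"key bits cancelled by the speedup: about {key_bits_offset:.1f}")
```

Under these assumptions, ten years of hardware growth corresponds to a little under seven bits of key length for a pure brute-force search.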
Just my buck-o-five.
Secret windows code
no (Score:1)
Re:The pros and the cons (Score:3)
Distributed operation and disconnected operation are IMO separable problems. There is a certain appeal in the idea of using the same approach to handle both, and some decent systems - e.g. Coda - have been based on that idea, but I believe it's a mistake.
My answer to your question is that you use some separate mechanism such as Palm conduits or the Windows Briefcase to handle the disconnected-operation part. Whenever and wherever you happen to be connected, you'll get to sync vs. the closest replica of your data in the distributed data store.
Re:what about the privacy issues? (Score:2)
I don't think the problem is being ignored. I have in fact discussed security with the OceanStore folks face to face, and I can assure you that they understand the problems. One aspect of the problem is that there's no perfect solution: you either want your data everywhere you go, or you want it to be totally secure in one location. No matter how many levels of attack and countermeasure you go through, you keep coming back to that. OceanStore will do the best they can to keep data secure, and they have a formidable bag of tricks at their disposal (e.g. the partial-key stuff), but at some level of paranoia no such data-distribution facility could be considered a good fit for secrecy needs. Those people can and should use something else, which does not reflect at all on OceanStore or the value it provides.
Re:Jeez, does no one ever learn? (Score:1)
Re:every version (Score:2)
Yes, it does.
That may be your long-term goal, but it's not an explicit goal of OceanStore. As it turns out, though, it might happen anyway. One of OceanStore's central goals is to make data pretty much indestructible, and if there were a version-retirement system it could potentially be subverted to destroy data. Last I heard, this was still an unresolved issue.
There is a project named Elephant - they never forget - that does have permanent maintenance of all versions as an explicit goal. I don't have a URL handy, but it shouldn't be hard to find via the standard search methods.
Re:interesting but... (Score:2)
That's easy. Let them make a profit from doing so. Let's say that I have a bunch of data that I need to distribute to a hundred sites. Doing that via standard means is pretty inefficient, sucking up a lot of costly bandwidth. Doing it via something like OceanStore might be much more efficient, and that fact creates a potential market niche. One of the most interesting ideas in OceanStore is non-technical; it's the idea that it allows "data access providers" (my term, not theirs) to offer a new service that is more efficient than what it replaces. If I want to distribute big mounds of data between a hundred sites, I might well do better to engage the services of such a data access provider using OceanStore than paying a mere bandwidth provider for all that unnecessary traffic.
Re:interesting but... (Score:2)
I can't speak for the person to whom you were responding, but I have.
Not trivial, but also not impossible. Designing chips and writing OS kernels are not trivial either, but people do those and even expect to make a profit from it. OceanStore is a research project. They're supposed to break new ground, and if anyone is qualified to make the attempt it's those guys.
Re:interesting but... (Score:2)
I believe Whistler does something like this, without FS versioning. As someone who has worked on several filesystems, including distributed and cluster filesystems, I can say in all candor that filesystems do enough of other people's work for them already. I think versioning is a wonderful feature to have, but it can be done perfectly well outside the filesystem.
Correct (Score:2)
--
Re:You need a better source for such speculations. (Score:2)
It's more of an impression than an opinion, one formed after reading lots of stuff on the internet (my sole source, sadly). I make no claims to expertivity (hence the final qualification); in fact, having scanned your page I'm willing to bow to your expertise on the subject. Whether the sun shines on my sources I'm not willing to speculate. Just so you can sneer properly, I enclose some of the links from my bookmarks that have been visited on a number of occasions:
The Cryptography Project [georgetown.edu]
Quantum Computing FAQ [rdrop.com]
Quantum computing [pnas.org]
There are more sites, but these are a fair representation. Were my conclusions wrong? Possibly. Was I reading the wrong sites? Maybe. Was looking on the web in the first place a waste of time? Dunno. But if I've helped you feel superior, then I can go home happy.
Re:Russia (Score:1)
--
Do we want useless items stored? Will SPAM stop? (Score:1)
It's a quaint thought, but the answer to this cynical question is YES, people do want that. Mass storage such as the type described in the article will provide one more avenue for Net Clogging. SPAM, virus alerts, tag-you're-it, love emails, Microsoft Outlook viruses, sob stories.... they may all be email, but when people can anonymously store large files such as graphics and audio/video, the people in the world that want to clog up the Internet and make it unusable will have one more avenue through which to do so - distributed, secure storage.
These people will upload tons of files that are junk to a lot of people - that is, many people will upload tons of junk; some people will have good intentions and others will have malicious intentions.
Freedom has a price - on the Internet that price is giving equal treatment to all data transmissions, whether they be from people with good intentions, or people who just want to disrupt the Internet as much as possible.
Deus Ex (Score:1)
"A blip [of Echelon III] runs on every electronic device on earth" - Morgan Everett, Illuminati leader, an approximate quote from Deus Ex
Also note that in the game, the ousted LEADER of the Illuminati was Lucius Debeers.
========================
63,000 bugs in the code, 63,000 bugs,
ya get 1 whacked with a service pack,
Thanks for the references! (Score:2)
I apologise for being ruder than I should have been - it was meant to be funnier and less harsh, too much caffeine. But I *do* wish people wouldn't post opinions on the difficulty of cryptanalytic problems that are based on no good evidence.
--
Ubiquitous storage reduces storage requirements? (Score:1)
Fundamentally, most of the contents of my hard drive already come from somewhere else---programs, data files, cached web pages, etc. Much of that information is pretty rarely used, and ends up being discarded (the web cache is really useful for a few pages, and a waste of space for the rest) or languishing (little-used but still essential software).
A "good enough" distributed store, possibly combined with good versioning and clever uses of caching/disconnection would make it practical for me to offload most of this useless garbage. Yes, I am instead accepting encrypted chunks of data from all over. But I bet this data is comparable in size to the savings realized by commoning up all the crud that is found on most hard drives.
distributed storage a plus (Score:1)
it is interesting to see that ibm is embracing this project, given that they like to amass server and storage power on large mainframes, single points of failure.
the whole concept of distributed storage, i find, is a more "enlightened" one but it is also one that is harder to implement. i believe that it is theoretically impossible to ensure consistency of data in a distributed storage system.
in such a system, the following criteria have to be satisfied for successful operation:
these conditions are difficult to satisfy, to say the least. also as the system grows in size, these conditions become more difficult to satisfy. so i am interested to see how these problems are addressed in this project..
--
mike's code [cwru.edu]
Unlimited, fast, and free distributed storage! (Score:1)
Network storage (Score:1)
This is interesting. The lessons from Napster, general network / LAN file sharing, as well as dozens of other file sharing technologies, even FTP mirrors, show us that this is a powerful and useful sharing proposal.
However, it isn't the only thing. There is still a place for data that is not shared, that has higher fences or protection and authentication guarding it.
I think about my data as being more like my money. I want to be able to retain full control of who gets it and when.
For instance, I want everyone to be able to access my boring personal web site all the time for free. I want only my wife and myself to be able to access my tax returns for the last five years. I want my child to be able to access the family photo gallery, but I don't want him to be able to delete it. I want to be able to transfer all my personal data from one data warehouse to another as easily as I can switch banks.
When people start talking about these file sharing technologies they forget that the data we have fits into many different profiles. Each of them needs a different level of protection.
Now one person I know suggested that people will always want their personal data on their own hard drive. Sounds like a good idea in theory, but here are the facts as I see them.
Now, as recently as one year ago, I would have said that most people had less than one meg of important digital content that was unique. Now it's a different story, with digital photos (good or bad, it doesn't matter), digital movies, banking and tax information, etc.
My two cents.
timbu
Re:The pros and the cons (Score:2)
quantum computers or whatever device capable of decrypting anything our current cryptographic technology can produce?
You're most likely right that whatever crypto we have now will be crackable in the not too distant future. However, currently the only thing we know a quantum computer can crack is the factoring of the very large prime products found in RSA keys (used in SSH, PGP, etc.). This is due to a hack on an old prime number algorithm that some clever fellow applied to a theoretical quantum computer.
Further, a quantum computer, if they ever do get built, uses qubits. These bits, as far as I understand, double each time they have an instruction executed on them (I'm sure this could be described better). One problem, however, is that you can never check your own variables without changing the quantum state of the machine. Kind of a bitch to code in, I imagine.
On the note that everything can be cracked in the future, a guy i used to work who also claimed to work in the NSA said that the RSA can crack anything we have today within a few hours. at least anything WE know about, if you really found some good crypto, or something they couldn't crack you would be visited by some black helli's and suits before you know it, at least that's what he said, though i didn't belive much he said. Like this bit how when he was working at digital and was on the team that made VMS, etc.. etc.. you know, loud mouths.
-Jon
Why wait ten years? (Score:2)
Mojo Nation is a working implementation of almost all of the concepts described in the OceanStore paper. Mojo Nation breaks up data into pieces and then uses Rabin's Information Dispersal Algorithm to create eight redundant shares of those pieces, only half of the eight shares being necessary to recreate the original piece. Blocks are identified by their SHA1 hashes, and documents are identified by the eight hashes of the pieces that make up the "hash tree" (a.k.a. the Dinode). Without the Dinode you cannot put the file back together, since each piece is also encrypted. Even if your block server holds every single block in the file, you still can't tell what it is.
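For the curious, here is a minimal Python sketch of the content-addressing idea described above: pieces are stored under their SHA-1 hashes, and a document is just the ordered list of those hashes. It is an illustration only, not Mojo Nation's actual code. The names `publish` and `fetch` and the in-memory `store` dict are made up for the example, and the encryption and information-dispersal steps are left out entirely.

```python
import hashlib
from typing import Dict, List


def publish(data: bytes, piece_size: int, store: Dict[str, bytes]) -> List[str]:
    """Split data into pieces, store each under its SHA-1 hash, return the hash list."""
    if piece_size < 1:
        raise ValueError("piece_size must be positive")
    index = []
    for offset in range(0, len(data), piece_size):
        piece = data[offset:offset + piece_size]
        block_id = hashlib.sha1(piece).hexdigest()
        store[block_id] = piece      # a block server only ever sees (hash, bytes)
        index.append(block_id)       # the ordered hash list plays the role of the index
    return index


def fetch(index: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble a document from its hash list, verifying every piece on the way."""
    pieces = []
    for block_id in index:
        piece = store[block_id]
        if hashlib.sha1(piece).hexdigest() != block_id:
            raise ValueError("block %s failed its hash check" % block_id)
        pieces.append(piece)
    return b"".join(pieces)
```

Without the hash list there is no way to know which blocks belong together or in what order, which is the point made above about the Dinode.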
Block servers only handle a small portion of the hash space and your broker keeps extensive local performance statistics on each other broker in order to make intelligent decisions on whom to ask first for a block (like keeping a map of who is logically closest in the network). There is much, much more.
Mojo Nation works today. While it doesn't have a large enough population of block servers to handle truly massive files yet, it handles a CD's worth of MP3's for instance. Here's a Mojo URL for an hour long set of really great freely distributable music [mojonation.net] by Medeski, Martin, and Wood [mmw.net]. (it's Jazz). It's about 80 megs total but when you ask your broker to fetch the link you'll get an HTML page describing the performance with individual links to each song (i.e. clicking the link won't download 80 megs of data unless you fetch it recursively).
If OceanStore sounds interesting, you should check out Mojo Nation [mojonation.net]. It actually works. The interface is rough, but that's because it's a simple web interface (easy and works on every platform). The core stuff is what the developers have been working on.
Burris
Re:Network storage (Score:1)
Not everything has to be usable for the stupid people of the world
The market is the answer... (Score:2)
Burris
Re:Where's the story? (Score:2)
Burris
yes (Score:3)
Burris
Re:RAID-2001? (Score:1)
I'm still stuck on a 20GB 5400RPM ATA/66 drive when I could make good use of a 60GB 7200RPM drive, I'm certainly not going to dedicate any significant portion of my precious drive space to store bits and peices of files belonging to other people, 90% of which probably don't need to be on a network of this sort anyway.
Get at least 80 gigs and it won't be an issue. I built a new computer not all that long ago, and put 2 40 Gig ATA100 Western Digital drives in it for ~$300. I've been filling it with recordings of various TV shows (about 100megs per hour episode) and have barely even made a dent in the capacity of the drives. In fact, I'm really not even using the second drive yet. I most certainly wouldn't mind this type of thing as long as I can control how much data it is able to store on my drive. Just my 2 bytes worth.
Sign me up (Score:2)
Darpa, fittingly, contributed to the $500,000 in seed funding for the OceanStore project, which now amounts to just a few computers, a few grad students and a couple of published academic papers.
Three VA Linux 1220 ($8,000 each)
Grad students ($50,000/year each - probably less)
Research papers (hire Jon Katz to write 'em: $5,000)
Grand total: ~$174,000, leaving $326,000 for my Swiss account. Sweet.
First thing a script kiddie will do... (Score:2)
1. Create large file of random numbers.
2. Create random file name.
3. Upload into distributed storage space.
4. Repeat from number one.
Why? First, simply to see what the system will do, and how long it will go. Second, they think it gives them reason to call themselves "31337".
If they don't have enough net bandwidth to do this, then they will wrap it in virus code and drop it out as a bunch of emails. Letting other folks do this will leave their connection fast enough for T4C.
Re:Quantum Computing does not break all crypto. (Score:1)
The only other solution is, as you are implying, not to use a deniable system, but from what I've read, that could be worse. I.e., if it's you getting killed, or you (possibly) being spared if you reveal info about more people, then I'm sure that the group consensus would be to use a deniable system and sacrifice any people who were caught.
I have to admit, though, that things aren't generally that bad in the U.K./USA/Europe etc., and satisfying a judge that you had revealed all the data would be less of a strain.
Re:Network storage (Score:1)
That is pretty lame, but sadly I hear it way too often in this forum.
If that is so true (that if people can't do some esoteric computer task, then too bad for them), why do you use a keyboard instead of flipping bit switches on the front of an Altair? Well, duh, the keyboard is easier and much more accessible for people who don't think in hex or binary.
Why don't you go back to your punch cards? That would certainly raise the bar for people who shouldn't be allowed to use computers.
Maybe everyone should have to write their own OS kernel; that would separate out who should be able to use a computer.
Maybe you should be forced to design your own chips; that would really separate out those who are worthy to operate a computer.
You do whatever you want with your CDRs. Keep all your money under your mattress. I don't care.
What I'm saying is that there are people who will need help with their storage. You don't have to be part of the solution, if you don't want to be.
--timbu2
Re:My favorite quote (Score:1)
Storage management is a strong suit for EMC. Data distribution is a weak one. If your host is directly connected to the box you're golden, but apart from the world's worst NAS device (the Celerra) EMC currently does very little to get the data out of the box and to where it might actually be needed. That's where things like OceanStore can pick up the slack.
Data Management Issues (Score:2)
As one of the graduate students working on OceanStore, I should add a little to this discussion.
Your point about data management being more expensive than the storage itself is absolutely correct. OceanStore addresses this issue in several ways:
First, we use replication and coding algorithms to ensure the integrity and durability of data. Documents that are actively being written to are managed by a group of servers participating in a Byzantine fault-tolerant algorithm. This ensures that despite machine failure or compromise (of up to approximately a third of the machines), your data is safe from loss and corruption. It also provides availability, since from the algorithm's point of view a failing server and an unavailable server are the same.
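To put a number on "approximately a third", here is a small Python sketch using the classic bound from the Byzantine agreement literature: a group of n servers can tolerate f arbitrary failures only if n >= 3f + 1. That OceanStore's inner ring uses exactly this bound is an assumption on my part (the post only says "approximately a third"), and the function names are invented for the example.

```python
def max_byzantine_faults(n_servers: int) -> int:
    """Largest f such that n_servers >= 3*f + 1 (classic Byzantine agreement bound)."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return (n_servers - 1) // 3


def min_servers_for(faults: int) -> int:
    """Smallest group size that still tolerates the given number of faulty servers."""
    if faults < 0:
        raise ValueError("faults cannot be negative")
    return 3 * faults + 1


if __name__ == "__main__":
    for n in (4, 7, 10, 100):
        print(n, "servers tolerate", max_byzantine_faults(n), "faulty or unreachable")
```

Under this bound a crashed server and a compromised one both count against the same budget of f, which matches the remark that failing and unavailable servers look the same to the algorithm.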
Data that is not actively being written is stored in Erasure-coded form and spread across the system. A rate N Erasure code breaks an object into Nb pieces, where b is the number of blocks in the object. If any arbitrary b of these pieces can later be recovered, the entire document can be reproduced. For example, with a rate 2 Erasure code, a 1 MB document will be broken into a number of blocks totaling 2 MB in size, such that if any 1 MB of them can be recovered, the whole document can be reproduced. Since each block can be stored on a different server, this gives tremendous durability to data. It also takes nice advantage of the fact that storage is cheaper than the management of storage. I should also mention that we include algorithms which verify the integrity of the reconstructed data.
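As a toy illustration of the "any b of the Nb pieces" property, here is a short Python sketch of an erasure code built from polynomial interpolation over a small prime field. It is not the code OceanStore uses. The one-byte "blocks", the field size 257, and the names `encode` and `decode` are all assumptions chosen to keep the example short, and a real system would use much larger blocks and a faster code.

```python
P = 257  # a prime larger than any byte value, so every byte is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data: bytes, rate: int):
    """Turn b one-byte blocks into rate*b fragments of the form (x, value)."""
    b = len(data)
    n = rate * b
    if b == 0 or rate < 1 or n >= P:
        raise ValueError("need 0 < rate * len(data) < %d" % P)
    points = [(i + 1, data[i]) for i in range(b)]
    # Fragments 1..b are the data itself; the rest are extra points on the same curve.
    # Values may be 256, so fragments are kept as ints rather than bytes.
    return [(x, _interpolate(points, x)) for x in range(1, n + 1)]


def decode(fragments, b: int) -> bytes:
    """Rebuild the original b blocks from any b distinct fragments."""
    if len(fragments) < b:
        raise ValueError("not enough fragments to reconstruct")
    chosen = list(fragments)[:b]
    return bytes(_interpolate(chosen, x) for x in range(1, b + 1))
```

With `rate=2`, a 5-byte object becomes 10 fragments, and any 5 of them are enough for `decode`; that is the same 2x storage for 1x recovery trade described above, just at toy scale.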
Second, OceanStore has an introspection system which manages the placement of data throughout the system. While replication and coding keep data safe, introspection moves data around for optimal locality. If your data is across the world from you, you may not care that it is correct or durable, since it takes so long to get at it. Introspection uses pattern recognition techniques to discover what data is important to you and move it or cache it near your current location. This removes the necessity of paying administrators to discover this information and move the data manually in order to improve the performance of the system.
Finally, in order to locate all of this constantly moving information, OceanStore employs a two-tier location system which provides fast access to nearby data and availability to far-away data.
Our recently published paper, OceanStore: An Architecture for Global-Scale Persistent Storage describes these issues in more detail and can be found on our publications page [berkeley.edu].
Sean
How does this differ from Freenet? (Score:2)
On further examination - this basically looks like the product of someone who looked at the Freenet design, didn't understand it, and tried to reimplement it.
--
Re:interesting but... (Score:2)
Re:Where's the story? (Score:1)
Our recently published paper, OceanStore: An Architecture for Global-Scale Persistent Storage describes the system in more detail and can be found on our publications page [berkeley.edu].
Sean
Re:First thing a script kiddie will do... (Score:1)
You pay for the storage you use in OceanStore. Read the paper [berkeley.edu], especially the part about Responsible Parties and the utility model. If you keep uploading random data into the system, you will end up with a large OceanStore bill at the end of the month.
Also, someone mentioned banding together with a bunch of friends and creating a little private OceanStore of their own. This is a great idea, and one that I (personally) am very fond of. Each of you give up two-thirds of your 200 GB disk (they'll be here in no time), and you get reliable, fault-tolerant, highly-available storage in return. In this case, if someone fills up the shared space, you and your other friends kick him out of the group.
Finally, you could take a MojoNation [slashdot.org]-type approach and introduce an arbitrary currency to pay each other for storage.
Sean
Re:RAID-2001? (Score:1)
But backing up data should be COMMON SENSE! (Score:2)
Backing up your data should be common sense - unfortunately it isn't, but it isn't that hard to find information on what a backup is, or what it is for. I cannot understand why people simply think that when data is put into a computer, it will always be there (I guess they think that high-tolerance mechanical devices never wear out)? The majority of people clearly do not understand the power and nature of the tool they are using. It is almost like they expect their car to run forever without an oil change...
Oops - I forgot - some cars can now go for a damn long time without an oil change, while the emissions/engine control computer reconfigures everything, while the engine wears down - until one day it does break. Manufacturers started making these 100,000 mile cars because people are either too stupid or lazy to have periodic maintenance done, instead opting to "buy" a new car every 3-5 years (and perpetually paying for the vehicle, or worse, LEASING it). Is it that hard to take a car in to have the oil changed, or to do it yourself? Brakes, same thing (the number of times I have heard metal-to-metal brake wear is appalling - how those people stop at all is a wonder) - it really isn't that difficult to replace one's own brakes on a car (though drum brakes do tend to be a bitch).
I don't think everyone needs to know everything about their computer - but they should have common sense about it, simple maintenance, care, and troubleshooting at minimum. Backups are a part of this.
Worldcom [worldcom.com] - Generation Duh!
Deniable crypto (Score:2)
Deniability is pretty tricky stuff. Of course deniable crypto systems should work against a judge, who can't punish you just because they suspect you're holding back but can't prove it. In theory anyway.
--
spoke too soon (Score:2)
I am still rather surprised that they don't mention Freenet anywhere in their Related Work section.
--
I agree entirely if you can enforce the payment. (Score:2)
Having been involved in user billings from a variety of angles, I can tell you that sending a bill has little effect in such cases. Collecting money in advance, decrementing the account, and chopping the service off when it reaches zero does work, provided you can adequately tie the user of the service to the account being billed. Otherwise, the owner of said account calls up about the bill and gets a credit if you can't really nail this down.
Arbitrary currencies are an idea that I like, especially if the currency is tied to a hardware artifact that can be tested for presence: say, you add another eighty gig of storage to the public pool, and you get another x storage credits per month.
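A rough Python sketch of the prepaid scheme described above: credits are collected up front, every upload decrements the balance, and service stops at zero. Everything here is hypothetical (the class name, the integer credit unit, and the `credits_per_gb` rate standing in for the "x" above); it only shows the shape of the bookkeeping, not how any real system bills.

```python
class PrepaidStorageAccount:
    """Pay-in-advance account: uploads are refused once the credits run out."""

    def __init__(self, credits: int = 0):
        if credits < 0:
            raise ValueError("starting balance cannot be negative")
        self.credits = credits

    def monthly_grant(self, contributed_gb: int, credits_per_gb: int) -> None:
        """Credit the account for disk space the user adds to the public pool."""
        if contributed_gb < 0 or credits_per_gb < 0:
            raise ValueError("grant values cannot be negative")
        self.credits += contributed_gb * credits_per_gb

    def charge(self, cost: int) -> bool:
        """Try to pay for an upload; return False (service cut off) if it is unaffordable."""
        if cost < 0:
            raise ValueError("cost cannot be negative")
        if cost > self.credits:
            return False
        self.credits -= cost
        return True
```

Because the balance can never go below zero, there is no bill left to dispute afterwards; the flood of random files simply stops when the uploader's credits do.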
Prioritizing... (Score:2)
Of the data backed up, it should be reviewed, organised, and prioritized to get rid of the least important stuff, keep the important stuff - then put it where it would be best served. Most of my data goes onto ZIP disks, as those fill I go into an organization mode, build an image, and move it to a CDR, then wipe the ZIP disks for more. The ZIP disks are mainly there for convenience, not permanence (not that I expect CDRs to be permanent or anything). Some stuff I resign to the "a copy can be found on the net" bin - and get rid of. Other things I hold a backup only on the hard drive and a ZIP disk (like my web sites), because they change (ir)regularly, and there will always be a copy somewhere. I don't consign these to a CDR, unless they are dead sites that I have moved off to an archive.
I guess one of the good skills in backup is to be organized...
Worldcom [worldcom.com] - Generation Duh!
Re:cloud full of digital pictures (Score:2)
I have 27,000+ pictures that I have taken in the past 3 years, all backed up on CD. Everything adds up to about 11 Gigabytes, which is only 1/4 of my new $200 45Gb Maxtor ATA100 drive. I use ThumbsPlus [cerious.com] to organize things, and it does a great job, doing the thumbnails, tracking keywords, all in an Access97 compatible format.
Ok.. here's the math.... Storage for 27,000 photos... $50. The backup on 24 CDs is about $12 of media.
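Spelled out as a quick Python check (the function names are just for illustration; the figures are the ones quoted above): 11 GB on a $200, 45 GB drive, and $12 of media spread over 24 CDs.

```python
def prorated_disk_cost(used_gb: float, drive_gb: float, drive_price: float) -> float:
    """Share of the drive's price attributable to the space actually used."""
    return drive_price * used_gb / drive_gb


def price_per_disc(total_media_cost: float, disc_count: int) -> float:
    """Average cost of one blank disc."""
    return total_media_cost / disc_count


if __name__ == "__main__":
    disk_share = prorated_disk_cost(11, 45, 200)
    cd_price = price_per_disc(12, 24)
    print("disk share: $%.2f, per CD: $%.2f" % (disk_share, cd_price))
```

The disk share comes out just under $49, which is where the rounded $50 figure comes from.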
--Mike--
PS: Yes, I still have ALL of my old floppies, and a 5 1/4" drive that can read them. Soon I'll put all that on my HDD, and back it up to a CD or two.
Re:what about the privacy issues? (Score:2)
Duh - it's a grant-funded graduate student project (Score:2)
Unlike sourceforgeware, however, once you talk people into giving you the cash, then you *do* have to work on the project. It doesn't necessarily have to succeed at its original goals - if you knew what the results were going to be, it wouldn't be *research*.
Re:Prioritizing... (Score:2)
The programmer and web designer both sound very disorganised - if it is a work environment. There should be only one place for the code; all others would be mirror image backups. Anything done at home should be separate from the work environment. The work environment code should be on a server that can be backed up by the admin as part of the backup process. The user is responsible to put code that needs to be backed up in a place to back it up. They may have to back it up themselves.
You are right in that humans will do whatever they want, regardless of what you tell them. A geek without driving experience, though, is not likely to tackle a Porsche for a first lesson; they are likely to try something slower and simpler. But if they like driving, they will learn what it takes to maintain a car, and work up to that Porsche. The reverse isn't true - most people look at computers as something that should "just work", like a TV, and do not even attempt to learn more about them as time goes by. Computers will never be TVs (barring some big advance in AI). We all do have to learn - problem is so many people think that computers don't require this - and never ask questions to learn more about why they should be making backups, or worse, ask for advice, then forget what to do 10 minutes later.
Worldcom [worldcom.com] - Generation Duh!
OceanStore uses a legal solution (Score:2)
You are right that OceanStore is not fully distributed, but so what? It seems like it can actually provide stronger guarantees than e.g. Freenet.
Mojo Nation looks like 90% of OceanStore... (Score:2)