Slashdot
MojoNation ... Corporate Backup Tool?
Posted by Hemos on Thu Jul 18, 2002 03:44 AM
from the hives-of-bees-making-honey dept.
zebziggle writes "I've been watching the Mojo Nation project off and on over the last couple of years. Very cool concept. While taking a look at the site recently, I noticed they've morphed into Hive Cache, a P2P corporate backup solution. Actually, it sounds like a great way to use those spare gigs on the HD."
This discussion has been archived.
No new comments can be posted.
122 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Interesting, but ... (Score:3, Interesting)
Re:Interesting, but ... (Score:4, Interesting)
Do you mean unrestricted access? I don't think this is talking about using Joe Foo's Kazaa shared folder to store your company's backup data. It's using unused disk space on the company network, and the web site states that the backup mesh is encrypted, so unauthorised users may have the file on their disk, but they can't access it.
Looks to me like all the criteria of the DPA are covered.
Fundamentally flawed (Score:3, Informative)
50 PCs in your intranet, each with a 20 GByte disk. Thus your backup need is a cool 1000 GByte, if the disks are all completely full and fully backed up...
For this concept to work you can see that you need to exclude every copy of Dos95/Office from being backed up. The basis of P2P is that the service users are also the service providers, thus every participating node needs free HD space. Depending on the crypto overhead and your non-backup portion, you still need a lot of free space for this concept. What is the added value over a redundant RAID server? Is the total cost of ownership really lower?
MojoNation proposed an awesome concept with their virtual P2P credits. However, this idea seems to suggest that P2P technology increases your HD size. It does not!
Just my 5 EuroCents,
J.
Sounds tasty! (Score:2, Funny)
But seriously, I can imagine that it will only be good on larger scale corporate networks with lots of "enterprise PCs", as they put it...
Mojo Jojo? (Score:1, Funny)
Security? (Score:2)
But besides security, I see it as a solution in search of a problem. It's easy to back up one system to another with the current tools, and it would happen in a much simpler, more organized, and more controlled manner.
I hope their software is better than their grammar (Score:1, Offtopic)
Seriously - if someone can't make the effort to check their grammar before posting a web page... - oh, hang on, I'm reading slashdot - nevermind...
cLive ;-)
GPL? (Score:4, Interesting)
The problem with Open Source? (Score:2, Interesting)
Mojonation and backups (Score:5, Interesting)
P2P falls into two categories nowadays: file sharing (FastTrack/Kazaa, Gnutella/Gnucleus-Shareaza-Limewire-Bearshare, eDonkey2000) or publishing (Freenet and Mnet/Mojonation). Like Freenet, Mojonation was more of a publishing network: users publish data, it gets broken into little chunks, encrypted, and then sent out to other computers, and you receive other people's encrypted chunks on your computer, making you a "block server". Content trackers and publication trackers kept track of the meta-data and where the blocks were, and metatrackers kept track of where the trackers (also called brokers) were. I chatted with zooko, one of the developers, on IRC; he was cool and the ideas were very interesting. Like many dot-com stories, it was ahead of its time in many ways. They converted Mojonation to the open source MNet [sourceforge.net], whose CVS tree you can peruse. A lot of it is in Python, a language I do not know.
The wasted disk space on workstations (and servers) is something thought about by many, especially in large organizations with large networks. My last company began implementing SANs, so that less disk space would be wasted, and the centralization of disk space allowed for greater redundancy and easier backup. They also ran low priority (nice'd) distributed.net [distributed.net] processes across the whole network on non-production machines. You can take a guess about how large the network is by seeing that they're still ranked #22 without submitting any keys for a year.
Technical Information (Score:5, Informative)
For security reasons we absolutely want to encrypt and sign everything stored on the other computers. There is nothing tricky about this part, the usual cryptography can be used without modifications. This is not going to waste any significant amount of storage space or network bandwidth. But it will require some CPU cycles.
The other, not so trivial part of such a system is the redundancy. Reed-Solomon would be one type of redundant coding suitable for the purpose. Parchive [sourceforge.net] also uses this coding.
I know some implementations are limited to at most 255 shares, but for performance reasons it is probably not feasible to use a lot more than that anyway. I expect the Reed-Solomon code to be the most CPU-hungry part of such a system.
We need to choose a threshold for the system, and I see no reason why the individual users cannot choose their own threshold. If one user wants to be able to reconstruct data from any 85 of the 255 shares, there needs to be three times as much backing storage as the data being backed up.
The first approach to storage space would obviously be that each user can consume as much as he himself makes available to the system. I'd happily spend the 10GB of hard disk space needed for two backups of my 1.5GB of important data with a factor of three of redundancy. Done correctly, this would give a lot better security than most other backup solutions.
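The storage arithmetic in the last two paragraphs can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the numbers (255 shares, a threshold of 85, two backups of 1.5GB) are simply the ones used in this comment.

```python
def expansion_factor(total_shares: int, threshold: int) -> float:
    """Backing storage per unit of data when any `threshold` of
    `total_shares` equally sized shares can rebuild the data."""
    if not 0 < threshold <= total_shares:
        raise ValueError("threshold must be between 1 and total_shares")
    return total_shares / threshold


def storage_needed_gb(data_gb: float, backups: int,
                      total_shares: int, threshold: int) -> float:
    """Total space consumed on other machines, in GB."""
    return data_gb * backups * expansion_factor(total_shares, threshold)


if __name__ == "__main__":
    print(expansion_factor(255, 85))            # 3.0
    print(storage_needed_gb(1.5, 2, 255, 85))   # 9.0, inside the 10GB budget
```

A user who picks a higher threshold pays less storage but tolerates fewer lost shares, which is why letting each user choose seems reasonable.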
One important aspect you must never forget in such a system is the ability to verify the integrity of backups. I guess this is the trickiest part of the design. Verifying with 100% security that my backup is still intact would require downloading enough data to reconstruct my backup. However, verifying with 99.9999999% security could require significantly fewer samples. Unfortunately, here the 255 shares can be a major limitation: the larger the number of shares gets, the smaller the percentage of data we need to sample. I don't wanna do the exact computations right now, but if 18 randomly picked from the 255 shares are all intact, we have approximately the 9 nines of security that there are indeed 85 intact shares of the 255. So we have indeed limited the network usage by almost a factor of five (18 shares downloaded instead of 85).
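For anyone who does want the exact computation, here is a minimal sketch. It assumes the 18 shares are picked uniformly at random without replacement and that a share either passes or fails its check; the function names are hypothetical. The worst case for the verifier is a backup with 84 intact shares, one short of the threshold.

```python
from math import comb


def false_pass_probability(total: int, intact: int, sampled: int) -> float:
    """Chance that `sampled` randomly chosen shares are all intact when
    only `intact` of the `total` shares really are (hypergeometric)."""
    if sampled > intact:
        return 0.0
    return comb(intact, sampled) / comb(total, sampled)


def samples_needed(total: int, threshold: int, max_false_pass: float) -> int:
    """Smallest all-intact sample that rules out 'fewer than `threshold`
    intact shares' with at most `max_false_pass` chance of being fooled."""
    worst_case_intact = threshold - 1
    for sampled in range(1, total + 1):
        if false_pass_probability(total, worst_case_intact, sampled) <= max_false_pass:
            return sampled
    return total


if __name__ == "__main__":
    print(false_pass_probability(255, 84, 18))  # below 1e-9
    print(samples_needed(255, 85, 1e-9))
    print(85 / 18)  # download saving compared with a full reconstruction
```

Note that this test is strict: a single bad share in the sample fails the check, so a real system would probably want a rule that tolerates a few failures.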
If we want:
- Higher security
- Less network usage for verifications
- Good network performance even in case of a few percent of lost shares
we need more than 255 shares of data. There is no theoretical limit to the number of shares, but the CPU usage increases.
What the system also needs is migration of data as users join and leave the system, and a reliable way to detect users responsible for large amounts of lost shares. Creating public key pairs for each user is probably necessary for this. I think this can be done without the need of a PKI, a user can just create his key pair and then start building a reputation.
Spare Gigs? (Score:1)
While I sit here wondering whether Moore's law will ever allow the supply to truly catch up and surpass demand, I hear a Slashdot post talking about "Spare Gigs".
Look, if you want to be dumbasses about HDD purchases and will only need 5 GB of space ever, in your life, buy a 5GB drive at some obscene seek speed, and then send
Wonder what silly patent it infringes... (Score:1)
Method and apparatus for allowing disaster recovery from loss of a file by storing it in two places at once.
"Backup" you say? (Score:1, Funny)
Disaster Recovery (Score:3)
Say that one of the companies in the WTC had done this. Sure, they woulda had backups when a server blew up, but after the entire building was destroyed, they would have had nothing.
You never want to put ALL of your marbles into local backups.
I don't get it (Score:1)
What if some secretary, who has half of the company's email "backed up" on her computer, hoses her machine because like most office drones she's not too computer literate?
How is this secure? Fast? Efficient? I thought the whole idea of a backup was to have those tapes in a safe secure place, not on a computer that's being used by other people.
It's almost how IBM's ADSM Backup works (Score:2, Informative)
Clients are installed on the hosts enterprise-wide. These can be a mixed platform: AIX, HPUX, Linux, Windows NT (Cough), Mainframe S/390 (VM) and so on. The host running the ADSM server has a ton of disk space... a snapshot is taken across the network to the ADSM server disk from the client filesystems to be backed up. The snapshot gets backed up to tape while the snapshot is taken of the next host... and so on...
This works great across a fast network.
But.......
I use Amanda to back up my Linux servers at home.
It works in almost the same manner.
Spare gigs? (Score:1)
Yes, but... (Score:2)
The idea of using a distributed/replicated data store such as MojoNation or OceanStore for backup seems cool on the surface, and from a near-term customer-acceptance standpoint it definitely has advantages, but from a long-term technical standpoint one question does arise. If you have such a data store available to you, with appropriate performance and security characteristics, why would you store stuff locally (except for cached copies, which are evanescent rather than authoritative) at all? The ultimate goal of research in this area is not to facilitate backups but to make them unnecessary; what this project is doing is really a sort of "side trip" from that goal.
Spare gigs? (Score:1)
Mango Medley 97 did this 5 years ago. (before p2p) (Score:2, Interesting)
http://www.mangosoft.com/news/pa/pa_0009_-_INFO
It was true peer-to-peer before it was a buzzword. Basically it would pool space from up to 25 PCs and create an M: drive. Here's part of the article:
Mango pooling is the biggest idea we've seen since network computers
By Info World
Mango, in Westborough, Mass., is not your average software start-up. In 30 months the company has raised $30 million. Its first product, Medley97, has shipped, transparently "pooling" workgroup storage.
And someone at Mangosoft really knows the difference between features and benefits.
But it's not the benefits of Medley97 pooling that interest me. What's interesting are the features and long-term potential of Mango's underlying distributed virtual memory (DVM). Mango's pooling DVM is the biggest software idea since network computers -- perhaps since client/server -- and Microsoft had better watch out.
According to Mango, Medley97 offers transparent networking that's easy to use, fast, and reliable (not to mention secure and high fiber).
Windows users working together on a LAN can share files in a pool of their combined disk storage. Every pooled PC is both a client and server.
Go ahead and drop Medley97 into any PC you want to pool. Medley97 installs, checks configuration, and updates required Windows networking software. The product adds the PC's storage to the pool, giving you a shared, fast, and reliable network drive, M:/, which is available on all pooled PCs. For this you pay Mango less than $125 each for up to 25 PCs.
Re:Mango Medley 97 did this 5 years ago. (before p (Score:4, Interesting)
The basic premise behind the product was that when someone copied a file into the Medley drive the data pages were instantly "duplexed", meaning that a second copy of each page was made elsewhere in the network. If a node in the network went down, leaving only one other computer with a copy of the page, Medley would automatically reduplex, propagating the single remaining copy of the page to another node in the network. The basic promise of Medley was availability and fault tolerance on a P2P level.
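A toy model of the duplex/reduplex idea, for illustration only. This is not Medley's code; the class and method names are invented, pages are just ids, and "copying" is reduced to bookkeeping.

```python
import random


class PagePool:
    """Toy model: every page should live on two distinct live nodes."""

    COPIES = 2

    def __init__(self, nodes, seed=0):
        self.live = set(nodes)
        self.holders = {}                  # page id -> set of nodes with a copy
        self.rng = random.Random(seed)

    def write(self, page):
        # "Duplex": place the page on two different live nodes.
        self.holders[page] = set(self.rng.sample(sorted(self.live), self.COPIES))

    def node_down(self, node):
        self.live.discard(node)
        for nodes in self.holders.values():
            nodes.discard(node)
            # "Reduplex": propagate a surviving copy to another live node.
            # A page with no surviving copy is lost and cannot be restored.
            candidates = sorted(self.live - nodes)
            while nodes and len(nodes) < self.COPIES and candidates:
                nodes.add(candidates.pop(self.rng.randrange(len(candidates))))


if __name__ == "__main__":
    pool = PagePool(["a", "b", "c", "d"])
    for page in range(5):
        pool.write(page)
    pool.node_down("a")
    print(pool.holders)  # every page is back on two live nodes
```

The sketch also shows the weak spot: if both holders of a page fail before the reduplex finishes, the page is gone.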
Very cool concept but the product had a number of severe flaws that are probably obvious to the average slashdot reader.
The best thing I can say about working on Medley was that it was an opportunity right out of College to work with a number of incredibly excellent engineers on a complex and very interesting problem. Unfortunately, the idea was probably 5 to 10 years ahead of its time.
Not for Backup in Corporations (Score:1)
It might be a cute technology and has its uses, but I'd never rely on it for that critical a function. That is what SAN/tapes/DVD/etc. are for.
Off-Site? (Score:2)
My employer uses two SANs from Xiotech, one off-site (actually, that's still being implemented), with two off-site servers to support us should our server room spontaneously combust. All of our employees are encouraged to store any and all information on the network drives. These drives get partial backups each night MTWH (any files that changed from the previous day), and a full backup on Friday, and all the tapes are stored off-site. If a user had data s/he wished to save on the HDD and their PC is reimaged, they're SOL, and they know that from the beginning.
At the same time, each user has a 20GB or larger HDD that is essentially wasted because of this. Of course, no one in this organization could have 20GB of reports and text documents, etc.
fragments (Score:1)
the family tree of Mojo Nation (Score:4, Informative)
Mojo Nation was conceived by Jim McCoy and Doug Barnes in the 90's. At the end of the 90's they hired hackers and lawyers and started implementing.
Their company, Evil Geniuses For A Better Tomorrow, Inc., opened the source code for the basic Mojo Nation node (called a "Mojo Nation Broker") under the LGPL.
During the long economic winter of 2001, Evil Geniuses ran short of money and laid off the hackers (the lawyers had already served their purpose and were gone).
One of the hackers, me, Zooko [zooko.com], and a bunch of open source hackers from around the world who had never been Evil Geniuses employees, forked the LGPL code base and produced Mnet [sourceforge.net].
Now there is a new commercial company, HiveCache [hivecache.com]. HiveCache has been founded by Jim McCoy.
BTW, if you try to use Mnet, be prepared for it not to work. Actually the CVS version [sourceforge.net] works a lot better than the old packaged versions. We would really appreciate some people compiling and testing the CVS version (it is very easy to do, at least on Unix).
It would be really good if someone would compile the win32 build. We do have one hacker who builds on win32, but we need more.
What's the patent on or how old is Hive Cache? (Score:2)
I've also seen mention in other comments that this project has been around for a while in open source form and has only recently been corporatized. Their Sourceforge page [sourceforge.net] has been around since 2000-07-17. Is it potentially older than this as well?
Quick response from HiveCache (Score:2)
I will try to answer as many of the good questions and points of discussion that have been brought up as I can over the next few hours, but I wanted to shoot out a quick overview of what HiveCache is to try to set the story straight here.
First of all, HiveCache is an enterprise backup utility that uses a parasitic peer-to-peer data mesh as its backup media. Simple enough really. The goal of the software is not to replace tape or other offline backup tools; the goal is to serve as an alternative tool that makes most file restoration requests ("hey, I accidentally deleted my Powerpoint presentation for the big meeting that starts in 30 minutes...") a user self-help operation rather than something that needs IT assistance. Users restore most files via the p2p mesh and tape/CD-R is only needed for really old stuff or if the building burns down.
The HiveCache distributed online backup system is currently targeted at small to mid-sized enterprises (100-1000 seats) as a way for these companies to increase the ROI on existing IT investment (they already paid for the disk space, so why not use all of it) and to decrease the burden that daily backup and restore operations place upon IT staff. Right now the clients are win32, but agents that serve up disk space to these clients from OS X and Unix hosts are also available. By using good error-correction mechanisms it is possible to maintain five "nines" of reliability for retrieving any particular file even if 25% of the network drops offline. As the backup mesh grows larger, reliability keeps increasing while the data storage burden for adding a new node drops (because the level of redundancy among the nodes grows).
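A rough way to sanity-check a "five nines with 25% offline" claim is a binomial model. The sketch below is not HiveCache's actual scheme, whose coding parameters are not given here. It assumes a k-of-n erasure code with each share on a different node and nodes dropping offline independently, and the 85-of-255 figures are borrowed from an earlier comment in this discussion purely as an example.

```python
from math import comb


def restore_failure_probability(total_shares: int, threshold: int,
                                offline_fraction: float) -> float:
    """P(fewer than `threshold` shares reachable), assuming one share per
    node and independent node outages (both are modelling assumptions)."""
    p_up = 1.0 - offline_fraction
    return sum(
        comb(total_shares, i) * p_up ** i * offline_fraction ** (total_shares - i)
        for i in range(threshold)
    )


def meets_five_nines(total_shares: int, threshold: int,
                     offline_fraction: float) -> bool:
    """True if a restore succeeds with probability of at least 99.999%."""
    return restore_failure_probability(total_shares, threshold, offline_fraction) <= 1e-5


if __name__ == "__main__":
    print(meets_five_nines(255, 85, 0.25))  # many shares, low threshold
    print(meets_five_nines(4, 2, 0.25))     # tiny mesh, same 25% outage
```

Under these assumptions the result depends heavily on how many shares there are, which fits the remark that reliability keeps increasing as the mesh grows. Correlated outages, such as a whole office losing power, would break the independence assumption.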
Lastly, the relationship between HiveCache and MojoNation. Basically, there are two branches off of the work HiveCache (née Evil Geniuses) did on MojoNation: one early branch went on to become the backup product, and a later fork pared off some of the non-essential bits (payment system, etc.) and became MNet. The MojoNation public prototype helped to work out the kinks of the data mesh, but for the last year and a half of the life of MojoNation most of our internal coding effort was on behalf of this other project, which shared some back-end components with the LGPL codebase we were also supporting. For those who complained earlier that the MojoNation user experience sucked I must humbly apologize; we were spending the cycles working on a different UI. Since the MojoNation project went into hibernation on our side, a former MojoNation coder and several other very sharp people have been continuing the MNet project [sourceforge.net] based upon the open codebase (and with a much nicer UI than we ever provided for MojoNation). I do apologize if the patent and licensing language appears a bit heavy-handed; it was a cut and paste job from some email with legal counsel and will be made clearer this weekend when the site is updated.
ObPlug: We are still seeking a variety of enterprise environments for our upcoming pilot test and in addition to getting to experience the benefits of the HiveCache system for your company you will also be able to purchase the Q4 release version at OEM prices! Sign up now by sending mail to pilot@hivecache.com [mailto]
Jim McCoy HiveCache, Inc.
Re:Patents Pending on Mojo Nation? (Score:1, Interesting)
As far as patents go... well, according to even posts here, this was a rather unique approach to a p2p program. However, it HELPS to actually wait to see (or research) the patent before you actually go trying to dispute it or find prior art refuting it. This isn't one-click-of-your-mouse, you know. Not to mention, they haven't threatened to go after anything, I'd say they were even supporting the LGPL fork.
As it states.. "the patent-pending technology of MojoNation" click on the "MojoNation" hive-hex and you will find all the links and relevant information. (Or at least the fact that some of it will be available soon.) So, yes, you are misunderstanding.
Are you interested in taking away the ability of an OSS (and P2P) contributing company to defend their ideas (and work, and money) from corporate interests that might be interested in stealing their ideas? [No arguments from those of you interested in dismantling the patent system entirely -- I feel that way sometimes, too; we're talking level playing field right now.]
MojoNation didn't run at all! (Score:2)
Somehow this experience (installing MojoNation and trying for a few days to figure out what it was actually supposed to do) does not make me eager to try their latest product. Give me a huge hard disk and rsync, please!
Incredible Vanishing Website (Score:2)
- When, and why, did it become 'unavailable'?
- What's taking them so long to make it available again? (Or won't HiveCache work for them to retrieve their old files?)
Seriously, people, if you want to revamp your entire website, you just before you start editing it, and set up all your new stuff in a separate directory hierarchy (like www.mojonation.net/hivecache/*). That way all the old URLs still work.
The only reason I can think of why any of the old stuff is unavailable is that they're still trying to figure out what they want to keep that way.
Re:You have spare gigs on your hard drive? (Score:1)
Why did that get a -1 score? In this discussion redundant is supposed to be a positive word.
Re:Mojo Jojo! (Score:1)