MojoNation ... Corporate Backup Tool?

zebziggle writes "I've been watching the Mojo Nation project off and on over the last couple of years. Very cool concept. While taking a look at the site recently, I saw that they've morphed into Hive Cache, a P2P corporate backup solution. Actually, it sounds like a great way to use those spare gigs on the hd."
This discussion has been archived. No new comments can be posted.

  • Interesting, but ... (Score:3, Interesting)

    by King Of Chat ( 469438 ) <fecking_address@hotmail.com> on Thursday July 18, 2002 @04:49AM (#3907266) Homepage Journal
    I'd like to know how this fits in with Data Protection legislation (eg UK DPA).
    • That's an interesting point. If someone asks for a correction under the DPA (eg listed as having wrong number of legs) finding all the incorrect copies could be a total nightmare.
      • Not forgetting that if the information is unencrypted then it may be possible for unauthorised people to read personal information.
        • The information is not 'unencrypted'.. RIGHT THERE [mojonation.net] on the front page of the site. "Strong cryptography protects these files from prying eyes..." The data is encrypted, then chopped up into little chunks, and then the data (and share map) is spread out among multiple hosts (like handing out pieces from a broken bottle). The amount of redundancy you want for your little data-cubes is also selectable.
    • by Albanach ( 527650 ) on Thursday July 18, 2002 @05:48AM (#3907356) Homepage
      Am I missing something? This is just a backup - why would that present a problem under the DPA? You update the database and the backup is corrected as it is updated, just as would happen in any enterprise situation.

      Do you mean unrestricted access? I don't think this is talking about using Joe Foo's Kazaa shared folder to store your company's backup data - it's using unused disk space on the company network, and the web site states that the backup mesh is encrypted, so unauthorised users may have the file on their disk, but they can't access it.

      Looks to me like all the criteria of the DPA are covered.

    • I'd like to know how many people actually follow the DPA in the first place.
  • Fundamentally flawed (Score:3, Informative)

    by pouwelse ( 118316 ) on Thursday July 18, 2002 @05:15AM (#3907309) Homepage
    It is always great to see applications of P2P technology. But let's do the math on this one.

    50 PC's in your Intranet, each with a 20GByte disk. Thus your backup need is a cool 1000 GByte, if the disks are all fully filled and fully backed-up...

    For this concept to work you can see that you need to exclude every copy of Dos95/Office from being backed-up. The basis of P2P is that the service users are also the service providers, thus every participating node needs free HD space. Depending on the crypto overhead and your non-backup portion, you still need a lot of free space for this concept. What is the added value above a redundant RAID server? Is the total cost of ownership really lower?

    MojoNation proposed an awesome concept with their virtual P2P credits. However, this idea seems to suggest that P2P technology increases your HD size, it does not!

    Just my 5 EuroCents,
    J.

    • by Anonymous Coward
      (1) Read the site at http://www.mojonation.net/Products.html, it already removes redundant data.

      (2) Note that data can be compressed before it is encrypted (and should be, since well-encrypted data no longer compresses). Thus, those large .doc files get compressed down.

      (3) Your doomsday estimates of size needs are way off. Exactly how many megs of unique files does each user on a corporate lan need? Most of the files are identical DLLs and OS, which this application already handles.
    • For this concept to work you can see that you need to exclude every copy of Dos95/Office from being backed-up. The basis of P2P is that the service users are also the service providers, thus every participating node needs free HD space. Depending on the crypto overhead and your non-backup portion, you still need a lot of free space for this concept. What is the added value above a redundant RAID server? Is the total cost of ownership really lower?

      No, your reasoning is what's flawed ;). IMHO with this system, you won't need that RAID server at all. A local copy of the installer for your favourite OS should be about the only local software you'll need. The rest is divided on all your peers' computers. (Could this work for network installations of e.g. Office?)

      I do wonder about security, and about using multi-user files like databases. What if I do a bulk INSERT and everybody else is shutting down their PC's?
    • You forgot the overhead created from replication. I think Hive is a concept beneath regular tape-based backups. Its purpose seems to be faster recovery and versioning; not replacement of backup solutions.

      If you'd use this concept for the users' work areas with documents and maybe configuration data from your servers, and have a way to transparently access the P2P backed-up data, this could indeed be of benefit for your company.

      Cu
    • From the website
      As an enterprise IT network grows the amount of redundant data stored on PCs increases significantly. Most PCs have the same applications and OS files, HiveCache takes advantage of this fact to decrease the amount of distributed storage needed for the backup network. By not requiring hundreds of redundant copies of the same MS Word executable and Windows DLL files found on each enterprise PC to be stored multiple times, the HiveCache backup mesh requires only a fraction of the data storage space that would be required by an equivalent tape backup solution.
      In other words, in a corporate environment it only backs up each file once - meaning only the files that differ between machines - i.e. the data files and configuration files - need be backed up after the first machine is backed up.
      • I'd like to know how it determines when files are the same. Checksums? File names? Complete file pathnames? I also didn't see what happens when one node gets infected with a virus. Does it then propagate back when being restored? (Probably. This last is the least of my concerns).

        • I should imagine that they use checksums, otherwise they wouldn't be able to work out when files were the same or not.

          When one node gets a virus then the checksum for infected files will change, meaning that those files will become "unshared." A properly working hive then should have a version of that file with the original checksum still shared across all the uninfected hosts, plus a backup of the infected file for those hosts that have been infected.

          If you restore from an infected file then you'll get the virus back. This is the nature of backups.
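The checksum idea discussed above can be sketched as follows. This is only an illustration under stated assumptions: the product's actual method is not documented in the thread, SHA-256 is my choice of checksum, and `file_digest`, `plan_backup`, and the sample machines are hypothetical names.

```python
import hashlib


def file_digest(data):
    """Return a SHA-256 hex digest that identifies a file by its content."""
    return hashlib.sha256(data).hexdigest()


def plan_backup(machines):
    """Deduplicate files across machines by content digest.

    machines maps host -> {path: file bytes}.
    Returns (store, manifests): store maps digest -> bytes (one copy per
    distinct content), manifests maps host -> {path: digest}.
    """
    store = {}
    manifests = {}
    for host, files in machines.items():
        manifests[host] = {}
        for path, data in files.items():
            digest = file_digest(data)
            store.setdefault(digest, data)  # keep only the first copy seen
            manifests[host][path] = digest
    return store, manifests


# Hypothetical example: pc3's copy of the executable has been modified
# (say, by a virus), so its digest differs and it is stored separately.
machines = {
    "pc1": {"winword.exe": b"WORD-BINARY", "report.doc": b"pc1 notes"},
    "pc2": {"winword.exe": b"WORD-BINARY", "budget.xls": b"pc2 numbers"},
    "pc3": {"winword.exe": b"WORD-BINARY+virus", "memo.doc": b"pc3 memo"},
}
store, manifests = plan_backup(machines)
print(len(store), "distinct files stored for", len(machines), "machines")
```

Matching on content rather than on file name or path is what makes the infected copy fall out of the shared set automatically, as described in the comment above.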

        • From what I remember about Mojo Nation:

          Each node would store encrypted chunks of data; without a map of the chunks that make up a file, it wouldn't be able to reconstruct the original file. So a node could get infected by a virus, but the virus couldn't infect the chunks. If you put a file together from chunks from nodes that are infected, the reassembled file is not infected (unless you backed up an infected file).
    • In a perfectly-managed world, the RAID array would probably be a better choice, because corporations wouldn't have tons of free disk space lying around on workstations.
      I work for a large (~50k+ employees) organization, and the vast majority of our workstations have 1-15GB free on their drives. It almost never gets used because most of the large-volume users are too paranoid to store anything locally for fear of it getting erased.
      I could see something like this being really handy, especially since we do deal in multi-terabytes of disk space for certain applications.
    • Actually I think this is quite clever. For the most part corporate PCs are only using a fraction of the available HD space, i.e. 3.5 gigs out of 20. This system could allow for multiple copies of all files on multiple computers.
    • Do you really think it's necessary to backup every single workstation ? In just about every business I've worked for, we let the users use CD-R, Zip disks or floppies to back up their documents, and keep the serious backup solutions for our servers. In a building with 1000+ PCs, you only need a few gigs worth of ghost images, not a thousand 20gig disk images. I think this mojo backup thing is perfect for businesses, effectively turning every workstation into a fragment of a SAN-type solution. I just hope there is good redundancy built into the system, so that when Joe H. Consultant switches off his PC, he doesn't break the backup stripe.
    • You've got it wrong. It only backs up different blocks. So, if there is one copy of Word out there, only one copy of Word is backed up. It's an awesome solution.
    • The largest amounts of space on my machine are taken up by Microsoft software (at work), Linux software (at home and my work lab), and training material (bloated Powerpoint slideware), all of which have lots of identical copies and a small number of non-identical ones. If the software can recognize these efficiently, that part's an easy win, and the system should be able to manage it well. And it's possible to make much of the administration work automagically by setting up a row of cheap PCs with big disk drives which can serve backups to everybody. If you've done everything well, and have a wide area network you can flood, such as VPN over internet, you get offsite backup for free too.

      Unfortunately, the biggest single file on my system is my Microsoft Outlook Mailbox, which is in a proprietary format that doesn't make incremental backups possible. Since we're laptop-users who go out in the field a lot, we need to keep the mailboxes on our PCs, not on the server. While the best solution is "So don't use Outlook, then", in a lot of corporate environments, that's an unrealistically lost battle.

      The other big concern I'd have is that, while the system looks really cool for desktop PC users, it's less practical for an environment where everybody uses laptops and on any given day, half the people are out of the office at customer locations or working from home on slow connections - so their PCs are much less usable as a backup mechanism, and may not have the bandwidth half the time.

  • by Daath ( 225404 )
    I think I'll install the client, and start looking for people's bank records, insurance stuff, wills, payroll info etc :)
    But seriously, I can imagine that it will only be good on larger scale corporate networks with lots of "enterprise PCs", as they put it...
    • Sorry, Mojonation uses pretty strong encryption. You'd have a hard time of it; you'd do better to walk over to the person's computer. ...and why shouldn't it work for a group of open-source developers? Or for the entire Internet?

      -Billy
      • Yeah well, I mean that it wouldn't be such a great idea for small companies with 10 employees or so...
        And I don't think that it would be used on the internet, at least not for important stuff... Open source development might work... Maybe...
    • your signature 403s, btw.
      • yeah it blows :(
        ISP screwed the disk or something, and he's taking a LOOOOONG time reloading it...
        In the meantime, get it here: http://game01.comhjem.dk - it's the second link... :)
  • Mojo Jojo? (Score:1, Funny)

    by Anonymous Coward
    Well.. eventually he was gonna win and take over Townsville..
  • Sounds good, anyone can just leave a sniffer going, and grab all those password files as they go across the wire... Woo Hoo!

    But besides security, I see it as a solution in search of a problem. It's easy to back-up one system to another with the current tools, and it would happen in a much more simple, organized, and controlled manner.
    • hmm... Are you quite sure that it is impossible to encrypt the data stream? If you're using an unencrypted network you can get any password you want anyway. The use of p2p as backups does not change that.

      I also do not think that the passwd db will be treated as any other file.

      If you're using w2k servers I think that they already send out encrypted password files whenever they want.

      But besides security, I see it as a solution in search of a problem. It's easy to back-up one system to another with the current tools, and it would happen in a much more simple, organized, and controlled manner.

      Who needs order and control? ;)
      Actually I think this is a pretty cool backup solution in some areas. I wouldn't use it at my workplace, maybe in a few years.

    • Exactly! You could EASILY sniff all the passwords and important data in the structure.. if it wasn't encrypted before it was chopped up into hundreds of little pieces. :)
  • "When a data is lost on a PC..."

    Seriously - if someone can't make the effort to check their grammar before posting a web page... - oh, hang on, I'm reading slashdot - nevermind...

    cLive ;-)

  • GPL? (Score:4, Interesting)

    by Albanach ( 527650 ) on Thursday July 18, 2002 @05:50AM (#3907362) Homepage
    Looks to me like they've also morphed from being a GPL package to a commercial one, with no mention of source code, but several mentions of patents on the web page.
    • Re:GPL? (Score:2, Informative)

      No, you missed it. Click on the 'MojoNation' Hive-Hex tab [mojonation.net] and you will find a link to the LGPL sites of both the EGTProtocol [sf.net] and the MNET version of MojoNation. [sf.net]
    • several mentions of patents on the web page.

      I only see mention of a patent (singular) pending on their "MojoNation" page [mojonation.net]. Where else do they mention a patent or multiple patents?

      It looks to me like it is US patent application number 20010037311 [uspto.gov]. I am definitely not a lawyer, but it looks like the patent is on a method for determining how much use each computer gets to make of the system based on what they provide to the system, and not on the concept of P2P backup in general. I certainly hope I'm reading this right because I have my own P2P backup software that I'm about to release and I don't want to run afoul of theirs. I know there's plenty of prior art for P2P backup in general out there, but most people don't want a drawn out legal battle even if they're right.

  • Companies feel the need to pay for something so when everything goes tits up they have someone to blame - also, companies have credibility... entrust all my corporate data to a package written by x distributed geeks - er, no thanks.

    • "...also, companies have credibility... entrust all my corporate data to a package written by x distributed geeks - er, no thanks."

      Go tell that to all the Fortune 500 companies that use OpenSSL for their encryption, or Apache for their Web Servers, or sendmail, or the GIMP (very commonly used by companies designing CG). Open standards are what make the Internet work - we'd all be screwed if TCP/IP wasn't used in the same way by every node trying to talk to each other...

  • by br00tus ( 528477 ) on Thursday July 18, 2002 @06:03AM (#3907384)
    One of the exciting things about p2p are the innovations people come up with. Of course, some innovations are braindead, though take awhile to implement - someone suggested hashing for p2p (Gnutella) in the first thread [slashdot.org] in which Gnutella was discussed on Slashdot, but it's taken over two years for the major Gnutella developers to implement it.

    P2P falls into two categories nowadays, file sharing (FastTrack/Kazaa, Gnutella/Gnucleus-Shareaza-Limewire-Bearshare, Edonkey2000) or publishing (Freenet and Mnet/Mojonation). Like Freenet, Mojonation was more of a publishing network - users publish data, it gets broken into little chunks, encrypted, and then sent out to other computers, and you receive other people's encrypted chunks on your computer, making you a "block server". Content trackers and Publication trackers kept track of the meta-data and where the blocks were, and metatrackers kept track of where the trackers (also called brokers) were. I chatted with zooko, one of the developers, on IRC; he was cool and the ideas were very interesting. Like many dot-com stories, it was ahead of its time in many ways. They converted Mojonation to the open source MNet [sourceforge.net], whose CVS tree you can peruse. A lot of it is in Python, a language I do not know.

    The wasted disk space on workstations (and servers) is something thought about by many, especially in large organizations with large networks. My last company began implementing SANs, so that less disk space would be wasted, and the centralization of disk space allowed for greater redundancy and easier backup. They also ran low priority (nice'd) distributed.net [distributed.net] processes across the whole network on non-production machines. You can take a guess about how large the network is by seeing that they're still ranked #22 without submitting any keys for a year.

  • by kasperd ( 592156 ) on Thursday July 18, 2002 @06:33AM (#3907421) Homepage Journal
    Since the technical information [mojonation.net] is missing I cannot explain how this particular product works, but I can explain how it could be possible to do this.

    For security reasons we absolutely want to encrypt and sign everything stored on the other computers. There is nothing tricky about this part, the usual cryptography can be used without modifications. This is not going to waste any significant amount of storage space or network bandwidth. But it will require some CPU cycles.

    The other not so trivial part of such a system is the redundancy. Reed-Solomon would be one type of redundant coding suitable for the purpose. Parchive [sourceforge.net] also uses this coding.

    I know some implementations are limited to at most 255 shares, but for performance reasons, it is probably not feasible to use a lot more than that anyway. I expect the Reed-Solomon code to be the most CPU hungry part of such a system.

    We need to choose a threshold for the system; I see no reason why the individual users cannot choose their own threshold. If one user wants to be able to reconstruct data from any 85 of the 255 shares, there needs to be three times as much backing storage as the data being backed up.

    The first approach to storage space would obviously be, that each user can consume as much as he himself makes available to the system. I'd happily spend the 10GB harddisk space needed for two backups of my 1.5GB of important data with a factor three of redundancy. This would if done correctly give a lot better security than most other backup solutions.
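The storage arithmetic in the last two paragraphs can be checked with a few lines. This is a back-of-the-envelope sketch only: the function names are hypothetical, and it assumes each share is 1/threshold the size of the data with no per-share metadata or encryption overhead.

```python
def expansion_factor(total_shares, threshold):
    """Storage blow-up when any `threshold` of `total_shares` shares suffice."""
    return total_shares / threshold


def storage_needed_gb(data_gb, total_shares, threshold, versions=1):
    """Backing storage for `versions` full backups of `data_gb` of data."""
    return data_gb * expansion_factor(total_shares, threshold) * versions


factor = expansion_factor(255, 85)
needed = storage_needed_gb(1.5, 255, 85, versions=2)
print(f"expansion factor: {factor}")          # 255 / 85 = 3.0
print(f"two backups of 1.5 GB: {needed} GB")  # 1.5 * 3.0 * 2 = 9.0
```

Under these assumptions two backups come to 9 GB, so the 10 GB quoted above leaves about a gigabyte of headroom for overhead.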

    One important aspect you may never forget in such a system is the ability to verify the integrity of backups; I guess this is the trickiest part of the design. Verifying with 100% security that my backup is still intact would require downloading enough data to reconstruct my backup. However, verifying with 99.9999999% security could require significantly fewer samples. Unfortunately, here the 255 shares can be a major limitation: the larger the number of shares gets, the smaller the percentage of data we need to sample gets. I don't wanna do the exact computations right now, but if 18 randomly picked from the 255 shares are all intact, we have approximately the 9 nines of security that there are indeed 85 intact shares of the 255. So we have indeed limited the network usage by almost a factor of five.
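The sampling estimate above can be worked through with the hypergeometric distribution. The sketch below is one reading of the argument, not the poster's own computation: it takes the worst unrecoverable case (exactly 84 of 255 shares intact) and asks how likely it is that 18 shares drawn at random without replacement all happen to be intact. The function name is hypothetical.

```python
from math import comb


def false_pass_probability(total, intact, sample):
    """Chance that `sample` shares drawn without replacement are all intact
    when exactly `intact` of the `total` shares are intact."""
    if sample > intact:
        return 0.0
    return comb(intact, sample) / comb(total, sample)


# Worst case that still fails to reconstruct: one share short of the threshold.
p = false_pass_probability(total=255, intact=84, sample=18)
print(f"chance a broken backup passes an 18-share check: {p:.2e}")
print(f"traffic saved versus fetching 85 shares: {85 / 18:.1f}x")
```

The probability comes out below one in a billion, which is consistent with the "9 nines" figure, and 85/18 is about 4.7, matching "almost a factor of five". Note that this is the chance of a bad backup slipping past the check, which is a slightly different statement from the posterior probability that the backup is good.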

    If we want:
    • Higher security
    • Less network usage for verifications
    • Good network performance even in case of a few percent of lost shares
    we need more than 255 shares of data. There is no theoretical limit to the number of shares, but the CPU usage increases.

    What the system also needs is migration of data as users join and leave the system, and a reliable way to detect users responsible for large amounts of lost shares. Creating public key pairs for each user is probably necessary for this. I think this can be done without the need of a PKI, a user can just create his key pair and then start building a reputation.
    • The way I see it, you don't need to backup multiple copies of the entire data set to provide redundancy.

      Instead, you do what RAID 5 does, you stripe the data, across multiple peers, with a checksum block on another. This way your data is still safe if one of your peers goes down. More clever striping and checksum algorithms can cope with more than one peer going down, up to some limit.
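A minimal sketch of the striping idea, assuming simple XOR parity and one parity peer. Real RAID 5 rotates the parity block across disks, which this toy version does not do, and all function names here are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_data_peers):
    """Split data into n_data_peers equal blocks plus one XOR parity block."""
    block_len = -(-len(data) // n_data_peers)  # ceiling division
    padded = data.ljust(block_len * n_data_peers, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len]
              for i in range(n_data_peers)]
    return blocks, xor_blocks(blocks)


def rebuild_missing(blocks, parity, missing_index):
    """Recover one lost block from the surviving blocks and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])


data = b"quarterly report, final version"
blocks, parity = stripe_with_parity(data, 4)
lost = 2  # pretend the peer holding block 2 is switched off
recovered = rebuild_missing(blocks, parity, lost)
print(recovered == blocks[lost])
```

With four data peers and one parity peer the overhead is 25% rather than a full second copy, at the cost of tolerating only one missing peer per stripe.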

      If a large number of your peers go down at once, then your data is lost, but that is only likely to happen if something catastrophic happens, such as your office building burning down, or being hit by a tornado. In that case it would be time to turn to your off-site backups, as no P2P backup strategy would be of any use.

      It is worth remembering that the whole point of this system is to get people back to work as fast as possible if they accidentally lose a relatively small amount of data. It is designed to complement, not replace, an offsite tape backup strategy.

      I hope this helps.

      • The Reed Solomon codes will tend to outperform a simple checksum. In coding theory there are two desirable traits:
        Distance ~ the amount of damage the code can take and still reproduce a perfect copy of the original
        Efficiency ~ size of the code relative to size of the original data.

        In the case of RAID 5 (on say 6 disks) for each 5 bits you have a distance of 2 (i.e. any 1 bit on any drive can get destroyed), and an efficiency of 5/6ths. By clumping data in larger chunks than 5 bits you can get efficiency very close to 100% while at the same time boosting the distance well beyond 2. In fact this is the fundamental theorem of information theory: given any desired distance and any desired efficiency less than 100%, the percentage of codes meeting these criteria increases to 100% as the code size (size of the clump) increases. Reed Solomon codes are a particular class of codes, many of which tend to be near maximal in terms of distance relative to efficiency ratios with reasonable computational complexity.

        So my point is that the original poster was actually giving a better solution but was assuming more background knowledge. Hope this helps.
    • We wrote an optimized Reed-Solomon library [onionnetworks.com] for Swarmcast that can do up to 65k shares. It's available under a BSD-style license. Also, since Swarmcast is a P2P content delivery system, the library also supports cryptographic integrity checking, so you can ensure that everything is intact.

      Unfortunately the Swarmcast project has languished after 1.0, but we have started a new project called the "Open Content Network" [open-content.net]
  • I really want to know where these people come from who have "Spare" HD space. I find myself buying a new HDD with at least double the capacity of the last one I bought at most every year. And that isn't to move files over to, that's just new space, which becomes half-filled again within a couple of months, and then eventually is filled by the time another half-year rolls around.. when I start the process again.
    While I sit here wondering whether Moore's law will ever allow the supply to truly catch up and surpass demand, I hear a Slashdot post talking about "Spare Gigs".
    Look, if you want to be dumbasses about HDD purchases and will only need 5 GB of space ever, in your life, buy a 5GB drive at some obscene seek speed, and then send /me/ the 80 or so dollars you've just saved yourself by not buying something with "Spare Gigs"
    • Many corporate networks have rules against pr0n and mp3's. : )

    • Simple: In our network we need about 4GB of space for a client system but the smallest we get is 20GB. We have about one order of a magnitude more space on client drives than in our servers...

    • Seriously, I pick up everybody's old hard drives and use 'em. A windows 98 machine needs only a 528 MB disk to be a schweet network client, MacOS needs a little less.

      I store all the big stuff on the network and use linux soft RAID to build big volumes out of small drives. Right now the main server has seven 9GB SCSI-3 drives in a RAID-5 configuration with a single hot spare. At one time I had 15 hard drives, though, because I had eight IDE drives and seven SCSI-2 (all in the 200-600 MB range). There's also a secondary server, used to store backups, that has 13 2.4 GB SCSI-2 drives on old ISA-bus controllers. It runs soft RAID5 also, and linux's (lame) NFS implementation but most of the time it is turned off to save power.

      The down side is it tends to run hot as hell, especially with the IBM SCSI-3s - but since I started running three six-inch fans repinned from 12 to 5 volts it's reasonably cool & quiet. When I replace my furnace next month I'm going to take the gigantic blower out of the bottom and run it extremely slowly in the bottom of the rack, that will put some CFMs in the system!

      I get a fair number of drives from the "technology recycling" bin the state runs out by the wastewater reclamation plant. CD-ROM drives in the 4-12X range are easily found there too.
    • The average drive is 20 gigs in the new dells we bought. We need about 2 of those gigs to run all the software the support staff need.

      That's 18 gigs of wasted space on every workstation.

  • Patent #233823923
    Method and apparatus for allowing disaster recovery from loss of a file by storing it in two places at once.
  • by Anonymous Coward
    Yeah right - "Uh, I'm just backing up the new [Insert favourite artist] record..."
  • by MicroBerto ( 91055 ) on Thursday July 18, 2002 @08:46AM (#3907726)
    While this is a great idea, most corporations will not want to do this for one big reason -- they should be doing off-site backups anyway, in case a disaster strikes the Corp building.

    Say that one of the companies in the WTC had done this. Sure, they woulda had backups when a server blew up, but after the entire building was destroyed, they would have had nothing.

    You never want to put ALL of your marbles into local backups.

    • Of course you want offsite backups -- but you _also_ want onsite backups. The two serve different purposes.

      You also need to consider that this software can _also_ provide offsite backup services far more efficiently: it knows, for example, to only make one copy of each shared file.

      -Billy
  • Can someone explain to me how Peer to Peer backups (of servers and data, I assume) is a good idea?

    What if some secretary, who has half of the company's email "backed up" on her computer, hoses her machine because like most office drones she's not too computer literate?

    How is this secure? Fast? Efficient? I thought the whole idea of a backup was to have those tapes in a safe secure place, not on a computer that's being used by other people.
    • The data isn't stored in only one place. It is stored redundantly.

      So even if half the people in the office shut down their computers you could restore the data.

      And there is a simple way to get around the problem you described BTW. Just have the computer techs disconnect the power button. ;-)
  • Here's how ADSM backs up

    Clients are installed on the Hosts Enterprise Wide. These can be a mixed platform: AIX, HPUX, Linux, Windows NT (Cough), Mainframe S/390 (VM) and so on. The Host running ADSM server has a ton of disk space... a snapshot is taken across the network to the ADSM server DISK from the Client filesystems to be backed up. The snapshot gets backed up to tape while the snapshot is taken of the next host.. and so on..

    This works great across a fast network.

    But.......

    I use Amanda to backup my Linux Servers at home.
    It works in almost the same manner.

  • Who has those?
  • The idea of using a distributed/replicated data store such as MojoNation or OceanStore for backup seems cool on the surface, and from a near-term customer-acceptance standpoint it definitely has advantages, but from a long-term technical standpoint one question does arise. If you have such a data store available to you, with appropriate performance and security characteristics, why would you store stuff locally (except for cached copies, which are evanescent rather than authoritative) at all? The ultimate goal of research in this area is not to facilitate backups but to make them unnecessary; what this project is doing is really a sort of "side trip" from that goal.

    • Good point! I didn't think of it that way.

      But I don't agree that it's a side trip; instead, it's a step on the journey. Right now we keep everything directly on the machines; next we keep the reliable copies on the network and the working copies on the machine; finally the working copies become temporary.

      It's a brilliant short step.

      -Billy
    • Very excellent point. So what you are saying is rather than "backing up everything" just turn every workstation's hard drive on the network into one huge raid and mirrored (many times over) array? I wonder if this is what Microsoft has in mind with its "distributed filesystem" deal in one of its upcoming operating systems? It's definitely an interesting thought for the future, esp considering the gigabit networks starting to become even more cost efficient.
  • What are these "spare gigs" you speak of. I don't believe I've ever come across anything quite like that...
  • I remember reading an Infoworld article about a software package that does the very same thing, back in 1997. On a whim I was able to find the quoted article at (don't forget to remove the space):

    http://www.mangosoft.com/news/pa/pa_0009_-_INFOW or ld_-_Mango_pooling.asp

    It was true Peer to Peer before it was a buzzword. Basically it would pool space from up to 25 PCs and create an M: drive. Here's part of the article:

    Mango pooling is the biggest idea we've seen since network computers

    By Info World

    Mango, in Westborough, Mass., is not your average software start-up. In 30 months the company has raised $30 million. Its first product, Medley97, has shipped, transparently "pooling" workgroup storage.

    And someone at Mangosoft really knows the difference between features and benefits.

    But it's not the benefits of Medley97 pooling that interest me. What's interesting are the features and long-term potential of Mango's underlying distributed virtual memory (DVM). Mango's pooling DVM is the biggest software idea since network computers -- perhaps since client/server -- and Microsoft had better watch out.

    According to Mango, Medley97 offers transparent networking that's easy to use, fast, and reliable (not to mention secure and high fiber).

    Windows users working together on a LAN can share files in a pool of their combined disk storage. Every pooled PC is both a client and server.

    Go ahead and drop Medley97 into any PC you want to pool. Medley97 installs, checks configuration, and updates required Windows networking software. The product adds the PC's storage to the pool, giving you a shared, fast, and reliable network drive, M:/, which is available on all pooled PCs. For this you pay Mango less than $125 each for up to 25 PCs. ... for the rest goto the URL.

    • by harshaw ( 3140 ) on Thursday July 18, 2002 @09:45AM (#3908046)
      Yup, I worked at Mango several years ago on the Medley product.
      The basic premise behind the product was that when someone copied a file into the Medley drive the data pages were instantly "duplexed", meaning that a second copy of a page was made elsewhere in the network. If a node in the network went down, leaving only one other computer with a copy of the page, Medley would automatically reduplex, causing the single copy of the page to be propagated to another node in the network. The basic promise of Medley was availability and fault tolerance on a P2P level.
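The duplex and reduplex behaviour described above can be illustrated with a toy simulation. This is a sketch of the idea only and says nothing about how Medley was actually implemented; the names `place_pages` and `reduplex`, the node labels, and the random placement policy are all assumptions.

```python
import random

REPLICAS = 2  # "duplexed": every page lives on two nodes


def place_pages(pages, nodes, rng):
    """Assign each page to REPLICAS distinct, randomly chosen nodes."""
    return {page: set(rng.sample(sorted(nodes), REPLICAS)) for page in pages}


def reduplex(placement, live_nodes, rng):
    """Drop copies held by dead nodes, then re-copy under-replicated pages.

    Returns the list of pages that lost every copy and cannot be restored.
    """
    lost = []
    for page, holders in placement.items():
        holders.intersection_update(live_nodes)  # forget copies on dead nodes
        if not holders:
            lost.append(page)
            continue
        candidates = sorted(live_nodes - holders)
        while len(holders) < REPLICAS and candidates:
            target = rng.choice(candidates)
            candidates.remove(target)
            holders.add(target)  # propagate the surviving copy
    return lost


rng = random.Random(0)
nodes = {"a", "b", "c", "d"}
placement = place_pages(range(6), nodes, rng)
live = nodes - {"b"}  # node "b" crashes
lost_pages = reduplex(placement, live, rng)
print("pages lost:", lost_pages)
```

With two copies per page, a single node failure never loses data in this model; a page is lost only if both holders fail before the reduplex step runs.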
      Very cool concept but the product had a number of severe flaws that are probably obvious to the average slashdot reader.
      • The product ran as a driver in Windows 95/98 and NT. Debugging Medley was an absolute atrocity; I remember at one point having as many as 8 windbg windows open attempting to debug some network wide crash problem.
      • Another problem was that in rare cases a crash on a single machine could bring down other machines in the network. Doh!
      • Servers are cheap, disk is cheap. Once the IT administrator realizes that there is significant complexity in maintaining a Medley network, he or she would realize that it is probably cheaper to buy a RAID enabled server with a ton of disk space.
      • Marketing. Mango couldn't market this product to save themselves. At one point we used a rather rotund male model called "Waldo" to push the product. Very bizarre.... the product was so complex that it took a while for technical people to understand, let alone the average user.
      • The 1.0 product was pushed out the door before it was ready.
      • The company couldn't figure out the appropriate distribution channel: VARs, direct retail, direct sales, etc. Nothing seemed to work.
      • Finally, the product was over-engineered (sorry guys!)

      The best thing I can say about working on Medley was that it was an opportunity right out of College to work with a number of incredibly excellent engineers on a complex and very interesting problem. Unfortunately, the idea was probably 5 to 10 years ahead of its time.
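
      The duplex/reduplex behavior described above can be sketched roughly as follows. This is a toy model and not Medley's actual code: the function name `reduplex`, the page-to-holders map, and the "pick the least-loaded live node" placement rule are all assumptions made for illustration.

```python
from collections import defaultdict


def reduplex(page_holders, live_nodes, copies=2):
    """Return a page -> set-of-nodes map restored to `copies` live holders.

    page_holders: dict mapping a page id to the set of nodes holding a copy.
    live_nodes:   iterable of node names that are currently reachable.
    Pages whose every holder is down stay empty (nothing left to copy from).
    """
    live = set(live_nodes)
    load = defaultdict(int)  # pages held per live node, used for placement
    repaired = {}

    # Drop copies that lived on nodes that went away.
    for page, holders in page_holders.items():
        surviving = set(holders) & live
        repaired[page] = surviving
        for node in surviving:
            load[node] += 1

    # Re-copy under-replicated pages onto the least-loaded live node.
    for page, holders in repaired.items():
        if not holders:
            continue
        while len(holders) < copies:
            candidates = sorted(live - holders, key=lambda n: (load[n], n))
            if not candidates:
                break  # not enough live nodes to reach `copies`
            target = candidates[0]
            holders.add(target)
            load[target] += 1
    return repaired


if __name__ == "__main__":
    pages = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"a", "c"}}
    # Node "b" goes down; "d" is a spare node in the pool.
    print(reduplex(pages, ["a", "c", "d"]))
```

      Note what the sketch leaves out: it says nothing about the failure mode from the list above, where one crashing machine takes others down with it.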

  • I'd never trust something like that to *backup* data in MY company.

    It might be a cute technology and has its uses, but I'd never rely on it for that critical of a function. That is what SAN/tapes/DVD/etc. are for.
  • It's been said, but how could you do off-site backups with this system? One user said something to the extent of "this will work in any situation short of tornados or the building burning down". Well, what if the building burns down, or a tornado flattens your office? Seems to me you'd be out of business in an instant after losing every piece of data possessed.

    My employer uses two SANs from Xiotech, one off-site (actually, that's still being implemented), with two off-site servers to support us should our server room spontaneously combust. All of our employees are encouraged to store any and all information on the network drives. These drives get partial backups each night Monday through Thursday (any files that changed from the previous day) and a full backup on Friday, and all the tapes are stored off-site. If a user had data s/he wished to save on the HDD and their PC is reimaged, they're SOL, and they know that from the beginning.

    At the same time, each user has a 20GB or larger HDD that is essentially wasted because of this. Of course, no one in this organization could have 20GB of reports and text documents, etc.
    • How do you do offsite backups of every machine in your enterprise right now? Whoops -- you DON'T. You only offsite-backup the servers.

      With this system backing up every PC offsite becomes possible: you simply add a PC to the network with its own daily tape backup, and configure the Mojonation network to store one reconstruction of every unique file on it. Then drive or mail that tape every day/week to your offsite storage.

      Boom -- offsite backup for the whole network, without having to backup multiple copies of the same file.
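
      A minimal sketch of the "one copy of every unique file" idea, assuming files are identified by a SHA-256 hash of their contents. The function name `unique_files` and the in-memory data layout are made up for illustration; nothing here is taken from Mojo Nation or HiveCache.

```python
import hashlib


def unique_files(machines):
    """Group identical files across machines by content hash.

    machines: dict of machine name -> dict of path -> file contents (bytes).
    Returns a dict of digest -> (contents, list of (machine, path) locations),
    so each distinct file body needs to go to tape only once.
    """
    store = {}
    for machine, files in machines.items():
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in store:
                store[digest] = (data, [])
            store[digest][1].append((machine, path))
    return store


if __name__ == "__main__":
    machines = {
        "pc1": {"C:/office/word.exe": b"WORD", "C:/docs/q2.doc": b"report"},
        "pc2": {"C:/office/word.exe": b"WORD"},
    }
    store = unique_files(machines)
    total = sum(len(d) for fs in machines.values() for d in fs.values())
    kept = sum(len(data) for data, _ in store.values())
    print(f"{total} bytes on disk, {kept} bytes to tape")
```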

      -Billy
      • We don't back up individual PCs because no one should be storing information requiring backup on their PC. That data resides on the network drives on the SAN. Some people will create a local copy to speed their work while they work on a file, but that gets copied back to the network when they're done with it. Should a machine go bad, we replace or re-image it with a standard desktop (Windows, Novell client, Office, etc.) Each user also has a user drive (no big surprise there) so they can backup things like their bookmark files and personal backups of documents they wish to preserve. (Various versions before/after conversions, large changes, etc.) Most users, though, don't realize this is unnecessary, as they can retrieve copies of their files off the tapes. That's time consuming for them, however.
        • You're confusing cause and effect. The reason that you don't back up info that's on users' PCs isn't that there isn't any backupable info; it's that it's too hard right now. If it were easy, you'd be doing it, because you're dead wrong that there's nothing worth backing up.

          For example, I'd say that the entire OS installation is worth a backup -- especially since you'll only have to keep one backup for the entire network (plus a few minor variations). I'd also say that any info which you would have put on the big global drive should usually be on the user's computer instead, so they don't have to contend for bandwidth when working on it (when all they want is safety).

          There are already programs which can backup user desktops -- Previo works very much like what's described here, for example.

          -Billy
          • We do have the entire OS installation backed up. That's what a hard drive "image" is. And when a hard drive is "re-imaged", it is restored from that backup. And we can't allow most people to locally store their files, because most of them are accessed by more than one person. If the files were stored locally, someone's secretary couldn't print their notes, a project manager couldn't compare notes with the owner's representative, etc. Aside from that, there would potentially be a tremendous amount of duplicated work over time, if every PM wrote their own reports and worksheets, and every Accountant wrote and ran their own reports instead of accessing the up-to-date shared versions of the same.

            I'm not saying this is the case in every organization, but in mine, it is the case that most data does not need to be, and should not be, stored locally.
            • We do have the entire OS installation backed up. That's what a hard drive "image" is.

              Of course. But you lose a lot with that versus the more flexible combined backup -- for example, you become unable to track changing uses, and unable to notice when something which shouldn't change does. In a combined dynamic backup, you'll notice quickly that 700 workstations have the same file (word.exe) but two of them have a different file in the same name and place. Why? Could it be a virus infection? Easy to check -- and easy to fix, once you recognise it.
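
              As a rough sketch of that check: hash the same path on every workstation and flag whichever machines disagree with the majority. The function name `find_outliers` and the input shape are assumptions for illustration, and a differing hash only means "different", not "infected".

```python
import hashlib
from collections import Counter


def find_outliers(copies):
    """Return machines whose copy of a file differs from the majority copy.

    copies: dict of machine name -> contents (bytes) of the same path.
    """
    digests = {m: hashlib.sha256(data).hexdigest() for m, data in copies.items()}
    if not digests:
        return []
    # The most common digest is treated as the expected version of the file.
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(m for m, d in digests.items() if d != majority)


if __name__ == "__main__":
    copies = {f"ws{i:03d}": b"word.exe v9" for i in range(700)}
    copies["ws013"] = b"word.exe v9 + something extra"
    copies["ws402"] = b"word.exe v9 + something extra"
    print(find_outliers(copies))
```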

              And we can't allow most people to locally store their files, because most of them are accessed by more than one person.

              This is a CLASSICAL example of putting the cart before the horse. The IS department isn't here to "allow" people to do things in order to make the people do their job; it's there to help them do things as they do their job. You don't FORCE the people to put files on the network in order to share; they do it because it's the easiest way. They don't need the additional motivation of having no other way to ensure a backup; they would do so without that motivation, for the reasons you specify.

              Anyhow, in your original post, you spoke about backing up personal shares, which don't have anything to do with data sharing anyhow. The ONLY reason to have personal shares is to allow backups using old technology.

              -Billy
  • While I was reading the summary.
  • by Zooko ( 2210 ) on Thursday July 18, 2002 @10:55AM (#3908571) Homepage

    Mojo Nation was conceived by Jim McCoy and Doug Barnes in the 90's. At the end of the 90's they hired hackers and lawyers and started implementing.

    Their company, Evil Geniuses For A Better Tomorrow, Inc., opened the source code for the basic Mojo Nation node (called a "Mojo Nation Broker") under the LGPL.

    During the long economic winter of 2001, Evil Geniuses ran short of money and laid off the hackers (the lawyers had already served their purpose and were gone).

    One of the hackers, me, Zooko [zooko.com], and a bunch of open source hackers from around the world who had never been Evil Geniuses employees, forked the LGPL code base and produced Mnet [sourceforge.net].

    Now there is a new commercial company, HiveCache [hivecache.com]. HiveCache has been founded by Jim McCoy.

    BTW, if you try to use Mnet, be prepared for it not to work. Actually the CVS version [sourceforge.net] works a lot better than the old packaged versions. We would really appreciate some people compiling and testing the CVS version (it is very easy to do, at least on Unix).

    It would be really good if someone would compile the win32 build. We do have one hacker who builds on win32, but we need more.

  • Anybody know what their patent is on? I've been writing some P2P backup software myself and was about to release it, so any info on what the patent is (or patents are) would be appreciated. I'm hoping that it's on something very specific which is non-obvious, because my software would be a lot less likely to infringe then. If they patented the concept of P2P backup, though (grrrrrr)....

    I've also seen mention in other comments that this project has been around for a while in open source form and has only recently been corporatized. Their Sourceforge page [sourceforge.net] has been around since 2000-07-17. Is it potentially older than this as well?

  • "Dude, you guys are a Slashdot story" is not the best wake-up call one can hope for while spending a few weeks trying to finish up the online info prior to a large-scale test :)

    I will try to answer as many of the good questions and points of discussion that have been brought up as I can over the next few hours, but I wanted to shoot out a quick overview of what HiveCache is to try to set the story straight here.

    First of all, HiveCache is an enterprise backup utility that uses a parasitic peer-to-peer data mesh as its backup media. Simple enough really. The goal of the software is not to replace tape or other offline backup tools; the goal is to serve as an alternative tool for users to make most file restoration requests ("hey, I accidentally deleted my Powerpoint presentation for the big meeting that starts in 30 minutes...") a user self-help operation rather than something that needs IT assistance. Users restore most files via the p2p mesh and tape/CD-R is only needed for really old stuff or if the building burns down.

    The HiveCache distributed online backup system is currently targeted at small to mid-sized enterprises (100-1000 seats) as a way for these companies to increase the ROI on existing IT investment (they already paid for the disk space, so why not use all of it) and to decrease the burden that daily backup and restore operations place upon IT staff. Right now the clients are win32 but agents that serve up disk space to these clients from OS X and Unix hosts are also available. By using good error-correction mechanisms it is possible to maintain five "nines" of reliability for retrieving any particular file even if 25% of the network drops offline. As the backup mesh grows larger reliability keeps increasing while the data storage burden for adding a new node drops (because the level of redundancy among the nodes grows.)
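
    To give a feel for how a "five nines with 25% of nodes offline" figure can arise, here is a back-of-the-envelope sketch. It assumes a generic k-of-n erasure code (any k of n blocks rebuild the file), one block per node, and independent node failures. The parameters k=8 and n=32 are invented for illustration and are not HiveCache's actual scheme.

```python
from math import comb


def availability(k, n, p_up):
    """Probability that at least k of n blocks are reachable.

    Each block sits on a different node, and each node is independently
    online with probability p_up (an assumption, not a measured value).
    """
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    p_up = 0.75  # 25% of the network offline
    for k, n in [(1, 1), (1, 2), (8, 16), (8, 32)]:
        print(f"{k}-of-{n}: {availability(k, n, p_up):.10f}")
```

    Under these assumptions a single plain copy is only available 75% of the time, while spreading the same file over more blocks than are needed to rebuild it pushes availability well past 99.999%. Correlated outages, such as a whole floor losing power, would break the independence assumption.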

    Lastly, the relationship between HiveCache and MojoNation. Basically, there are two branches off of the work HiveCache (nee Evil Geniuses) did on MojoNation: one early branch went on to become the backup product, and a later fork pared off some of the non-essential bits (payment system, etc.) and became MNet. The MojoNation public prototype helped to work out the kinks of the data mesh, but for the last year and a half of the life of MojoNation most of our internal coding effort was on behalf of this other project, which shared some back-end components with the LGPL codebase we were also supporting. For those who complained earlier that the MojoNation user experience sucked I must humbly apologize; we were spending the cycles working on a different UI. Since the MojoNation project went into hibernation on our side, a former MojoNation coder and several other very sharp people have been continuing the MNet project [sourceforge.net] based upon the open codebase (and with a much nicer UI than we ever provided for MojoNation.) I do apologize if the patent and licensing language appears a bit heavy-handed; it was a cut and paste job from some email with legal counsel and will be made clearer this weekend when the site is updated.

    ObPlug: We are still seeking a variety of enterprise environments for our upcoming pilot test. In addition to getting to experience the benefits of the HiveCache system for your company, you will also be able to purchase the Q4 release version at OEM prices! Sign up now by sending mail to pilot@hivecache.com

    Jim McCoy, HiveCache, Inc.
