Linux Directory Replication?

okie_rhce asks: "With cheap Unix-based server clusters becoming all the rage, what are people using to replicate content to server farms? Networked File Systems can be a single point of failure and tools like rsync are not real time."
This discussion has been archived. No new comments can be posted.


  • What are you really trying to do?

    The reason why your question has brought out only trolls and other meaningless posts is that it is so vague that it could be classified as a troll itself.

    How big are these content directories you want to replicate, and how often do they change?

    If they only change every few hours or so, you can use NFS and a script that does cp -r with no problem. Or rdist. Or CVS. Or ftp. Don't worry about "real time"; just take the machine that is being updated out of the pool that might serve requests - don't even bother with that if a briefly half-updated web page doesn't matter.

    Does it matter if a few of the machines are out of date, or should all of the machines have the exact same thing?

    I think if you explain what you are trying to do, you'll get a number of suggestions.
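    For the "just copy it" case described above, the take-out-of-pool, update, re-add loop might be sketched like this. Everything here is an assumption: `lb-ctl` stands in for whatever command removes a host from your load balancer, and WEBROOT and POOL are placeholders.

```shell
#!/bin/sh
# Hypothetical content-push script: drain each host, sync it, re-add it.
# lb-ctl is a stand-in for your load balancer's control command.
WEBROOT=/var/www/htdocs
POOL="web1 web2 web3"
DRYRUN=${DRYRUN:-1}    # default: print the commands instead of running them

run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

push_all() {
    for host in $POOL; do
        run lb-ctl remove "$host"                            # drain the host
        run rsync -az --delete "$WEBROOT/" "$host:$WEBROOT/" # sync content
        run lb-ctl add "$host"                               # re-enable it
    done
}

push_all
```

    The same loop works with rdist or an ftp mirror in place of rsync; set DRYRUN=0 once the placeholder commands are replaced with real ones.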
  • OK, how about this: deploy 3 or so servers that share a SCSI chain, with 4 or so big SCSI hard drives. Use IP-over-SCSI for inter-server communication. If one server fails, have another come up with that server's IP address in its place, and then have it page root.

    It wouldn't be too hard to do; some cron jobs running every 2-3 minutes and shell scripts could do it with no problems.

    -LW
    BTW:
    perhaps multiple SCSI buses for more hard drives... and have the IP-over-SCSI link verify, when a server comes back, that everything is OK and that it isn't needed immediately.

    Also, a filesystem such as ext3 would be best for the hard drives... perhaps RAID 5 across multiple 50 GB SCSI drives.

    Don't forget an autoloading DLT4 backup system to back up all your data. Perhaps 4 servers: 3 main servers and one backup server for backing up data.
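    A rough cut of the cron-driven takeover described above might look like the following. PEER, VIP, and the interface name are placeholders, and a real deployment should use proper heartbeat software rather than ping from cron; this is only a sketch of the idea.

```shell
#!/bin/sh
# Hypothetical takeover script, run from cron every couple of minutes, e.g.:
#   */2 * * * * /usr/local/sbin/takeover.sh
PEER=192.168.1.10        # the server being watched (placeholder address)
VIP=192.168.1.100        # its service address, adopted on failure
IFACE=eth0
DRYRUN=${DRYRUN:-1}      # default: print the commands instead of running them

run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

peer_alive() { ping -c 2 -w 3 "$PEER" >/dev/null 2>&1; }

take_over() {
    run ifconfig "$IFACE:0" "$VIP" up                   # adopt the peer's IP
    run logger -t failover "took over $VIP from $PEER"  # and "page root"
}

peer_alive || take_over
```

    The logger call stands in for whatever actually pages root; swap in mail or a pager gateway as needed.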
  • If you are copying content to a local disk, then an NFS server failure just means no updates for a while, you can still serve pages out of the local directory. No biggie for many jobs.

    Of course you would never have a cluster of web servers serving pages off the same hard disk via NFS -- would you? That would still bottleneck everything through the NFS server's disk and Ethernet card, making the whole clustering exercise pointless, right? Unless maybe you were dynamically generating pages and just wanted to spread the CPU load.

    Whatever. This question is too vague. The asker should explain what they want to build.
  • good point about the question being too vague
  • by Cato ( 8296 ) on Monday September 04, 2000 @05:06AM (#807553)
    I recently investigated this for Solaris, and I'm sure some of the products will be on Linux as well. Also, check the Linux-HA page at http://www.linux-ha.org/.

    The basic idea is to do clustering: two machines that share a single (virtual) IP address, in addition to their real IP address. You then have clustering software that detects a failure in the other machine, or in an application, and fails over to the other machine, starting the applications that failed, and doing a gratuitous ARP to bind the virtual IP address to the MAC address for the surviving machine.

    Client apps must be able to re-connect to the server, since their TCP sessions are dropped when the primary server crashes. TCP state failover as in some firewalls (e.g. FW-1 from Check Point) would be very handy but I don't know any OSs that do it.

    This requires a shared disk subsystem - initially this is usually SCSI, with two controllers and software that can handle this. As systems grow, they tend to migrate to SANs (storage area networks), usually based on Fibre Channel - this is very fast, as you might expect, and can be built using FC switches, so your SAN can be redundant as well as your servers. You would of course need RAID 1 or 5 in your disk subsystem.

    The next step is to do volume replication - this can be nearly instantaneous (you have a choice of synchronous replication, which slows down every update transaction on the main server, or asynchronous replication, which is a little less safe). The trick here is to make sure that the volume replication software can buffer updates during times when the secondary server/disk is not available - otherwise a single failure stops all transactions...

    Finally, global cluster management involves failing over between geographically separated systems - this would require the client apps to know how to switch to a different IP address, though you might be able to rig something up with load balancing technology.

    This is a horribly complex area, as I discovered, and it's not simple to get it right. There are many techniques I have not covered - good sites to read up on are veritas.com, logitech.com, sun.com (search for NDR and Sun Cluster), technet.oracle.com (Oracle focused but covers many options), and of course linux-ha.org.
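    To make the virtual-IP step above concrete, here is a minimal sketch of what the clustering software does at failover time: bring up the shared address and send a gratuitous ARP so switches and clients relearn the MAC. VIP, IFACE, and the init script are placeholders; packages like Linux-HA's heartbeat wrap this up (plus the failure detection) for you.

```shell
#!/bin/sh
# Sketch of the promote-to-primary step: adopt the virtual IP, announce it,
# restart the failed services.  All names are placeholders.
VIP=10.0.0.50
IFACE=eth0
DRYRUN=${DRYRUN:-1}    # default: print the commands instead of running them

run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

promote() {
    run ifconfig "$IFACE:0" "$VIP" netmask 255.255.255.0 up
    run arping -U -c 3 -I "$IFACE" "$VIP"   # gratuitous ARP announcement
    run /etc/init.d/httpd start             # restart failed services here
}

promote
```

    Clients still have to reconnect, as noted above - this only moves the address, not the TCP state.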
  • by Cato ( 8296 ) on Monday September 04, 2000 @05:28AM (#807554)
    Forgot to point out that volume replication means replication between two servers across some sort of network; it's not just local mirroring/RAID 1.

    Must...use...preview...button...
  • I believe the original post was quite clear.

    That being said I'll pick at your response point by point:

    >What are you really trying to do?

    This was clear -- replicate content to server farms.

    >The reason why your question has brought out only trolls and other meaningless posts is that it is so vague that it could be classified as a troll itself.

    I disagree strongly. I think it was a simple, well-stated question!!

    And be real! Trolls are gonna pop up no matter what! But to claim his post is a troll because it is vague is just wrong.

    > How big are these content directories you want to replicate, and how often do they change?

    Does it matter? The original poster gave an example of server farms. But the real question is "what are people using to replicate content?".

    Would it really make a difference if it was 3 megs vs. 10 megs? And as for the rate of change, I don't see how asking "what are people using to replicate content to server farms" requires a followup of "how often does the content change?" The point being: if it changes even once a day, server farm operators are using *some* method of replication, even if it means manual replication by an administrator!

    >If they only change every few hours or so, you can use nfs and a script that does cp -r with no problem. Or rdist. Or CVS. Or ftp.

    Now you're offering a possible answer to his question. Thank you!

    >Don't worry about "real time", just take the machine that is being updated out of the pool that might serve requests - don't even bother with that if a briefly half-updated web page doesn't matter.

    I would tend to believe that if a person or entity took the time to set up a server farm, a half-updated web page would be a huge problem.

    However, "taking a machine out of the pool" might be more problematic. What happens when you put it back in? Say you have 50 machines in the "farm"; now you have 1 updated and 49 out of date. So what's next? Take another out, update it, and put it back? Now the score is 48-2! You see, I can gather from his text that "real time" could very well be critical for his needs. Or at least "near real time". Actually I think he really means updating all the machines simultaneously. (The original poster can correct me if I assume wrong.)

    >Does it matter if a few of the machines are out of date, or should all of the machines have the exact same thing?

    Duhhhh. If your livelihood depends on it, it may matter a great deal. Come on now, isn't it obvious that "content replication" means you want your content to be the same on all machines? If not, can you honestly give me an example of why anyone would want a server farm to have differing content on its machines, in the context of the original poster's question?

    >I think if you explain what you are trying to do, you'll get a number of suggestions.

    I agree. But he didn't say he was trying to "do" anything. He merely asked a general type of question. Perhaps he could be more clear by being more specific.

    This is NOT intended to flame you! But if you want to hunt trolls, pick a real troll!

    Cheers,
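    One way to shrink the 48-2 skew window worried about above is a two-phase push: stage the new content on every host first (slow, safe), then flip a symlink on every host (fast), so all machines change over within seconds of each other. All paths and host names below are placeholders, and DOCROOT is assumed to already be a symlink rather than a real directory.

```shell
#!/bin/sh
# Hypothetical two-phase rollout: rsync to a staging release dir everywhere,
# then atomically repoint the live docroot symlink on each host.
HOSTS="web1 web2 web3"
SRC=/var/www/staging
RELEASE=/var/www/releases/next
DOCROOT=/var/www/htdocs     # assumed to be a symlink on every host
DRYRUN=${DRYRUN:-1}         # default: print the commands instead of running them

run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

stage_all() {               # phase 1: slow copy, live content untouched
    for h in $HOSTS; do
        run rsync -az --delete "$SRC/" "$h:$RELEASE/"
    done
}

flip_all() {                # phase 2: near-simultaneous symlink swap
    for h in $HOSTS; do
        run ssh "$h" "ln -sfn $RELEASE $DOCROOT"
    done
}

stage_all && flip_all
```

    Each host serves either the old release or the new one in full, never a half-updated mix, which addresses the half-updated-page concern directly.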
  • Sorry to reply to my own message *again*, but where I said logitech.com I meant legato.com...
