Comment Re:How does it deal with replication latency? (Score 1) 137

You're right that this increases the cost of round trips, but it's not nearly as bad for _throughput_ as you imply. Applications pipeline, and TCP is windowed, so you don't have a round trip per packet or message. Above, where you imply a round trip for each row returned from a table, the actual behaviour is that the client sends a request, and the server immediately starts spitting out a large number of rows. The initial response is delayed by, on average, half your checkpoint interval (so 25ms if you're checkpointing every 50ms), then the rest of the query arrives at link speed.

Remus is meant for applications that are separated by a WAN/internet link from their clients (internet applications generally have to tolerate this degree of latency anyway). For multi-server cooperative applications (like a web server in front of a database), you could get rid of the network delay by checkpointing both servers and failing them over as a unit. We have some experimental cluster checkpoint code here, but it's not part of this release.

Comment Re:How does it deal with replication latency? (Score 1) 137

I think it'd be interesting to use DRBD with Remus, since it does a nice job with things like resynchronization, which you would need when the primary comes back online. But DRBD gives your VM one shared block device, so as I mentioned in the SAN question below, Remus would need some form of journal or transaction layer over the block device to be able to roll back uncommitted writes. A shared-storage option is on our to-do list.

Comment Re:How does it deal with replication latency? (Score 1) 137

No, Remus doesn't require shared storage -- just two regular PCs with their own disks will do. It could be used on a SAN, but the disk replication system would have to be modified, to do something like journalling the writes from the primary so that they could be undone on failover up to the point of the last checkpoint.
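To make the journalling idea concrete, here is a minimal sketch of an undo log layered over a block device. It is not Remus code: the `JournaledDisk` class and its method names are hypothetical, and the "disk" is just an in-memory list.

```python
class JournaledDisk:
    """Toy block device that can undo writes made since the last checkpoint.

    Hypothetical illustration only; not part of Remus or DRBD.
    """

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.undo_log = []  # (block_index, previous_contents), oldest first

    def write(self, index, data):
        # Record the old contents first so this write can be undone later.
        self.undo_log.append((index, self.blocks[index]))
        self.blocks[index] = data

    def checkpoint_committed(self):
        # Everything written so far is covered by a committed checkpoint,
        # so the undo information is no longer needed.
        self.undo_log.clear()

    def rollback(self):
        # On failover, undo uncommitted writes newest-first, which restores
        # the disk to its state at the last committed checkpoint.
        while self.undo_log:
            index, previous = self.undo_log.pop()
            self.blocks[index] = previous
```

Calling `rollback()` after a failure returns every block to what it held at the last `checkpoint_committed()`, which is the property a shared-storage setup would need.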

Comment Re:How does it deal with replication latency? (Score 1) 137

Yes, dirtying a lot of RAM will extend the time between checkpoints, and outbound traffic is buffered until the next checkpoint. I've put it under fairly heavy load and I don't think I've seen more than about 100ms epochs outside of my deliberate dirty-memory-as-fast-as-possible microbenchmark.

Comment Re:Wrong place to put a failsafe? (Score 4, Informative) 137

Split brain is a possibility, if the link between the primary and backup dies. Remus replicates the disks rather than requiring shared storage, which provides some protection over the data. But there are already a number of protocols for managing which replica is active (e.g., "shoot-the-other-node-in-the-head") -- we're worried about maintaining the replica, but happy to use something like linux-HA to control the actual failover.

Comment Re:How does it deal with replication latency? (Score 5, Insightful) 137

I think you're missing the point of output buffering. Remus _does_ introduce network delay, and some applications will certainly be sensitive to it. But it never loses transactions that have been seen outside the machine. Keeping an exact copy of the machine _without_ having to synchronize on every single instruction is exactly the point of Remus.

Comment Re:state transfer (Score 5, Informative) 137

It depends pretty heavily on your workload. Basically, the amount of bandwidth you need is proportional to the number of different memory addresses your application wrote to since the last checkpoint. Reads are free -- only changed memory needs to be copied. Also, if you keep writing to the same address over and over, you only have to send the last write before a checkpoint, so you can actually write to memory at a rate which is much higher than the amount of bandwidth required. We have some nice graphs in the paper, but for example, IIRC, a kernel compilation checkpointed every 100ms burned somewhere between 50 and 100 megabits. By the way, there's plenty of room to shrink this through compression and other fairly straightforward techniques, which we're prototyping.
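A rough way to see why repeated writes are cheap is the sketch below. It assumes dirty memory is tracked and sent per 4096-byte page; the page size and the function names are assumptions for illustration, not something stated above.

```python
PAGE_SIZE = 4096  # bytes; assumed tracking granularity


def checkpoint_bytes(write_addresses, page_size=PAGE_SIZE):
    """Bytes to send for one checkpoint, given every address written in the epoch."""
    # A set keeps each dirty page once, no matter how often it was written.
    dirty_pages = {address // page_size for address in write_addresses}
    return len(dirty_pages) * page_size


def required_mbps(write_addresses, epoch_ms, page_size=PAGE_SIZE):
    """Average link bandwidth (megabits/s) needed to ship one epoch's dirty pages."""
    bits = checkpoint_bytes(write_addresses, page_size) * 8
    return bits / (epoch_ms / 1000) / 1e6
```

A thousand writes to the same address cost the same as one write, since only one page is dirty at checkpoint time; the cost grows only with the number of distinct pages touched.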

Comment Re:How does it deal with replication latency? (Score 5, Informative) 137

The buffering I mentioned above means that packet X will not escape the machine until the checkpoint that produced X has been committed to the backup. So when it recovers on the backup, X will already be in the OS send buffer. There's no possibility for misprediction. If the buffer is lost, TCP will handle recovering the packet.
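Here is a small sketch of that release rule, with packets tagged by the checkpoint epoch that produced them. The `OutputBuffer` class and its method names are made up for illustration and are not the actual Remus interface.

```python
from collections import deque


class OutputBuffer:
    """Holds outbound packets until the backup has the checkpoint that produced them.

    Hypothetical illustration only; not the Remus implementation.
    """

    def __init__(self):
        self.epoch = 0          # epoch currently executing on the primary
        self.pending = deque()  # (epoch, packet) pairs in send order
        self.released = []      # packets that have left the machine

    def send(self, packet):
        # The guest "sends" a packet, but it is only queued for now.
        self.pending.append((self.epoch, packet))

    def checkpoint(self):
        # Close the current epoch and start the next one.
        closed = self.epoch
        self.epoch += 1
        return closed

    def backup_acked(self, epoch):
        # The backup holds state up to and including `epoch`, so packets
        # generated in that epoch or earlier are now safe to release.
        while self.pending and self.pending[0][0] <= epoch:
            self.released.append(self.pending.popleft()[1])
```

Packet X sent during epoch 0 stays queued until `backup_acked(0)`; a packet sent after the checkpoint waits for the next acknowledgement.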

Comment Re:state transfer (Score 3, Informative) 137

FWIW, we have an ongoing project to extend this to disaster recovery. We're running the primary at UBC and a backup a few hundred km away, and the additional latency is not terribly noticeable. Failover requires a few BGP tricks, which makes it a bit less transparent, but still probably practical for something like a hosting provider or smallish company.

Comment Re:How does it deal with replication latency? (Score 5, Informative) 137

Hello Slashdot, I'm the guy who wrote Remus. It's my first time being slashdotted, and it's pretty exciting! To answer your question, Remus buffers outbound network packets until the backup has been synchronized up to the point in time where those packets were generated. So if you checkpoint every 50ms, you'll see an average additional latency of 25ms on the line, but the backup _will_ always be up to date from the point of view of the outside world.
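The 25ms figure can be checked with a few lines of arithmetic. This sketch assumes packets are generated evenly across an epoch and are released the instant the epoch ends, ignoring the time to transfer the checkpoint; the function name is hypothetical.

```python
def mean_buffering_delay_ms(interval_ms, samples=1000):
    """Average wait for packets generated at evenly spaced times within one epoch."""
    total = 0.0
    for i in range(samples):
        # Sample the midpoint of each of `samples` equal slices of the epoch.
        generated_at = (i + 0.5) * interval_ms / samples
        # Simplification: the packet is released exactly at the end of the epoch.
        total += interval_ms - generated_at
    return total / samples
```

Under those assumptions the mean wait comes out to half the checkpoint interval, so a 50ms interval gives about 25ms.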
