Businesses Cloud

ZFS Replication To the Cloud Is Finally Here and It's Fast (arstechnica.com) 150

New submitter kozubik writes: Jim Salter at Ars Technica provides a detailed, technical rundown of ZFS send and receive, and compares it to traditional remote syncing and backup tools such as rsync. He writes: "In mid-August, the first commercially available ZFS cloud replication target became available at rsync.net. Who cares, right? As the service itself states, If you're not sure what this means, our product is Not For You. ... after 15 years of daily use, I knew exactly what rsync's weaknesses were, and I targeted them ruthlessly."
This discussion has been archived. No new comments can be posted.

  • by MichaelSmith ( 789609 ) on Tuesday December 22, 2015 @04:55AM (#51164043) Homepage Journal

    rsync synchronises files. ZFS synchronises a file system. Of course it is better to work that way because you can transfer just the changed components of a file. Moving a file just changes a pointer, so send the pointer. That sort of thing.

    • If I'm reading this right, ZFS sync opens up one other huge, huge possibility. I had this idea nearly 15 years ago (shortly after Napster), but didn't have the technical expertise to implement it: A distributed redundant filesystem.

      ZFS doesn't think in terms of files. It thinks in terms of blocks, and in a redundant z-volume (similar to a RAID array) it distributes those blocks over multiple virtual devices (vdevs) - you can think of them as disks, but they don't have to be. These vdevs can be a disk
  • VM Replication (Score:4, Interesting)

    by tomknight ( 190939 ) on Tuesday December 22, 2015 @04:58AM (#51164053) Journal

    I was a little unexcited by (although interested in) the article, even by the general speedups until I got to the part about VM replication. This really makes an enormous difference.

    ZFS licensing has kept this as a grey area for me, so I've largely kept away from deployment (save for an emergency FreeNAS box I needed in a hurry), but I'd clearly benefit from looking here again. Thanks for the reminder.

    Oh, I also appreciate the rsync.net advertisement. Good guys, good service ;-)

    • by Lennie ( 16154 )

      The article did feel like an advertisement.

      They offer a VM with lots of disk space. Is that really that special?

      I know of at least one that offers something similar:
      https://www.vultr.com/pricing/... [vultr.com]

      I guess not at the same scale and with a bandwidth limit.

      What I think is kind of funny is how people are surprised that ZFS works well for VM-images.

      rsync is meant/optimized for transferring files, not blocks.

      ZFS is meant for transferring filesystem blocks, and VM images are blocks too.

      So ZFS works better than rsync

      • by Bengie ( 1121981 )
        Depends on what you're calling "containers". BSD Jails have been around for a long time, but what Linux calls "containers" are crappy attempts to containerize. The Linux community has this unhealthy "not invented here" syndrome that results in a lot of square wheels.
        • but what Linux calls "containers" are crappy attempts to containerize.

          Not sure what you mean. Jails have been around for a long time, but LXC/LXD containers have almost identical functionality.
          container templates...check
          filesystem snapshot integration (ZFS, btrfs) with cloning operations...check
          resource limits...check
          unprivileged containers...check
          network isolation...more flexible under LXC than Jails, in my opinion
          bind mounts in containers...check
          nice management utilities

          • by Cyberax ( 705495 )

            Only difference I can see really is that LXC doesn't support nested containers...

            It most certainly does. Linux can nest user namespaces to almost any depth.

          • by Bengie ( 1121981 )
            The difference is that BSD Jails are entirely separate environments with their own unshared kernel data structures, and the jail communicates with the host via an API. Linux namespaces are just metadata added to shared environments. Not only does that reduce isolation, allowing for some annoying "leaking", but it is also more bug-prone because of the increased complexity, and the shared data structures are naturally more prone to major issues when they arise.

            Security can't be bolted on after the fact, it must be baked into the design
              • The difference is that BSD Jails are entirely separate environments with their own unshared kernel data structures, and the jail communicates with the host via an API. Linux namespaces are just metadata added to shared environments.

              I'm sorry, but this notion is completely wrong. A BSD Jail is a forked process (the "jail process"), which calls the "jail" kernel system call and then executes a chroot. The jail syscall serves to attach the "prison" data structure to the "proc" data structure of the jail process, allowing the kernel to identify the process as "jailed" and treat it accordingly. The isolation of the environments is dependent entirely on the kernel recognizing that the process is jailed and putting the appropriate restrictio

              • by Bengie ( 1121981 )
                BSD Jails do not use anything like chroot. "chroot" is being used as a verb that describes the intention, but not the implementation.

                This is how the FreeBSD kernel devs describe BSD Jails. Each jail gets its own kernel network stack, kernel memory allocator, and almost every other kernel data structure. They said this is nearly identical to paravirtualization. Breaking out of a jail requires a kernel flaw in both a system call and the paravirtualization layer.

                Think KVM+QEMU, with most of the benefit an
                • This is how the FreeBSD kernel devs describe BSD Jails. Each jail gets its own kernel network stack, kernel memory allocator, and almost every other kernel data structure.

                  What you are describing is VPS (Virtual Private System), not Jails. VPS is the successor to Jails, written to address some of the shortcomings of Jails and make them more useful in situations where you want true virtual environments, rather than just the extra security that Jails has to offer. Incidentally, the mechanisms used to implement VPS in FreeBSD are nearly identical to the mechanisms for implementing containers on linux. Here is the relevant description from the whitepaper (http://2010.eurobsdcon.o

      • A little bit. The author, as stated in the article, is also the author of the sync tool Syncoid [github.com] and does consulting [openoid.net]. He doesn't really have a horse in the race for the company he's reviewing, and his tools are looking to support Btrfs, so he's not necessarily married to ZFS or to the solution of the company the review is mostly about. Actually a pretty nice article to read.

        From some of the benchmarks in the article it didn't seem like rsync had any strength over syncoid, other than his tool requiring ZFS on both en
    • by rl117 ( 110595 )

      There is no grey area with respect to the licensing. It's CDDL, a free software licence. It's 100% Free.

      It might be incompatible with the GPL, but that's a non-issue. The userland tools are fine under this licence. The kernel modules are fine under this licence. Now, it means that the kernel modules aren't going to appear in a kernel release anytime soon, but that in no way makes for any legal problems in using them as loadable modules, today. It works fine from a technical point of view, and it's als

    • Don't let the licensing FUD scare you. Linus has publicly stated that licensing in a case that's a very near equivalent to ZFS' licensing is fine.

      The anticipated problem with the license has always been on the Linux side. The license ZFS is released under doesn't in any way prohibit the ZFS code from being used in other places with other licenses (like the *BSD's). There has never been a concern that using ZFS with Linux violates the ZFS license (and thus could bring Oracle's well-fed lawyers down upon y

  • Charming (Score:2, Insightful)

    Who cares, right? As the service itself states, If you're not sure what this means, our product is Not For You.

    Ah, there's that welcoming open-source community spirit.

    • Re:Charming (Score:4, Informative)

      by greenfruitsalad ( 2008354 ) on Tuesday December 22, 2015 @06:05AM (#51164197)

      there are things in this world that simply aren't meant for participation award winners. so go get offended somewhere else.

      if somebody doesn't know what ZFS replication is, their product clearly isn't meant for them. why bother with explanation to a visitor that has no use for the product/service?

      the attitude of these ZFS people is still quite welcoming compared to some connectivity providers i've dealt with. e.g. bogons.net will just politely tell you to f*ck off if you don't fully understand what you're purchasing from them (dwdm/cwdm rings).

  • It will make you kill your wife
    • by Anonymous Coward

      That was ReiserFS, not ZFS.

    • by Anonymous Coward

      Only after the Russian mail-order bride steals the money from your open source "wealth" to fund her new boyfriend's BDSM hobbies. She actually sounded a lot like my ex, the one with the website on breast feeding with nipple rings.

      And no, I'm not making *any* of this up.

  • by urdak ( 457938 ) on Tuesday December 22, 2015 @05:01AM (#51164065)

    Reading this article, it seems that this "ZFS replication" is very similar to rsync, with one straightforward addition:

    Rsync works on an individual file level. It knows how to synchronize each modified file separately, and does this very efficiently. But if a file was renamed, without any further changes, it doesn't notice this fact, and instead notices the new file and sends it in its entirety. "ZFS replication", on the other hand, works on the filesystem level so it knows about renamed files and can send just the "rename" event instead of the entire content of the file.

    So if rsync ran through all the files to try to recognize renamed files (e.g., by file sizes and dates, confirming with a hash), it could basically do the same thing. This wouldn't catch the event of renaming *and also* modifying the same file, but this is rarer than simple movements of files and directories. The benefit would be that this would work on *any* filesystem, not just ZFS. Since 99.9% of the users out there do not use ZFS, it makes sense to have this feature in rsync, not ZFS.
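The rename-detection idea in the comment above (match by size first, then confirm with a hash) can be sketched roughly as follows. This is a minimal illustration, not rsync's actual behaviour: the function name `detect_renames` and the in-memory `path -> contents` dictionaries are assumptions made for the example, and a real tool would walk two directory trees instead.

```python
import hashlib
from collections import defaultdict


def detect_renames(source: dict[str, bytes], dest: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (old_path, new_path) pairs for files that moved without changing.

    `source` is the current tree and `dest` is the stale copy; both map
    path -> file contents (a hypothetical in-memory stand-in for real trees).
    """
    # Only paths that exist on one side can be part of a rename.
    only_on_dest = [p for p in dest if p not in source]
    only_on_source = [p for p in source if p not in dest]

    # Cheap pre-filter: index the destination-only files by size.
    by_size: dict[int, list[str]] = defaultdict(list)
    for path in only_on_dest:
        by_size[len(dest[path])].append(path)

    renames = []
    for new_path in sorted(only_on_source):
        candidates = by_size.get(len(source[new_path]), [])
        if not candidates:
            continue  # no file of this size disappeared, so nothing to hash
        new_digest = hashlib.sha256(source[new_path]).digest()
        for old_path in candidates:
            # Expensive confirmation: contents must hash identically.
            if hashlib.sha256(dest[old_path]).digest() == new_digest:
                renames.append((old_path, new_path))
                candidates.remove(old_path)  # each old file matches at most once
                break
    return renames
```

As the comment notes, a file that was renamed and also modified will not match here, and every candidate still has to be read in full to be hashed.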

    • by brambus ( 3457531 ) on Tuesday December 22, 2015 @05:16AM (#51164097)
      The crucial difference is ZFS send is unidirectional and as such is not affected by link latency. rsync needs to go back-and-forth, comparing notes with the other end all the time. ZFS send is also a lot faster and more efficient, eliminating entire large portions of the filesystem tree structure that haven't changed without having to read them in. This is not to say that rsync's authors were any less competent coders. ZFS simply has more information available about the filesystem than rsync, so it can make smarter decisions.
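A toy model of why unchanged parts of the tree never need to be read: in a copy-on-write tree, rewriting a leaf also rewrites its ancestors, so each block pointer can carry the transaction group (txg) in which the block was born, and any subtree born at or before the snapshot can be skipped. The `Block` class and `blocks_since` function below are hypothetical names for illustration only and greatly simplify what ZFS really does.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    birth_txg: int  # transaction group in which this block was last written
    children: list["Block"] = field(default_factory=list)


def blocks_since(root: Block, snapshot_txg: int, examined: list[str]) -> list[str]:
    """Return names of leaf blocks written after `snapshot_txg`.

    `examined` records every block pointer we looked at, to show that
    the children of an unchanged subtree are never touched.
    """
    examined.append(root.name)
    if root.birth_txg <= snapshot_txg:
        return []  # nothing below here changed: prune the whole subtree
    if not root.children:
        return [root.name]  # a changed data block that must be sent
    changed = []
    for child in root.children:
        changed.extend(blocks_since(child, snapshot_txg, examined))
    return changed


# Example tree: only b2 was written after the snapshot taken at txg 10.
tree = Block("root", 12, [
    Block("dirA", 5, [Block("a1", 3), Block("a2", 5)]),
    Block("dirB", 12, [Block("b1", 4), Block("b2", 12)]),
])
examined: list[str] = []
changed = blocks_since(tree, snapshot_txg=10, examined=examined)
```

In this model the sender works entirely from local metadata, which fits the point above about not depending on round trips to the other end.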
      • by urdak ( 457938 )

        The crucial difference is ZFS send is unidirectional and as such is not affected by link latency. rsync needs to go back-and-forth, comparing notes with the other end all the time.

        But this is *not* what the article appears to be measuring. He measured that the times to synchronize changes were nearly identical in rsync and "ZFS replication" - except when it comes to renames.

        • If you read on a bit in the article, you'll come across the example of daily syncing of VM images across to a backup node. While ZFS send is done in less than an hour, rsync would take north of 7 hours just to read in the local state of the VM image, much less figure out what has changed and send the diffs. This is based entirely on ZFS send's unidirectionality. The critical difference is that rsync needs to trawl the entire local dataset state completely and compare notes with the other box (which also nee
          • In addition, when it comes to VM hosting in the filesystem, ZFS deduplication can offer a significant space savings by deduping all the common files in the VM images (operating system files).

            If you are hosting Windows VMs, this effectively nullifies many gigabytes of storage bloat. This is, of course, a feature of ZFS, and has nothing to do with snapshotting other than the fact that your snapshots will be smaller.

            • deduplication takes an insane amount of RAM and is really only useful for static, rarely written datasets; it's strongly recommended against for VM images.

              OTOH enabling lz4 compression is recommended - cpu/ram usage is minimal and the compression levels can be quite impressive, plus it can actually improve disk i/o as less data is read/written from disk. I have many VMs with compression enabled; compression usually reduces the image by about 30%

              • Depending on what your setup is and what the requirements are, it's fully feasible to have a 'storage server' where all its RAM is handed over to ZFS for caching and dedup, and you export via NFS to your VM hosting systems on 10GbE. It adds a touch of latency, but if you can host a hundred machines that don't require super low latency and save 90% of the disk space by only having 1 copy of your server OS (for the most part), then you're probably doing better.

                It's a viable config depending on what the need

        • Renames and changes to large files (VM images were the author's example).

        • by DRJlaw ( 946416 )

          But this is *not* what the article appears to be measuring. He measured that the times to synchronize changes were nearly identical in rsync and "ZFS replication" - except when it comes to renames.

          Yet this is what the article says. Does he really have to measure read time to the millisecond instead of providing an estimate? How fast can your disk system read off 2TB of information, anyway?

          "Virtualization keeps getting more and more prevalent, and VMs mean gigantic single files. rsync has a lot of trouble

      • Not quite - zfs needs to contact the destination zfs fs to compare with the last snapshot, but that is a very quick process. Once done, zfs already knows which blocks have changed since the last snapshot, whereas rsync has to scan the contents of each file at *both* ends, which is where all the time comes in.

        • Not quite zfs needs to contact the destination zfs fs to compare with the last snapshot

          Ehm, no, sorry. No communication with the destination machine is required while generating an incremental send stream. How can I claim this? Well besides being quite intimate with the ZFS source base (and I can point you to the relevant source files if you so desire), just a quick read through the zfs(1M) manpage will mention this example:

          # zfs send pool/fs@a | ssh host zfs receive poolB/received/fs@a

          As you are no doubt aware, pipes are by definition unidirectional. There is no way the zfs receive can tal

    • by Anonymous Coward

      Not exactly.

      rsync will always have to go through the files and check. Trying to identify stuff like renames will obviously make a difference, but as it's only really going to have any sizeable impact when you happen to have lots of renames, but not actual data changes, it's probably not even worth the effort of implementing it.

      ZFS send/recv works at a very low level using the fundamental infrastructure in ZFS that makes snapshots work. When you send an incremental ZFS snapshot it doesn't have to check anyth

      • by urdak ( 457938 )

        Not exactly.

        rsync will always have to go through the files and check. Trying to identify stuff like renames will obviously make a difference, but as it's only really going to have any sizeable impact when you happen to have lots of renames, but not actual data changes, it's probably not even worth the effort of implementing it.

        The rename issue is actually *very* important. It's not likely that you'll have a lot of independent renames, but something very likely is that you rename one directory containing a lot of files - and at that point rsync will send the entire content of that directory again. I actually found myself in the past stopping myself from renaming a directory, just because I knew this will incur a huge slowdown next time I do a backup (using rsync).

    • by geggo98 ( 835161 )
      In principle true, but with one exception: If you already use ZFS for other reasons (e.g. checksums in the file system or transparent compression), it's really nice that you can make backups on the filesystem level with rsync-like performance. The backup on the filesystem level keeps all filesystem-specific features intact (e.g. the checksums and the compression). So you can have really fast backups and you can be sure that when you restore the backup, the filesystem will look exactly as it looks now. So
    • So if rsync ran through all the files to try to recognize renamed files (e.g., by file sizes and dates, confirming with a hash), it could basically do the same thing.

      As a sibling comment points out, rsync does have a mode which handles this. As they don't point out, it is horrendously costly. Making this the default would be a pure idiot move. ZFS has metadata that permits detecting these sort of files, so it is possible to do it cheaply with ZFS.

      What is really wanted IMO is for rsync to detect this stuff and use it when ZFS is present.

      • ZFS has metadata that permits detecting these sort of files

        Side note for your entertainment in case it interests you, the way ZFS actually handles the rename case has nothing to do with trying to follow file name changes. In fact, in order to handle a rename, we don't need to look at the file being renamed at all. The trick is in the fact that directories are files too (albeit special ones) with a defined hash-table structure. ZFS send simply picks up the changes to the respective directories as if they were regular files and transfers those. The changed blocks the

        • Side note for your entertainment in case it interests you

          It does

          The trick is in the fact that directories are files too (albeit special ones) with a defined hash-table structure. ZFS send simply picks up the changes to the respective directories as if they were regular files and transfers those.

          That does seem like functionality which rsync could be enhanced to use. At least, it could be used to more rapidly find duplicates when both ends are using ZFS. rsync ain't going away anytime soon.

          I am interested in ZFS but will probably wait until a Linux distribution makes it trivial to implement. I am past the point where messing around with filesystems seems fun.

      • the scopes of what "zfs send" and "rsync" do are so profoundly different, it's almost silly to compare them. they're at completely different layers of storage stack. when i sync my local filesystem with a remote site (every hour), i sync snapshots, clones, (sub)filesystems while things are mounted and heavily in use. there's also compression and deduplication to consider.

        the rsync feature you suggested isn't possible without a complete zfs rewrite or another layer of abstraction. too costly in either case.

    • by grumbel ( 592662 )

      The biggest difference is that ZFS has full knowledge of the state of the file system. rsync, on the other hand, doesn't: it's stateless, so it has to start from zero each time and regather the information on each and every run on both sides, which is a really slow and potentially error-prone process (e.g. when files change while rsync runs). ZFS knows what's going on in the filesystem and it snapshots the filesystem at a single point in time, so it can be far quicker and won't produce inconsistencie

    • by Bengie ( 1121981 )
      The difference between rsync and ZFS is O(N) and O(1). That is the worst case, but ZFS can instantly find the difference between datasets of any size, while rsync has to scan them first. Try rsyncing petabytes of files where many files are constantly being touched, but few changes being made.
    • ZFS replication is for synchronizing file system snapshots. rsync is for syncing some files.

      Entirely different purposes even if they seem the same.

      ZFS encapsulates the entire storage channel. It is your volume manager all the way to your file system. It knows of every single change that occurs, when and where it occurs and what it changed. Sending a ZFS snapshot gets not only the snapshot being sent, but every one in between. ZFS does deduplication, compression, checksumming, and the snapshots stores ev

      • I'm pretty sure there are people using rsync successfully for more than "a couple directories".
        • by rl117 ( 110595 )

          They definitely are. But it doesn't scale well. The time taken to scan the files and their contents on the source and destination system becomes overwhelming. The largest I've taken it to is a few terabytes, consisting of many thousands of directories each containing thousands of files (scientific imaging data). It ends up taking hours, where with ZFS it would take a few seconds. It also thrashes the discs on both systems as it scans everything, and uses a lot of memory. ZFS does none of these things-

    • Another problem is that rsync has to scan the entire file system, calculate hashes and transfer them and then do the same on the other side before it can transfer the difference.

      If you have millions of files and directories that can take significant amount of time. I used to have rsync take a weekend to backup. With ZFS I can do hourly backups.
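To see why the scanning cost dominates, here is a deliberately simplified block-comparison sketch. It is not rsync's algorithm (rsync uses a rolling checksum so it can match blocks at arbitrary offsets); `block_digests` and `changed_block_indices` are hypothetical helpers that compare fixed-offset blocks only. The point it illustrates is that both copies must be read in full even when a single block differs.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """Hash each fixed-size block of `data` (the last block may be short)."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_block_indices(old: bytes, new: bytes, block_size: int = 4) -> tuple[list[int], int]:
    """Return (indices of blocks in `new` that differ from `old`, total bytes read)."""
    old_digests = block_digests(old, block_size)  # computed on the receiving side
    new_digests = block_digests(new, block_size)  # computed on the sending side
    changed = [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or old_digests[i] != digest
    ]
    # Every byte on both sides was read, regardless of how little changed.
    bytes_read = len(old) + len(new)
    return changed, bytes_read
```

With a 2 TB VM image in place of these toy byte strings, the read cost stays at the full size of both copies on every run, which matches the hours-long scans reported in this thread.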

    • by mcrbids ( 148650 )

      Well, sort of....

      We switched from rsync to ZFS replication for our production environments and the difference in performance is rather extreme. (and why we made this change)

      Medium sized file system, 12 TB and a few hundred million files. Doing a backup with rsync took days, and it was all just tied up in IOPs, even if the number of files changed was rather small. At this scale, it takes more than 24 hours just to get a listing of files.

      Switching to ZFS with nightly snapshots and replication dropped backup t

    • The other advantage is that ZFS replication, unlike rsync, doesn't need to calculate diffs because ZFS already keeps track of which blocks have changed since the last snapshot. This makes the entire process much faster and less resource intensive.

      Imagine the following scenario:

      You are the sysadmin at a 24x7 company. You have a few hundred users' home directories (shared over NFS or SMB) on a fileserver that needs to be upgraded/replaced for some reason. You are tasked with migrating these home directories

  • by Anonymous Coward

    For those who already understand rsync and ZFS, the article adds nothing new of value. A third of the article tells you what rsync is; you could fill that part with lorem ipsum without lowering the next-to-none quality of the article. We already fucking know what rsync is. It's been in the man pages for, like, 10+ years. And why do you need a Jedi picture just for that?

    Then the useless benchmark, taking another 1/3. No repeatable experiments. No statistics. Only one-shot timings. And the worst thi

  • by Maow ( 620678 ) on Tuesday December 22, 2015 @05:47AM (#51164159) Journal

    Jim Salter writes some great pieces on file systems for Ars Technica.

    At the linked article are Related Links. Of particular note is "Atomic Cows and Bit Rot" -- read that if you're interested in modern file systems.

    • Yeah, he writes okay pieces, but it kind of annoys me when he throws up blanket advice and then practically trips over himself extolling the opposite.

      ZFS: You should use mirror vdevs, not RAIDZ [jrs-s.net]

      Guess what? The entire rsync.net service is built on top of RAID-Z3, if I read their promotional portal correctly.

      One use case I can see for this is using ZFS to back up Postgres databases. I'm not the only person to think this might be a good idea. A while back, I listened to this talk, which I really enjoyed:

      Keith [youtube.com]

  • "The cloud" (Score:2, Informative)

    by ZorinLynx ( 31751 )

    Anyone else getting tired of this term? All it means is "someone else's computer". All you're doing is renting server space and replicating your data there. There's nothing special about it.

    • by Anonymous Coward

      Yep. 'The Cloud' is just shifting responsibility to someone else, who may or may not be doing a proper job of security or backups. This seems germane [textfiles.com].

    • Hmmm.... good point... perhaps we need "Smart cloud 2.0"

    • Anyone else getting tired of this term? All it means is "someone else's computer".

      To be fair, that's kind of what it has meant for years. I have a networking textbook that's 15 years old that represents unspecified parts of a network in a network diagram as a cloud shape. So "piece of computer network that I don't care much about the details", e.g. the Internet, has been called "cloud" for a while.

      Of course, this is not to be confused with "cloud computing", which has a more precise definition (basically distributed processing, but with on-demand virtual machines instead of physical n

    • Never heard of a private cloud then? We run a large virt cluster here & "the cloud" is the most straightforward & friendly way for me to refer to it to the higher ups. "Cloud" is just the same as "cluster", however the former is more widely recognised.

  • We've had a very significant discount for HN readers for years and we'd be happy to extend that to /. readers. Just email and ask.

    Really happy to be here - I am not sure why I am labeled as "new submitter" since I have been a slashdot user for ... 15 years ?

    Happy to answer any questions about our service here as well.
