Comment Most of the early stories on the web are wrong.... (Score 5, Informative) 249

I have a Google+ post where I've posted my latest updates to this still-developing story:

https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7

Also, I will note that before I send any pull request to Linus, I run a very extensive set of file system regression tests, using the standard xfstests suite (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Because I run a very large set of automated regression tests on a regular basis, and certainly before pushing the latest set of patches to Linus. So while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.

Comment Re:Has Ted Cooked the Benchmarks Again? (Score 2, Informative) 348

So before I tried agitating for programmers to fix their buggy applications, I had already implemented both the heuristic that XFS uses (if you truncate a file descriptor, add an implicit fsync on the close of that fd) as well as another heuristic (if you rename on top of an existing file, fsync the source file of the rename). This was to work around buggy applications, and as you can see, ext4 does even more than XFS does.

At the end of the day, though, the heuristics can sometimes get things wrong, and sometimes they will be too aggressive in forcing fsync()'s when it's not really necessary, which is why it's good to at least try to educate application programmers about something which even you agree shouldn't be a new thing to them.

(For example, if you don't fsync, and you want to run your application on another OS, like say, Solaris, you will be very sad.)
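
For what it's worth, here is a minimal C sketch of the pattern I keep asking application writers to use --- write the new contents to a temporary file, fsync() it, and only then rename() it over the old name. The file names are just placeholders for illustration:

    /* Minimal sketch: write to a temp file, fsync(), then rename() over
     * the old file.  File names here are placeholders for illustration. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static int save_file(const char *path, const char *tmp_path,
                         const char *data, size_t len)
    {
        int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        if (write(fd, data, len) != (ssize_t) len ||
            fsync(fd) < 0) {            /* force data out BEFORE the rename */
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        close(fd);

        /* rename() is atomic: after a crash you see either the complete
         * old file or the complete new file, never a zero-length one.
         * (For full durability you would also fsync() the directory.) */
        return rename(tmp_path, path);
    }

    int main(void)
    {
        const char *text = "setting=value\n";
        if (save_file("config.txt", "config.txt.tmp", text, strlen(text)) < 0) {
            perror("save_file");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }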

But it wasn't backside covering. Although most people don't seem to realize it, FIRST I added the heuristics to work around the buggy code, and THEN I agitated for people to fix their d*mn code. But application programmers don't like being told that they are wrong, so this seems to be a case of "blame/shoot the messenger" --- with me having been cast into the role of the messenger.

Comment Re:Time for a backup? (Score 1) 348

> I'm aware that ext4 can run without a journal, but isn't that functionally equivalent to leaving it as ext2?

With ext4 you get the benefits of extents, delayed allocation, and other new-to-ext4 features. You also get directory hash trees, which were introduced in ext3 and therefore aren't in ext2. Running without the journal means you have to run a full fsck after an unclean shutdown, but you still get all of the new features and performance improvements of ext4.

Comment Re:Has Ted Cooked the Benchmarks Again? (Score 2, Informative) 348

> > So I'm not sure what you're talking about. If you're talking about delayed allocation, XFS has it too, and the same buggy applications...

> Stop blaming the applications for a filesystem problem Ted. The excuse doesn't wash no matter how many times you use it, and no, XFS does not have it.

http://en.wikipedia.org/wiki/XFS#Delayed_allocation

Any other questions?

At the very least, the applications are non-portable in the sense that they were depending on behavior not guaranteed by POSIX. XFS, btrfs, ZFS, and many if not most modern file systems do delayed allocation. It's one of the basic file system tricks to improve performance.

Comment Re:Google doesn't need journaling? (Score 1) 348

Read the answer to the FAQ very carefully. In fact, they agree with me:

> With a single hard disk and barriers turned on (on=default), the drive write cache is flushed before and after a barrier is issued. A powerfail "only" loses data in the cache but no essential ordering is violated, and corruption will not occur.

In certain cases it might make sense to turn off barriers and disable the write caches, if you are writing huge amounts of bulk data and very little metadata to a RAID array --- and that is what XFS is optimized for. But they didn't say anything which contradicted what I said, although their conclusions might have been a little confusing, and they aren't necessarily applicable to workloads other than XFS's original design point of really big RAID arrays supporting really big data sets.

Comment Re:Google doesn't need journaling? (Score 1) 348

Jeff,

You may be correct in saying that if you compare the guts of Soft Updates with those of (say) the JBD/JBD2 layer in Linux, which is what is responsible for handling the physical block journalling for ext3/ext4, the complexities involved might not be that different.

However, the difference comes when someone adds ACL support, or some other fs feature. When you are using physical block journalling, all you need to know is how many blocks a particular fs operation needs to dirty. That's it! With Soft Updates, you need to understand dependency diagrams and write code to implement rollbacks, etc. The person who is implementing the file system feature has to do many more things.

Now there are certainly downsides to doing physical block journalling. If you have workloads which are very high in metadata operations, physical block journalling will hurt. On the other hand, it's not clear how common such workloads are (although you can certainly find benchmarks that will stress that particular usage pattern). And in the face of hard drive errors, physical block journals can sometimes be better at recovering from certain failures than logical journalling or soft updates.

Like many things, there are always tradeoffs involved, and if the goal is to play the "my file system has a longer d*ck" game, it's almost always possible to find some benchmark which "proves" that one file system is better than another. Yawn...

Comment Re:Ubuntu 9.10? (Score 3, Informative) 348

So Canonical has never reported this bug to LKML or to the linux-ext4 list, as far as I am aware. No other distribution has complained about this > 512MB bug, either. The first I heard about it was when I scanned the Slashdot comments.

Now that I know about it, I'll try to reproduce it with an upstream kernel. I'll note that in 9.04, Ubuntu had a bug which, as far as I know, must have been caused by their screwing up some patch backports. Only Ubuntu's kernel had a bug where rm'ing a large directory hierarchy would have a tendency to cause a hang; no one was able to reproduce it on an upstream kernel.

I will say that I don't ever push patches to Linus without running them through the XFS QA test suite (which is now generalized enough that it can be used on a number of file systems other than just XFS). If it doesn't already have a "write a 640 MB file and make sure it isn't corrupted" test, we can add one, and then all of the file systems which use the XFSQA test suite can benefit from it.
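
To illustrate the idea, here is a rough, hand-written sketch --- not actual xfstests code --- of what such a "write a big file and verify it" test case could look like. A real regression test would also remount the file system (or drop the page cache) before the read-back pass so it isn't just verifying data cached in memory:

    /* Hand-written illustration, NOT actual xfstests code: write 640 MB of
     * patterned data, flush it, then read it back and verify every block. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096
    #define NUM_BLOCKS ((640ULL * 1024 * 1024) / BLOCK_SIZE)

    /* Each block gets a pattern derived from its block number, so zeroed
     * or misplaced blocks are easy to spot. */
    static void fill_pattern(unsigned char *buf, uint64_t blk)
    {
        memset(buf, (unsigned char)(blk % 251), BLOCK_SIZE);
    }

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "testfile.bin";
        unsigned char buf[BLOCK_SIZE], check[BLOCK_SIZE];
        uint64_t blk;

        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        for (blk = 0; blk < NUM_BLOCKS; blk++) {
            fill_pattern(buf, blk);
            if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE) {
                perror("write"); return 1;
            }
        }
        if (fsync(fd) < 0) { perror("fsync"); return 1; }

        /* A real test would remount (or drop caches) here so the read-back
         * pass hits the disk rather than the page cache. */
        if (lseek(fd, 0, SEEK_SET) < 0) { perror("lseek"); return 1; }
        for (blk = 0; blk < NUM_BLOCKS; blk++) {
            fill_pattern(buf, blk);
            if (read(fd, check, BLOCK_SIZE) != BLOCK_SIZE) {
                perror("read"); return 1;
            }
            if (memcmp(buf, check, BLOCK_SIZE) != 0) {
                fprintf(stderr, "corruption in block %llu\n",
                        (unsigned long long) blk);
                return 1;
            }
        }
        printf("640 MB written and verified OK\n");
        close(fd);
        return 0;
    }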

(I was recently proselytizing the use of the XFS QA suite to some Reiserfs and BTRFS developers. The "competition" between file systems is really more of a fanboy/fangirl thing than something that happens at the developer level. In fact, Chris Mason, the head btrfs developer, has helped me with some tricky ext3/ext4 bugs, and in the past couple of years I've been encouraging various companies to donate engineering time to help work on btrfs. With the exception of Hans Reiser, who has in the past accused me of trying to actively sabotage his project --- not true as far as I'm concerned --- we are all a pretty friendly bunch who work together and help each other out as we can.)

Comment Re:Google doesn't need journaling? (Score 2, Interesting) 348

So I'm an engineer, and not an academic. I'm not trying to get a Ph.D. The whole Keep It Simple, Stupid principle is an important one, especially since, as you say, "Journalling and Soft Updates have similar performance characteristics."

If sometimes Journalling posts better benchmarks, and sometimes Soft Updates produces better results, but Soft Updates is hideously more complex, thus inhibiting new features such as ACL's and Extended Attributes (which appeared in BSD much later than in Linux, and I think Soft Updates made it much harder to find people capable of extending the file system) --- then the choice of the simpler technology seems to be obvious. The performance gains are a toss-up, and using a hideously complex algorithm for its own sake is only good if you are an academic gunning for a Ph.D. thesis or a paper publication, or if you are trying to ensure job security by implementing something so hard to maintain that only you and a few other people can hack it.

Comment Re:Google doesn't need journaling? (Score 2, Informative) 348

> > What Soft Updates apparently does is assume that once the data is sent to the disk, it is safely on the disk. But that's not a true assumption!

> Journaling, and every other filesystem, has exactly the same problem. If consistency is required, YOU MUST DISABLE THE CACHE, unless it is battery-backed, or you are willing to depend on your UPS. This is the penalty we take for devices which lie to the OS about flush operations and the like.

Yes, there were, in the bad old days, devices which lied when the OS sent a flush cache command; in order to get a better Winbench score, they would cheat and not actually flush the cache. But that hasn't been true for quite a while, even for commodity desktop/laptop drives. It's quite easy to test; you just time how many single-block sector writes followed by a cache flush command you can send per second. In practice, it won't be more than, oh, 50-60 write barriers per second. In general, if you use a reputable disk drive, it supports real cache flush commands. My personal favorites are Seagate Momentus drives for laptops, and I can testify to the fact that they all handle cache flush commands correctly; I have quite a collection and it's really not hard to test.
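
If you want to try that timing test yourself, a quick-and-dirty sketch looks something like this (the file name and run length are arbitrary, and it assumes the file system is mounted with barriers enabled so that fsync() really does send a cache flush command to the drive):

    /* Quick-and-dirty test: how many "write one block, then flush the
     * drive cache" operations can we do per second?  An honest spinning
     * disk is limited by rotational latency (tens per second); a drive
     * that ignores cache flushes will report an implausibly high number. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "flushtest.dat";
        char block[512];
        struct timespec start, now;
        double elapsed = 0.0;
        int ops = 0;

        memset(block, 0xAA, sizeof(block));
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            if (pwrite(fd, block, sizeof(block), 0) != sizeof(block)) {
                perror("pwrite"); return 1;
            }
            /* With barriers enabled, fsync() forces the data out and sends
             * a cache flush command to the drive. */
            if (fsync(fd) < 0) { perror("fsync"); return 1; }
            ops++;
            clock_gettime(CLOCK_MONOTONIC, &now);
            elapsed = (now.tv_sec - start.tv_sec) +
                      (now.tv_nsec - start.tv_nsec) / 1e9;
        } while (elapsed < 10.0);

        printf("%.1f synchronous single-block writes per second\n",
               ops / elapsed);
        close(fd);
        return 0;
    }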

The big difference between journalling and soft updates is that we can batch potentially hundreds of metadata updates into a single journal transaction, and send down a single write barrier every few seconds. The journal commit is an all-or-nothing sort of thing, but that gives us reliability _and_ performance.

The problem with soft updates is that the relative ordering of most (if not all) metadata writes is important, and putting a write barrier between each metadata operation is Slow And Painful. Yes, you can disable the write cache, but then you give up a huge amount of performance as a result. With journaling we get the performance benefits of the write cache, but we only have to pay the cost of enforcing write ordering through a barrier once every few seconds.

Of course, there are workloads where soft updates plus a disabled write cache might be superior. If you have a very metadata-intensive workload that also happens to call fsync() between nearly every metadata operation, then it would probably do better than a physical block journalling solution that used barrier writes but ran with the write cache enabled. But in the general case, if you take a more normal workload where fsync()'s aren't happening _that_ often, and compare physical block journalling with a write cache and barrier ops against a Soft Updates approach with the write cache disabled, I'm pretty sure the physical block journalling approach will end up benchmarking better.
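
To make the batching point concrete, here is a deliberately toy sketch --- not JBD2 code, just the shape of the idea: metadata updates simply join the currently open transaction, and the commit path pays for a single flush/barrier per batch instead of one per update.

    /* Toy sketch (not JBD2): metadata updates join the open transaction,
     * and one flush/barrier covers the whole batch at commit time. */
    #include <stdio.h>

    #define MAX_UPDATES 1024

    struct transaction {
        long dirty_blocks[MAX_UPDATES];  /* metadata blocks touched */
        int  count;
    };

    /* A metadata operation just records which block it dirtied; nothing
     * is forced to disk at this point. */
    static void journal_dirty_metadata(struct transaction *t, long blkno)
    {
        if (t->count < MAX_UPDATES)
            t->dirty_blocks[t->count++] = blkno;
    }

    /* Commit: write the whole batch to the journal, then a single cache
     * flush / barrier makes the transaction durable, all or nothing. */
    static void journal_commit(struct transaction *t)
    {
        int i;
        for (i = 0; i < t->count; i++)
            printf("  write journal copy of metadata block %ld\n",
                   t->dirty_blocks[i]);
        printf("  issue ONE cache flush / barrier for %d updates\n", t->count);
        t->count = 0;
    }

    int main(void)
    {
        struct transaction t = { .count = 0 };
        long blk;

        /* Hundreds of metadata updates accumulate in the running
         * transaction... */
        for (blk = 100; blk < 400; blk++)
            journal_dirty_metadata(&t, blk);

        /* ...and every few seconds the commit thread flushes them as one
         * batch. */
        printf("commit:\n");
        journal_commit(&t);
        return 0;
    }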

Comment Re:Time for a backup? (Score 2, Informative) 348

>I mount these read-only in the interests of security, but that means, of course,
>that I can't have journalling on them, which precludes the use of ext3 or 4.

#1. You can mount ext3 file systems read-only. The journal doesn't preclude a ro mount.

#2. ext4 supports running without a journal. Google engineers contributed that code to ext4 last year.

Comment Re:Has Ted Cooked the Benchmarks Again? (Score 5, Informative) 348

So I'm not sure what you're talking about. If you're talking about delayed allocation, XFS has it too, and the same buggy applications that don't use fsync() will also lose information after a buggy proprietary Nvidia video driver crashes your machine, regardless of whether you are using XFS or ext4.

If you are talking about the change to _ext3_ to use data=writeback, that was a change that Linus made, not me, and ext4 has always defaulted to data=ordered. Linus thought that since the vast majority of Linux machines are single-user desktop machines, the performance hit of data=ordered, which is designed to prevent exposure of uninitialized data blocks after a crash, wasn't worth it. I and other file system engineers disagreed, but Linus's kernel, Linus's rules. I pushed a patch to ext3 which makes the default a config option, and as far as I know the enterprise distros plan to use this config option to keep the defaults the same as before for ext3.

Since it was my choice, I actually changed the defaults for ext4 to use barriers=1, which Andrew Morton had vetoed for ext3 because, again, he didn't think it was worth the performance hit. But with ext4, the benefits of delayed allocation and extents are so vast that they completely dominate the performance hit of turning on write barriers. That is where most of the performance benefits of ext4 come from, and it is very much a huge step forward compared to ext3.

So with respect, you don't know what you are talking about.

-- Ted

Comment Re:Google doesn't need journaling? (Score 4, Interesting) 348

So there's a major problem with Soft Updates, which is that you can't be sure that data has hit the disk platter and is on stable store unless you issue a barrier operation, which is very slow. What Soft Updates apparently does is assume that once the data is sent to the disk, it is safely on the disk. But that's not a true assumption! The disk drive, especially modern ones with large caches, can reorder writes which are sent to the disk, sometimes (with the right pathological workloads) for minutes at a time. You won't notice this problem if you just crash the kernel, or even if you hit the reset button. But if you pull the plug or otherwise cause the system to drop power, data in the disk's write cache won't necessarily be written to disk. The problem that we saw with journal checksums and ext4 only showed up on a power drop, because there was a missing barrier operation, so this is not a hypothetical consideration.

In addition, if you have a very heavy write workload, the Soft Updates code will need to burn a fairly large amount of memory tracking the dependencies and burn quite a bit of CPU figuring out which dependencies need to be rolled back. I'm a bit suspicious of how well they perform and how much CPU they steal from applications --- which, granted, may not show up in benchmarks which are disk bound. But if the applications, or the large number of jobs running on a shared machine, are trying to use lots of CPU as well as disk bandwidth, this could very much be an issue.

BTW, while I was doing some quick research for this reply, it seems that NetBSD is about to drop Soft Updates in favor of a physical block journaling technology (WAPBL), according to Wikipedia. They didn't give a reference for this, nor did they say why NetBSD was planning on dropping Soft Updates, but there is a description of the replacement technology here: http://www.wasabisystems.com/technology/wjfs. But if Soft Updates is so great, why is NetBSD replacing it, and why did FreeBSD add a file system journaling alternative to UFS?

Comment Re:Sounds like they need to talk to Kirk McKusick (Score 1) 421

Actually, FFS with Soft Updates is only about preserving file system metadata consistency so that fscks aren't required. BSD with FFS and Soft Updates still pushes out metadata after 5 seconds, and data blocks after 30 seconds. Soft Updates only worries about metadata blocks, not data blocks.

In fact, after a crash with FFS you can sometimes access uninitialized data blocks that contain data from someone else's mail file, or p0rn stash. This was the problem which ext3's data=ordered was trying to solve; unfortunately it does so by making fsync==sync, which also had the side effect of making people think that fsync()'s always had to be slow. They don't have to be, if fsync() is properly implemented --- but I'll be the first to admit that ext3 didn't do a proper job.

Comment Workaround patches already in Fedora and Ubuntu (Score 4, Informative) 421

It's really depressing that there are so many clueless comments on Slashdot --- but I guess I shouldn't be surprised.

Patches to work around buggy applications which don't call fsync() were available long before this issue got slashdotted, and before the Ubuntu Launchpad page got slammed with comments. In fact, I commented very early in the Ubuntu log that patches that detected the buggy applications and implicitly forced the disk blocks to disk were already available. Since then, both Fedora and Ubuntu have started shipping with these workaround patches.

And yet, people are still saying that ext4 is broken, and will never work, and that I'm saying all of this so that I don't have to change my code, etc. --- when in fact I created the patches to work around the broken applications *first*, and only then started trying to advocate that people fix their d*mn broken applications.

If you want to make your applications such that they are only safe on Linux and ext3/ext4, be my guest. The workaround patches are all you need for ext4. The fixes are queued to be merged into 2.6.30 as soon as its merge window opens (probably in a week or so), and Fedora and Ubuntu have already merged them into their kernels for their beta releases, which will be released in April/May. They will slow down filesystem performance in a few rare cases for properly written applications, so if you have a system that is reliable and runs on a UPS, you can turn off the workaround patches with a mount option.

Applications that rely on this behaviour won't necessarily work well on other operating systems, or on other filesystems. But if you only care about Linux and ext3/ext4 file systems, you don't have to change anything. I will still reserve the right to call them broken, though.

Comment Re:Get an enterprise drive (SLC, not MLC) (Score 1) 480

> It also depends on what type of filesystem you use. A journaling filesystem like ext3 can wear down a disk a lot faster than a non-journaling filesystem.

Not true. If you have a decent SSD (such as the X25-M) that doesn't have write amplification problems, the extra overhead of journalling really isn't that bad. I wrote about this quite recently on my blog.
