EXT4 Is Coming 182
ah admin writes "A series of patches was proposed earlier on the Linux kernel mailing list by a team of engineers from Red Hat, ClusterFS, IBM and Bull to extend the Ext3 filesystem with support for very large filesystems. After a long-winded discussion, the developers came forward with a plan to roll these changes into a new version: Ext4."
Yes but (Score:5, Interesting)
Interesting bit from wiki/ZFS:
Modularizable filesystem (Score:2, Interesting)
Re:Sounds like a good idea. (Score:2, Interesting)
Why EXT4 ? (Score:4, Interesting)
There are many factors that influence filesystems: not just "how fast it can write", but also how it breaks when it does.
While the fanboys of XFS, JFS and ZFS may promise that their filesystems are faster, have no problems, are secure and will not eat your data, they simply are not as proven as ext2 and ext3.
Scream fanboys scream, someone will listen, but the problem is that these filesystems are not proven in the field, or in some circumstances even in the kernel itself.
Re:Why only 48 bits? (Score:5, Interesting)
We'll need to adjust other things if filesystems ever get so huge. The whole design probably needs a rethink, but we can't do it now. We don't know what the future holds in terms of seek times, transfer rates, sector sizes, etc.
Comment removed (Score:5, Interesting)
Re:define very large (Score:2, Interesting)
For example, if we have 20-bit indexes (2^20 clusters max) and use 4-kilobyte clusters, to increase the maximum space we'll either have to add one bit to the indexes to double the maximum space or we'll have to increase the cluster size and have problems storing small files (remember the FAT16->FAT32 transition?)
ext4 is thousands of times larger than ext3, which will probably mean that indexes will need a lot more space, which will be bad for 8TB volumes (and besides, no one would notice any benefits!)
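The index-width trade-off described above can be worked through with a few lines of arithmetic. This is only a sketch of the commenter's hypothetical 20-bit example, not any real filesystem's layout; the function names `max_volume_bytes` and `slack_bytes` are made up for illustration.

```python
def max_volume_bytes(index_bits: int, cluster_bytes: int) -> int:
    """Largest addressable volume: number of clusters times cluster size."""
    return (2 ** index_bits) * cluster_bytes


def slack_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Space wasted in the last, partly filled cluster of one file."""
    if file_bytes == 0:
        return 0
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes - file_bytes


GIB = 2 ** 30

# Hypothetical baseline from the comment: 20-bit indexes, 4 KiB clusters.
print(max_volume_bytes(20, 4096) // GIB)  # 4 (GiB)

# Option 1: add one index bit, which doubles the maximum space.
print(max_volume_bytes(21, 4096) // GIB)  # 8 (GiB)

# Option 2: double the cluster size instead; same maximum, more waste per small file.
print(max_volume_bytes(20, 8192) // GIB)  # 8 (GiB)
print(slack_bytes(100, 4096), slack_bytes(100, 8192))  # 3996 8092
```

Both options reach the same maximum size; the difference is whether the cost lands in the index (wider entries) or in small files (more slack per file).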
Re:Why EXT4 ? (Score:3, Interesting)
Note that servers with extensive mirroring and other hardware error handling rarely need error recovery from the filesystem. Filesystem errors happen on ordinary people's hard drives when they grow old, and ext* has a million times more experience in handling those than any enterprise FS.
What they really should do is (Score:1, Interesting)
The fact that nothing ever pressures the distribution builders into using anything new has led to a major slowdown in the development of Linux.
Re:Sounds like a good idea. (Score:3, Interesting)
Re:Well, how does a Honda Civic ... (Score:1, Interesting)
Actually, except for its highly advanced algorithms, the ZFS code is very small and simple, and on top of that, ZFS is really nice in small desktop deployments, where its "big iron" features give it the ability to detect and automatically correct garbage being delivered by that cheap SATA drive.
In fact, having been ported (compiles, doesn't yet run) to Linux and being in the process of being ported to OS X and FreeBSD, ZFS is on a pretty good track to becoming ubiquitous... which would be the exact opposite of exotic.
Re:Sounds like a good idea. (Score:3, Interesting)
Right off the bat: you can't install the bootloader in an XFS partition, since XFS uses the first 512-byte block on the partition. Of course, most people install the bootloader in the MBR, but for some it's an issue.
GRUB had a bug with XFS. When you tried to use a XFS partition as
For a considerable period of time, ext3's code was more stable than XFS.
ext3 has an ordered data mode (which is the default). Other journaled file systems only support writeback mode. In general, ordered data mode doesn't provide any better guarantee of consistency than writeback mode, but it does matter in a few special cases, and those can make a substantial difference to a desktop user.
Typical annoying case:
- You're editing a file on your favorite text editor and you save it.
- The editor opens the file in overwrite mode, meaning the file is actually deleted and a new one is created (under Linux's default settings, the OS will commit the changes to the metadata in 5 seconds or less and the changes to the data in 30 seconds or less).
- The changes to the metadata are committed to disk.
- The system crashes!
When the system comes back up, the new file is there, but it's full of garbage.
With ext3's ordered data mode, the contents of the file would have been committed to disk before the associated changes to metadata. It's probable (but not assured!!) that after a crash you'll have either the old version or the new version of the file.
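For comparison, an application can also sidestep the crash window described above on its own, whatever the journaling mode, by writing the new contents to a temporary file, flushing them to disk, and only then renaming over the old name. This is a general, commonly used pattern rather than something the comment proposes; the function name `atomic_save` and the temp-file prefix are assumptions for illustration.

```python
import os
import tempfile


def atomic_save(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data`: write a temp file, fsync, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk first
        os.replace(tmp_path, path)  # then swap the name (atomic on POSIX)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Best effort on POSIX: flush the directory entry for the rename as well.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

With this ordering, a crash should leave either the complete old file or the complete new file under the original name, which is the outcome the ordered data mode tries to make probable for editors that overwrite in place.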
Re:Sounds like a good idea. (Score:3, Interesting)
I've just converted my main partition (non-/boot) on a notebook from XFS to reiser3, mainly because I work with huge svn working copies and svn loves to keep small files around, as well as create lots of small files (lock files, etc.) during routine svn work. xfs is just considerably slower than reiserfs for svn status, update, commit and cleanup. Besides, reiser3's tail feature means svn's penchant for small files uses less space overall on my tiny notebook hard drive. Not sure if the performance of reiser3 will degrade over time (I've been on xfs on this partition for longer than a year), but we'll see.
BTW, regarding http://www.debian-administration.org/articles/388: my observations differ from theirs (operations on file tree). I do have a significantly larger number of files, and many of those are smaller than the default block size, so that might affect things.
On the server side, XFS, on multiple concurrent large, random, writes (postgresql) just creams reiser3 and ext3. (IIRC, battery backed SCSI raid controller, tested with both RAID1+0 and RAID5, Linux 2.6.x, 6 x 15000RPM 132(?)GB HDD) Read operations and single thread seq/random writes are too similar in performance for the various filesystems.
Another feature of XFS I used a lot (before converting to reiser3) is xfs_fsr, which defrags a mounted xfs filesystem. It's oddly buggy though: after some runs, some inodes tend to have max_extents corrupted (endian problem?). I'd recommend an xfs_repair after an xfs_fsr, which effectively makes xfs_fsr a utility for defragging *UN*mounted filesystems. So yeah, xfs is a tad unstable. I've only had one real corruption, though, and that was from killing the notebook power during some writes. Not sure if that's from the fs, or the hard disk misbehaving.
A real O/S filesystem needs defrag! (Score:2, Interesting)
This is based not only on the need for a larger maximum file system, but on a recognition that there is a significant performance advantage to reducing read/write head movement and initiating large reads from consecutive blocks that can take advantage of the high transfer rates of today's drives. (This assumes that the OS filesystem doesn't attempt/require that the entire disk drive be cached in RAM to get decent performance.)
Except for "write once" files, over time this will cause files to become physically spread over the disk, and the performance benefit is reduced unless a process periodically consolidates the blocks back into a contiguous series of blocks (ignoring for the moment that on today's disk drives, blocks may be "spared" into places that are not really physically consecutive, but just logically appear to be)...
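To make the contiguity point concrete, here is a small sketch that collapses a file's list of block numbers into runs of consecutive blocks. It is a toy model, not ext4's or any other filesystem's on-disk format; the function name `blocks_to_extents` and the block numbers are invented for illustration.

```python
from typing import List, Tuple


def blocks_to_extents(blocks: List[int]) -> List[Tuple[int, int]]:
    """Collapse a file's block list into (start_block, length) runs."""
    extents: List[Tuple[int, int]] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # This block directly follows the current run, so extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # A gap (or a jump backwards) starts a new run.
            extents.append((block, 1))
    return extents


# A freshly written 8-block file laid out contiguously: one run, one seek.
print(blocks_to_extents([100, 101, 102, 103, 104, 105, 106, 107]))
# [(100, 8)]

# The same file after rewrites scattered it: three runs, so more head movement.
print(blocks_to_extents([100, 101, 102, 500, 501, 230, 231, 232]))
# [(100, 3), (500, 2), (230, 3)]
```

The number of runs is a rough measure of fragmentation: a defragmenter's job, in this model, is to bring it back down towards one.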
One of the "proofs" that *nix is superior to other O/Ss has been the absence of a need to "Defrag" the file system.
A commenter on the article also raises the question of why the "right" solution isn't to increase the 4k block size limit rather than rework the internals of the block pointers, and got the response that since the Linux kernel manages memory in 4k blocks, it is a nightmare in the kernel to support larger I/O transfers (although others here seem to indicate this is one of the solutions people have implemented).
Isn't "extents" a concept contained in NTFS? Has anyone looked into the patent implications of these proposed changes?
Re:fsck quality (Score:3, Interesting)
All of the major filesystems have a decent fsck, and all of them are by now stable to the point that you should worry about your hardware and backups failing, not your FS. The only qualifier on that is that ZFS is new, and I hope no one will view that as my FUDing.
Re:Sounds like a good idea. (Score:3, Interesting)