[an error occurred while processing this directive]
Slashdot Log In
SGI open-sourcing XFS
Posted by
Hemos
on Thu May 20, 1999 01:42 AM
from the it's-sorta-tomorrow dept.
from the it's-sorta-tomorrow dept.
Yun Ye writes "Finally, a journaling FS for Linux! Get the full story
. Excellent-we'll have to ask the SGI people about it tomorrow. And who came up with the name change. Whatever the case, this will help Linux continue to crack the high-end market.
[an error occurred while processing this directive]
This discussion has been archived.
No new comments can be posted.
SGI open-sourcing XFS
|
Log In/Create an Account
| Top
| 179 comments
(Spill at 50!) | Index Only
| Search Discussion
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
(1)
|
2
(1)
|
2
GPL? (Score:3)
If we need a journalled FS (I guess we do), then we need a GPLd journalled FS.
How's this going to be implemented? If it's a kernel patch, then correct me if I'm wrong, but doesn't it *have* to be GPL?
I guess if it's only a module, it can be any licence, right?
--
more than three (Score:5)
1. Poor large memory support. I'm not sure how the most recent kernels fare, but last I checked Linux only supports a maximum of 2GB of ram. This is probably one of the FIRST things which need to be fixed. I have heard that SGI is working on a patch to give 3.8GB on Intel machines... that sounds promising!
2. No raw I/O support. This is used for large RDBMS's for example. AFAIK Linus hates the idea, so it will probably never get this. Mind you this is minor because there are ways around it.
3. "fsync()" on large files is extremely inefficient. Again, this affects RDBMS to the extreme. It is so bad that in some cases Windows NT is 30 or 40 times faster than Linux on equivalent hardware doing database inserts and updates. To witness this for yourself, write a quick program that opens a file and loops appending a line and doing an fsynch on it. Notice the slowdown as the file grows. That shouldn't happen.
4. Poor performance under load. When the load on a Linux box goes up, context switch time increases a LOT. This is bad.
And these are just off the top of my head...
Mind you, I am not diminishing Linux, it is still my primary development platform of choice, however as it stands it doesn't make a good platform for really big single-computer tasks. I'm sure this will change in the future, but right now as much as I hate to say it, NT and commercial Unix have the lead. Also I'd love to hear corrections to my points above, they would be good news to tired eyes.
Thanks
SGI listens to users (Score:4)
Large files and the VFS (Score:3)
Actually there is a limitation in the VFS (virtual file system) layer that means right now no FS can have more than 2GB files.
It could well be that SGI have patches to address that, but that would be separate to the XFS code as such.
This restriction only applies to 32 bit platforms such as x86, non-Ultra Sparc and PowerPC of course. On Alpha, Ultrapenguin and Merced there is/will be no such limitation.
Choice of license (Score:3)
> source license, the company said, though it is
> sure to meet the requirements of the Open
> Source Definition, a spokesman said.
They need more than OSD compliance. If they want it to go into the Linux kernel, they need to use a GPL-compliant license. The main question is whether they want (or will accept) that the other proprietary Unixen can use it. If not, then the obvious choice is to GPL it. It will cause problems for the BSDs, but why should SGI care? They don't have the hype (momentum) of Linux.
If they do want their file system to become a standard, they could LGPL it, or even use an X-like license. That would also make it easier to backport changes to Irix. Using the LGPL would fmake it more problematic for competitors to keep their changes proprietary.
Re:what is the difference? (Score:3)
>
>To elaborate, imagine a long tape that represents your hard drive...
It hasn't been pointed out because it's not true. What you're talking about is a _log structured_ file system. That's a whole different thing. A _journaling_ file system looks after the metadata using a (duh) journal that records changes to directories, attributes, allocation maps etc. so they can be either rolled forward or rolled back upon reboot. However, a JFS (not necessarily IBM's JFS, though they were pioneers in this area) has pretty much the same ways of handling the actual file data as any other FS, including deferred writes etc.
Any FS including a JFS or LSFS may support features for increased synchrony providing greater data integrity, such as fsync() or O_SYNC, but that's really a separate matter.
grio (Score:3)
They are only releasing the journaling part,
and it's limited to 64-bit.
This is the perfect move. They give Linux something great, get extremely good PR, establish XFS as an industry-standard, and still manage to
keep a proprietary advantage
to make you want to buy their machines for technical reasons. Really smart.
Even such a `limited' version will
be better than NTFS.
SFS - Port to Linux ? (Score:3)
It's great to never have to use l:disk-validator (amiga fsck) again ( o.k, ok. it's in ROM, not l: on all Amigas above 1.3, but hey...)
The website has an exceptionally clear discription of how the filesystem has been implemented. It's 64-bit, using the NSD (New Style Device)API.
It's also free.
here's the site :
www.xs4all.nl/~hjohn/SFS/ [xs4all.nl]
In the interest of
the feature list is duplicated here:
(Note that some of the things described are amiga-specific. The dos.library limitation, in particular, is irrelevant to linux, and probably to future amigas, too. The 2GB max single file size limitation arises from the amiga's incomplete transition to 64-bit APIs - CBM went bust as the NSD spec was released, and, once again, may not be relevant to a linux implementation)
This page gives you an overview of what SFS is capable of. It will also give you an idea what features we are planning to add in the near future in planned features, and what features we are considering later on.
Features
Below you'll find a list of features which are already implemented in SFS.
Fast reading of directories.
Fast seeking, even in extremely large files.
Blocksizes of 512 bytes up to 32768 bytes (32 kB) are supported.
Supports large partitions. The limit is about 2000 GB, but it can be more depending on the blocksize.
Support for partitions larger than 4 GB or partitions located (partially) beyond the 4 GB barrier on your drive. There is support for New Style Devices (NSD) which support 64 bit access, the 64-bit trackdisk commands and SCSI direct.
The length of file and directory names is internally limited only by blocksize. Limitations in the dos.library however will reduce the effective length of file and directory names to about 100 characters.
The size of a file in bytes is limited to slightly less than 4 GB. Because of limitations in dos.library we will however probably not allow files larger than 2 GB, to avoid potential problems.
Modifying data on your disk is very safe. Even if your system is resetted, has crashed or experienced a powerloss than your disk will not be corrupted and will not require long validation procedures before you will be able to use it again. In the worst case you will only lose the last few modifications made to the disk. See Safe writing for detailed information on how this works.
To be able to ensure that your disk never gets corrupted we use an internal caching system which keeps track of modifications before writing them to disk. This cache has the additional benefit that creating and copying files can be a lot faster, especially if the drive used isn't very fast (ZIP & floppy drives for example).
There is a built-in low-level read-ahead cache system which tries to speed up small disk accesses. This cache has as a primary purpose to speed up directory reading but also works very well to speed up files read by applications which use small buffers.
Disk space is used very efficiently. See the Space efficiency page for a comparison between a few filesystems.
Supports notification and Examine All.
Supports Soft links (hard links are not supported for now).
Using the SFSformat command you can format your SFS partition with case sensitive or case insensitive file and directory names. Default is case insensitive (like FFS).
There is a special directory which contains the last few files which were deleted. See deldir.
Planned features
The list of planned features below are features which are either already in development or are very likely to be added to the filesystem in the near future.
Multiuser support.
Built-in background file and free space defragmenter. Already the filesystem is set up in such a way to allow for easy implementation of this feature without having to do extensive scanning of the disk before the defragmenter can begin. This means defragmenting can be done in the background and can be interrupted at any time (even by a reset, crash or power failure) without loss of data.
Mirroring of important filesystem administration blocks to make the filesystem more robust.
Features we are considering
The features below are either features which are very application specific or not used very often. If there is enough demand for some of these features we will consider implementing them in the filesystem.
Mirroring of complete partitions. Such a feature would not only ensure that all your data is very safe since everything is stored twice on two different drives, but it will also speed up multiple concurrent read accesses since both drives can be used to deliver data. This feature however normally is only used on mission critical systems (like file servers) and would be of little use on systems not equipped with high speed SCSI controllers.
Support for striping. To put it simply, striping can be used to distribute data to multiple drives which increases the total available bandwidth as each disk will be used simultaneously to access part of the data. If you for example have 2 drives than with striping all odd blocks of 64 kB would be stored on drive 1, and all even blocks of 64 kB would be stored on drive 2. A similair scheme is used with more than 2 drives. With striping there is also an option to use one of the drives as a parity drive. If one of the drives crashes or becomes unuseable than the data on that drive can be reconstructed using the remaining drives which ensures that your data is very safe. However, although it may seem that striping could speed up disk accesses by a factor of 2 or more, this is usually only the case when working with very large video streams or multi user systems. Under normal conditions you will be hard pressed to find any speed gains at all.
Support for hard links (soft links are already implemented).
The ability to extend a partition without having to copy all your data and format the partition.
New DOS packets. There are lots of ways to exploit the ability of a filesystem better than is possible at the moment. New packets are the key to this. For example, support for paths larger than 255 characters, live directories (directories which are updated in realtime), enforcing recordlocking and many more. This however must be a team effort and we'll need support from writers of important applications and people willing to build new interfaces to access these new abilities.
Re:what is the difference? (Score:4)
There have been a few good answers to this question in this thread, but there's one thing that hasn't been pointed out yet: journaling file systems don't (immediately) overwrite a file when it is changed.
To elaborate, imagine a long tape that represents your hard drive. The tape is written from left to right. When a file is changed, the new version is appended to the end of the existing data, while the old version remains "untouched" farther to the left. When the kernel has finished updating all pending file writes, it can write a "checkpoint" at the end of the existing data. Essentially the checkpoint says "everything up to the point is kosher." If the disk gets really full, then the filesystem can double-back and overwrite the really old data at the beginning of the tape.
Now, let's see how this works to help recover from crashes; say the computer crashes as it's writing out a file to disk. In a conventional filesystem, a lot of things could go wrong: it could have been overwriting the old data, but finished only half of the job. Then, at best, you've got a corrupted file with a hybrid of new data and old data. The file allocation table may not have been updated, so the file may be completely lost. It's a bad situation.
Conversely, if the crash happened with a JFS, the computer would run an "fsck" and look for the last checkpoint. It's guaranteed that all data preceding the checkpoint has integrity. Then the filesystem would just work from that checkpoint and ignore any non-checkpointed data. This can still lead to some data loss, but never to filesystem corruption.
Of course, this is a simplified account, and there are implementation details. But that's the gist of it.
Good for *BSD too! (Score:3)
I'd love to be able to run XFS on our apartment FreeBSD fileserver/firewall, as well as my Linux desktops. I am really looking forward to playing with this! Thank you SGI!
Re:what is the difference? (Score:3)
Scott
C{E,F,O,T}O
sboss dot net
email: scott@sboss.net
GPL code in BSD? Yes (Score:3)
Excerpt from http://www.OpenBSD.ORG/policy.html [openbsd.org]
The GNU Public License and licenses modeled on it impose the restriction that source code must be distributed or made available for all works that are derivatives of the GNU copyrighted code.
While this may be a noble strategy in terms of software sharing, it is a condition that is typically unacceptable for commercial use of software. As a consequence, software bound by the GPL terms can not be included in the kernel or "runtime" of OpenBSD, though software subject to GPL terms may be included as development tools or as part of the system that are "optional" as long as such use does not result in OpenBSD as a whole becoming subject to the GPL terms.
As an example, some ports include GNU Floating Point Emulation - this is optional and the system can be built without it or with an alternative emulation package. Another example is the use of GCC and other GNU tools in the OpenBSD tool chain - it is quite possible to distribute a system for many applications without a tool chain, or the distributor can choose to include a tool chain as an optional bundle which conforms to the GPL terms.
So a GPL part only have to be optional, XFS qualify.
Splendid! (Score:5)
Their white paper [sgi.com] on XFS explains how XFS is different from conventional file systems and what they did to it to make it fast with very large files as well as with many, many small files (SGI is not Open Sourcing their GRIO capabilities, which together with RT scheduling would make Linux a serious multimedia contender).
If you are a USENIX member, you will be able to download the Sweeney paper Scalabilit y in the XFS File System [usenix.org] from the USENIX server. It was published in the Spring 1996 proceedings of the USENIX, so you may also read it in your Universities library.
what is the difference? (Score:3)
8Complex
PS - Slashdot could use a full dictionary of terms from around the site... That'd rule...
Re:what is the difference? (Score:3)
I think a journaling filesystem is one where it keeps a journal of what, when, and where it is writing data, between the time it schedules it to be written and the time it is actually written. That way, if a system crashes, the filesystem can look at the journal and continue where it left off, not losing any data. A non-journaling filesystem schedules stuff to be written, then acts as if the write has happened. This works ok, until the system crashes. If a write was pending when the system crashed, that data is lost.