Linux Software

Advanced Filesystem Implementors Guide Continues

Tom writes: "This is part six of the Advanced filesystem implementor's guide. I've been following an outstanding series of articles about implementing the advanced filesystems that are available with Linux 2.4. The author really knows his stuff and has done a great job explaining ReiserFS, XFS, GFS, and the other available file systems." The series gets into greater depth as it goes on; you may want to start with Part One and work on from there.
  • It's gotten to the point in Linux that there are too many file systems; how many journaling file systems are necessary? Actually, I think one is enough. There comes a time when it's best for all to take the best features of the herd and start integrating them.
    • What if features that have different advantages to different people are mutually exclusive?

    • The idea with all the file systems is so that you can migrate to Linux. Also, different file systems have their own features; for example, ReiserFS is used on lots of news servers because it's good with lots of small files. Supporting all these file systems is very good in case you have to resurrect an old, half-dead SCO or SGI box.
    • Not completely true. One file system might make it to be the 'default installation choice' in most distributions, but each of the 3 journaled FS's has its own set of features and targeted markets.

      ReiserFS is a top-tech journaling file system which can be _very_ fast in some situations (large directories, etc.), but as Hans Reiser pointed out, his purpose is not to make a stable FS, but to keep development up, inventing new and cool techniques... so not your #1 production choice for some.

      XFS is known for its high throughput and parallelism. In its roots it was tuned for streaming video and audio, and to work well with _many_ CPUs (think >> 32).

      JFS has a bit more mainframe background: stable (slower?), and secure.

      Of course each day they grow a little closer together (each wants all advantages), but until one of them reaches the status of 'ultimate FS', I think there is plenty of room for multiple visions and implementations.
    • Dude, why can't Linux people get their paradigms straight?

      There is nothing wrong with lots of different filesystems. They all use the same API, so one can use whichever is best suited for one's task. XFS sacrifices metadata performance for awesome large file performance. ReiserFS sacrifices large file performance for small file performance. ext2 sacrifices safety for update performance. Yet you can use whichever suits you best, because, thanks to the VFS, they all look the same to user programs.

      PS> I never understood why people complain about this, but not about the fact that there are so many toolkits, which, unlike filesystems, have different APIs (and thus incompatible application bases). Maybe it's time for a VTK (virtual toolkit) layer in X?
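
      To make the VFS point concrete, here is a minimal sketch in Python (the mount points are hypothetical; substitute whatever you have mounted): the same open()/read()/statvfs() calls work no matter which filesystem backs each path.

      # vfs_demo.py -- illustrative only: user programs see one interface,
      # whatever filesystem a path happens to live on.
      import os

      for mountpoint in ["/mnt/ext2", "/mnt/reiserfs", "/mnt/xfs"]:   # hypothetical mounts
          path = os.path.join(mountpoint, "hello.txt")
          with open(path, "w") as f:          # same open()/write() syscalls everywhere
              f.write("hello from the VFS\n")
          with open(path) as f:
              print(path, "->", f.read().strip())
          st = os.statvfs(mountpoint)         # same statfs interface, too
          print(mountpoint, "free blocks:", st.f_bavail)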
      Why does it seem like all the responses to this are full of hot air?

      Some journalling filesystems exist because there are UNIX companies with expertise in them that support them, like XFS and JFS.

      Some journalling filesystems are a natural migration for most linux users - like ext3.

      And some people want to re-invent filesystems in toto, like Hans Reiser, and a good journalled filesystem is just the first stop.

      More than one is just "value added". They all work. They are all secure and stable. Some are faster than others - but XFS, ReiserFS and ext3 are all "fast enough" for almost any use.

      The parent echoes a common complaint about Free Software: that developer resources are not dedicated appropriately. Well, developers work on what they want, or what they are paid to work on. This often leads to multiple efforts that accomplish similar goals - like window managers, desktop environments, word processors, journalled filesystems, VM management, etc. But ultimately competition is good if intelligent test results are publicized.

      Look at the Mindcraft web server benchmark results about 18 months ago. Now, Linux blows the doors off IIS in the exact same test. The same is becoming true of filesystems. Test results show ext2/3 is slow with lots of small files - so a developer named Daniel Phillips added a directory hash that fixes this shortcoming.
  • NTFS? (Score:2, Interesting)

    by bnatale ( 532324 )
    I wonder when someone will finally remove the "DANGEROUS" tag from the NTFS write option and stabilize this thing...
    • NTFS will never be stable in Linux because Microsoft keeps changing NTFS with their NTFS-using operating systems. Last I heard, they had changed the journaling structure.

      • Then why does a FS from NT4 still work with Windows XP?
        • Microsoft always supports their old crappy technologies....... in their newer, bigger, slower technologies.
        • Because WinNT used NTFSv4, Win2k used NTFSv5 and WinXP uses NTFSv6
    • Implementing a working driver for a filesystem that tends not to follow even the sparse documentation that is available is *hard*.

      On top of the problems of having to basically reverse-engineer the FS, you get the joys of a team of lawyers from MS just waiting for you to do something they can sue you for.

      So basically you need someone who has a lot of time on their hands, with a partition they don't mind frying on a constant basis, and who isn't worried about potential lawsuits.

      Good luck.

    • Without detailed specifications, available at a huge price of course, writing to the disk will always be "Dangerous"
  • And when will it be possible to encrypt XFS, ReiserFS, ext3... on the fly? I really need that if the CIA seizes my hard disk.
    • You might want to check out the XFS mailing list back around July 5. It appears that you can do this with pretty much any filesystem, only you do it through loopback mode (essentially the fs is not encrypted but the block device is). It does involve some ugly kernel patches that probably won't be getting near a stable kernel anywhere in the near future; but you could do it.
      • There's TCFS [www.tcfs.it] too (Transparent Cryptographic File System). It doesn't need to use a loopback device, but on the other hand it doesn't hide all information (e.g. filenames aren't altered).

        USA mirror of TCFS site is here [jhu.edu], but looks a bit out of date.
    • Look on linuxdoc.org; if memory serves, there is a HOWTO...
    • Reiser4, which is slated to be released Sep 30, 2002, will support plugins to the file system for things like ACLs, compression, and yes, encryption [namesys.com]. Looks like a great concept.
    • by Adam J. Richter ( 17693 ) on Saturday October 27, 2001 @03:59PM (#2487853)

      I am posting this from a notebook computer that has all partitions encrypted except for a boot partition at the front of the disk. The kernel boots an initial ramdisk with an /sbin/init script that does essentially the following, using cryptoapi [sourceforge.net], the successor to the Linux "kerneli" patches.

      modprobe cryptoapi                                    # core crypto framework
      modprobe cryptoloop                                   # encrypted loopback support
      modprobe cipher-aes                                   # AES cipher module
      losetup -e AES /dev/discs/disc0/part6 /dev/loop/0     # set up an AES-encrypted loop device over the root partition
      Password:                                             # losetup prompts for the passphrase here
      mount -t ext2 /dev/loop/0 /newroot                    # mount the decrypted view
      cd /newroot
      exec ./bin/chroot . ./sbin/init $@                    # pivot into the real root and start init

      This should work with any disk file system, not just ext2.

      I have been using this arrangement for several months now on a couple of computers, the slowest of which is a Kapok 1100M that uses a 233MHz Pentium II processor and, I believe, PC-66 SDRAM. On that computer, the change in interactive responsiveness is hard to notice, but it is noticeable for disk-intensive activities. I have not timed it, but I think that big rsync runs are at least a factor of two slower.

      I do not run swapping on these computers, as I've seen claims that there are more potential deadlocks when attempting to swap to an encrypted partition than when attempting to swap to an unencrypted partition.

      I hope this information is helpful.

  • by Peaker ( 72084 ) <gnupeaker @ y a h oo.com> on Saturday October 27, 2001 @03:16PM (#2487757) Homepage
    It seems to me, the more I think about it, that file systems should be buried in the past, as the idea of mapping a hierarchy of string identifiers to serialized objects is not quite the way to do it.

    Firstly, a much better user interface to objects would be a relational database the user can query anything on.

    As for a system interface to objects, why force the objects to be serialized? Use orthogonal persistency. This method is more efficient and easier for the applications. It actually makes persistency transparent, except for critical applications that need to persist something now, in which case they can use a journalling interface.

    In summary:
    - Replace file system persistency with orthogonal persistency.
    - Replace the hierarchic-string user interface with a relational database.
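
    To make the relational idea a little more concrete, here is a rough sketch in Python (the table, columns and data are entirely made up): instead of remembering where in a tree something lives, you just ask for what you want.

    # relational_fs_sketch.py -- hypothetical illustration of querying objects
    # through a relational interface rather than a path hierarchy.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY,"
               " name TEXT, kind TEXT, author TEXT, size INTEGER)")
    db.executemany(
        "INSERT INTO objects (name, kind, author, size) VALUES (?, ?, ?, ?)",
        [("budget", "spreadsheet", "alice", 48000),
         ("thesis", "document", "bob", 900000),
         ("logo", "image", "alice", 120000)])

    # No paths, no directories -- just a query over the objects' attributes:
    for name, size in db.execute(
            "SELECT name, size FROM objects WHERE author = ? AND size > ?",
            ("alice", 50000)):
        print(name, size)        # -> logo 120000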
    • by Anonymous Coward
      It would appear that Hans Reiser agrees with you, at least in part. Check out Future Vision [namesys.com] on the Namesys [namesys.com] site. As the front page says, "the interesting stuff is still in the future." The killer file system they have now is just foundation work.
    • Ok, I understand the part about using a relational database instead of mapping a string hierarchy to identifiers. I agree with that part.

      But could you explain what you mean when you say objects in a filesystem are forced to be serialized? And what orthogonal persistency is. It sure sounds good, but I would really like to know what it means.
      • by Peaker ( 72084 ) <gnupeaker @ y a h oo.com> on Saturday October 27, 2001 @09:17PM (#2488471) Homepage
        Persistency: data's "survival" through time, power failures, etc. Persistent memory is non-volatile memory (disk, for example).
        Persistency in an operating system is usually achieved by writing things to disk in order to persist them.

        ... could you explain what you mean when you say objects in a filesystem are forced to be serialized?

        Not all data in a file system can be stored as it is in memory, because pointers and other information must be converted to a persistent form. Often objects are kept in memory in forms that are very hard to write to disk directly (spread across many small linked objects, for example). This means you must serialize the data to disk by converting it to a stream of 1's and 0's that allows reconstructing the objects' structure. This requires a lot of work from every application and object implementor, as they have to write methods to serialize and de-serialize their objects, from their normal representation to a persistent, streamed representation.
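
        A tiny illustration of that by-hand work, in Python (the on-disk byte format here is invented purely for the example): even a two-node linked structure needs its pointers flattened into a stream, and matching code written to rebuild it.

        # serialize_sketch.py -- manual serialization of a pointer-linked structure.
        import struct

        class Node:
            def __init__(self, value, next=None):
                self.value = value      # payload
                self.next = next        # in-memory pointer: meaningless on disk

        def serialize(head):
            """Flatten the pointer chain into length-prefixed records."""
            out = bytearray()
            node = head
            while node is not None:
                data = node.value.encode("utf-8")
                out += struct.pack("!I", len(data)) + data
                node = node.next
            return bytes(out)

        def deserialize(blob):
            """Rebuild the pointer chain from the byte stream."""
            head = tail = None
            offset = 0
            while offset < len(blob):
                (length,) = struct.unpack_from("!I", blob, offset)
                offset += 4
                node = Node(blob[offset:offset + length].decode("utf-8"))
                offset += length
                if tail is None:
                    head = tail = node
                else:
                    tail.next = tail = node
            return head

        restored = deserialize(serialize(Node("hello", Node("world"))))
        print(restored.value, restored.next.value)   # -> hello world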

        And what orthogonal persistency is. It sure sounds good, but I would really like to know what it means.

        Orthogonal persistency is persistency implemented by the underlying operating system, rather than every application writer.
        The entire system state is saved to disk every once in a while, in a checkpoint.
        Mechanisms are used to ensure there's always a stable/reliable checkpoint to go back to. Some schemes even let you roll back to any checkpoint in the past. Typically, checkpoints are done every 5 minutes.

        Orthogonal persistency is totally transparent to applications. They seem to 'live forever', and do not need to explicitly persist or serialize their information. They can keep it represented as objects, or whatever representation they choose for their own simplicity.

        Orthogonal persistency treats RAM as a cache to the disk, and thus achieves two purposes.

        Simplicity: There is only non-volatile memory, rather than volatile and non-volatile memory that are allocated and managed separately.

        Performance: It is much easier to optimize this system, as there are no separate file caches and memory swap areas on disk. Instead, you treat the entire RAM as a cache to the disk, allowing simpler and more powerful page caching algorithms that do not have to guarantee things such as quick disk writes for files, as file systems do.

        An amazing advantage of orthogonally persistent systems is that, because the entire chunk of dirty pages in memory is copied to disk at once, the system can sequentially move the disk heads across the disk to update all necessary areas. This process is called migration, and it is a far more efficient way of updating the disk from the volatile state than the explicit updates used by current file systems.

        Yet another advantage is that, because the entire system state is preserved as a whole, more powerful security schemes can be used. The whole load-from-file process can be avoided, and with it, the security problems of identifying who has access to what file, and why.
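
        A toy illustration of the checkpoint idea, in Python (nothing like a real orthogonally persistent kernel, which would snapshot dirty memory pages transparently; the file name and interval are made up): the "application" only mutates objects, while the "system" periodically writes the whole state and atomically replaces the previous checkpoint, so a restart resumes from the last good one.

        # checkpoint_sketch.py -- toy model of checkpointed persistence.
        import os, pickle, time

        CHECKPOINT = "state.ckpt"                  # hypothetical checkpoint file

        def load_state():
            if os.path.exists(CHECKPOINT):
                with open(CHECKPOINT, "rb") as f:
                    return pickle.load(f)          # resume where we left off
            return {"counter": 0}

        def checkpoint(state):
            tmp = CHECKPOINT + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp, CHECKPOINT)            # atomic: the old checkpoint stays
                                                   # valid until the new one is complete

        state = load_state()
        for _ in range(10):
            state["counter"] += 1                  # the application never calls "save"
            time.sleep(0.1)
            checkpoint(state)                      # the system checkpoints periodically
        print("counter is", state["counter"])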

      • "Persistence" is what a filesystem provides, and RAM does not -- the object can remain in existance indefinitely.

        "Serialization" means you take your object and turn it into a stream of bytes of some sort. Some more introspective languages, like Python, Smalltalk, and Java allow very easy serialization, but in something like C you spend a lot of time figuring out how to do it. Even if it is indirect, most files somehow represent an object that was in memory and can be put back into memory at a later time.

        "Orthogonal" means that something is seperate from something else -- or more specifically, that while two aspects of a thing are related, you can work with one without effecting the other. Kind of -- it's a subtle (though very useful) notion.

        "Orthogonal Peristence" means that all objects persist indefinitely with no effort from the programmer. "Orthogonal" refers to the fact that the persistence happens without any relation to other aspects of the program -- everything just persists by default. While it may involve serialization, this is hidden from the programmer, as is any other technique that supplies the persistance.

        In such a system there wouldn't be any distinction between objects in RAM or on a disk -- often that is then expanded to objects that are also remote (similar to CORBA, but again, the network access is orthogonal and invisible). Anyway, the system moves things to disk as it needs to, and pulls them off as needed.

        I brought up the cleanliness issue before, but the other issue is scaling. Garbage collection in particular is a bit difficult: you can't just do a mark-and-sweep every so often, because anything on the entire disk could contain a reference.

        EROS [eros-os.org] has this, Smalltalks have generally had this (you might wish to look at Squeak [squeak.org]), and the old Lisp machines also tended to have orthogonal persistence.

    • I dunno about the database idea. Is the filesystem going to be replaced by one specific set of tables with conventional fields? I can see how that would be done, but I honestly can't see why. You just got another heap of data with a different set of metadata.

      The other option is a database with dynamic tables, that would somehow fit the data. I don't know how you are going to manage that, though... can any application make tables that make sense for its problem space? How are those tables partitioned off so that you have some degree of safety, that one application doesn't step on another? How are they then integrated, so information from one application can be used in another?

      A non-relational database might make more sense; I believe they are often called Object Databases (not to be confused with an OO RDBMS). That's really just a way of saying "orthogonal persistence", except maybe that they aren't completely orthogonal (they require some extra programming to use).

      The problem with orthogonal persistence, that I see, is all the junk that can collect. Having used Squeak [squeak.org], which offers a certain sort of persistence in its images, transient objects can pile up fairly easily and lead to a sort of faux-memory-leak in the system. It's a convenient system, but not stable.

      Serialization provides a certain discipline -- it's like you have a checkpoint in the application when everything gets consolidated into something well-defined and granular.

      Now, you don't have to serialize to apply this sort of discipline. But orthogonal persistence just makes it so damn easy to be undisciplined. I feel like there's some major work to be done to find a way to manage such a large collection of interrelated objects with indefinite lifespans.

      • The problem with orthogonal persistence, that I see, is all the junk that can collect. Having used Squeak [squeak.org], which offers a certain sort of persistence in its images, transient objects can pile up fairly easily and lead to a sort of faux-memory-leak in the system. It's a convenient system, but not stable.

        Hmm, does Squeak lack garbage collection or something? One would imagine that a persisted object would be eligible for collection once there were no more references to it from wherever your persistent object graph is rooted. A persistence system without any roots can even work provided you have a lot of space to store objects that are out of scope and periodically compact it by selecting roots and discarding everything that isn't referenced by them -- basically a copying collector, which you can get away with when you're swapping and have good reference locality.

        Managing lots of objects requires a lot of discipline and work, but the existing body of theory is perfectly fine for managing billions of objects. It's just finding the right application of it that's tricky.
        • By faux-memory-leaks, I mean the memory leaks you get even when you have garbage collection. Typically dictionaries are the most egregious data structure -- it's easy for objects to be left in a dictionary after their usefulness has passed. Also, any named object can't be garbage collected. Caching can also lead to such memory leaks. It's a different style of problem from in C, but it definitely still exists.
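
          A minimal Python illustration of that kind of "leak" (the cache and names are made up): the garbage collector can't help, because the long-lived dictionary still legitimately references every stale entry.

          # faux_leak_sketch.py -- objects stay alive only because a cache still points at them.
          class Session:
              def __init__(self, user):
                  self.user = user
                  self.history = []             # grows while the session is in use

          active_sessions = {}                  # module-level cache, lives "forever"

          def login(user):
              active_sessions[user] = Session(user)

          def logout(user):
              pass                              # oops: forgot del active_sessions[user],
                                                # so the Session can never be collected

          for i in range(10000):
              login("user%d" % i)
              logout("user%d" % i)

          print(len(active_sessions))           # -> 10000 sessions nobody will ever use again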

          Also, garbage collection on a few gigs of interrelated data isn't easy. Single-pass definitely won't work, but there are a lot of good incremental garbage collection algorithms.

          Current theory can mostly deal with a large number of persistent objects -- though in many areas it's just theory. Current practice definitely can't deal with this sort of persistence.

          • The implication of what you are saying is that all systems must be rebooted every once in a while, because they will leak too heavily.

            As far as I know, Linux can have uptimes of years without any memory leaks. This shows that memory does not necessarily leak over time, and that an 'infinite uptime' achieved with orthogonal persistence should be possible, as Linux achieves it.

            An orthogonally persistent system should actually gain simplicity in many aspects, probably resulting in more stability, too.
            • The Linux kernel may be up for a year or more, but seldom will the applications also be up for that long. With orthogonal persistence, applications may never be "rebooted" -- i.e., started and stopped.

              By very careful development, the kernel has been made very stable. However, applications are far from that stable. So when I say that practice needs to be improved, I mean that the correctness of the kernel has to be extended to the system as a whole. Or some other technique of partitioning has to be created, because the partitioning we use in Unix (processes) is part of what orthogonal persistence seeks to eliminate.

              • Firstly, why do you assume processes cannot be restarted if necessary in an orthogonally persistent system?

                Secondly, I was not aware that long-uptime Linux systems required restarting their processes due to leaks over time.
                If this is the case, process restart support may truly be necessary in an orthogonally persistent system, but it's still not relevant to the original dilemma of explicit versus orthogonal persistency.
