A Good Filesystem for Storing Large Binaries?
jZnat asks: "I own hundreds of gigabytes of binary data, usually backed up from other media such as CDs and DVDs. However, I cannot figure out which filesystem would be best for storing all this reliably. What I'm looking for is a WORM-optimized FS that also has good journaling methods to prevent data loss due to some natural disaster while data is being shifted around. Trying something new for once, I tried using SGI's XFS due to its promising details, but I was met with countless IO errors after trying to write large amounts of data to it. I feel that Ext3 is not optimal for this; ReiserFS is too slow when it comes to reading large data files; and Reiser4 isn't mature enough to entrust my digital assets to. What filesystem would be most appropriate for these needs?"
JFS (Score:4, Informative)
Re:JFS (Score:2)
I did see a Red Hat bug report a while back about very large file write performance issues on ext3: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156437 [redhat.com]. I don't know if the fix is in the official kernel yet, or if it was just a RHEL-specific bug.
Re:JFS (Score:2)
How big are the files? Hundreds of gigabytes? Terabytes? Most non-commercial filesystems are very inefficient when it comes to handling files in the hundreds-of-gigabytes range.
Efficient handling of such large files requires that you also have the disk hardware and I/O subsystems to handle it. If you have a large number of small files, then I have no idea; I only deal with VLFs (Very Large Files, starting at 100GB+).
Re:JFS (Score:2)
What type of data requires a file 100GB in size?
Re:JFS (Score:3, Informative)
Drilling data and real-time sensor data from oil wells in the North Sea is one example. Observational data from particle accelerators, weather data... many areas have HUGE data storage requirements.
Re:JFS (Score:3, Informative)
Thermonuclear explosion simulations, for one.
On a more prosaic level, we've got databases with hundreds of millions of rows, regularly growing past the billion mark. Yes, they're partitioned, but the partitions are still huge.
Re:JFS (Score:2, Informative)
Re:JFS (Score:3, Interesting)
Re:JFS (Score:2)
AFAIK, JFS doesn't run on IBM mainframe operating systems. The "unix-like" hierarchical filesystems available on the mainframe are HFS and zFS.
Perhaps you used JFS on a big AIX cluster?
Re:JFS (Score:2)
Re:JFS (Score:2)
Obviously this is anecdotal evidence, but I'm not making anything up.
Re:JFS (Score:2)
Taking FS snapshots of an open, active database is a recipe for disaster, IMNSHO.
There are uncommitted transactions, buffered data, etc., which make the on-disk "stuff" transactionally inconsistent. DBMS vendors write backup utilities for a reason.
As a DBA of very large systems (TB-sized, with billion-row tables), I wo
The Google Filesystem (Score:5, Funny)
Re:The Google Filesystem (Score:2)
Re:The Google Filesystem (Score:5, Insightful)
I doubt that. To run GFS (assuming you have the code), you need a big honking cluster to replicate data across machines. It also assumes different file semantics, so you need to hand-code your apps to use its reading and writing semantics. It only works well for appending writes and streaming reads. Furthermore, GFS does not have file locking, and concurrent writes will leave your files in an undefined state.
Re:The Google Filesystem (Score:2)
Re:The Google Filesystem (Score:2)
Re:The Google Filesystem (Score:2)
Re:The Google Filesystem (Score:2)
Try JFS? (Score:2, Informative)
Re:Try JFS? (Score:2)
Re:Try JFS? (Score:2)
Funny, b/c I too have a 4 drive array . . . 4 to make 1, the true paranoid array.
-nB
Re:Try JFS? (Score:2)
I think they have a model with up to 16 drives, but you should look at the 9550SX-8LP, an 8-drive ATA controller.
For SATA: the model 8506-8 -- up to 8 Serial ATA drives; RAID 0, 1, 10, 5, or JBOD; pitched at departmental servers, security and surveillance, and disk-to-disk backup... RAID 10 on two arrays looks good to me....
For max size, I think I would opt for RAID5 on 7 disks and keep the 8th drive as a hot spare for the soon-to-be-faulty drive...
7*5
Re:Try JFS? (Score:2)
Damn, that guy just got me thinking about my next fileserver 8)
Filesystem choice... (Score:4, Insightful)
You need a filesystem that can be "burned" to a medium, yet have error correction capability.
Journaling doesn't do this. Journaling is there so that when a power surge hits in the middle of a write, you can get some of the data back. Currently no regular FS can do what you're asking for.
Why is this modded troll | Re:Filesystem choice... (Score:2, Insightful)
You sure? (Score:2)
Actually, I wouldn't use ANY filesystem for this sort of work. The files won't change in size and I doubt they'll be deleted. It would seem more sensible to battery-back the RAM on the computer and the hard drive, use a raw partition for the data and a "sequential index" database to figure out where the data starts and how long it is. Batteries guarantee that the state of the computer wi
Database companies have similar problems (Score:3, Interesting)
Re:Database companies have similar problems (Score:2)
Then again, IANADBA, so I don't know the extent of how la
Re:Database companies have similar problems (Score:2, Interesting)
reiser or jfs (Score:3, Informative)
OTOH, JFS is quite stable, and though it has less of the elegance of feature set that I find in Reiser, it tends to make up for that with enhanced ruggedness and its handling of large volumes/files.
Really can't recommend anything else; as you say, Reiser4 is still untested for reliability IMHO, XFS has issues that vary from kernel to kernel, and ext3 appears quite primitive in comparison, although its journaling seems comparable to the other choices.
JFS if you need the speed: it's dead fast at large scales, slower with small files. Otherwise Reiser3 is an excellent all-round performer.
IMPORTANT - addendum (Score:2)
Also, you mentioned something about burning to DVD; filesystems won't really help with that. I'd look more into tarring your filesystem (or segments of it) into disc-sized chunks, then making extra par2 files for error resilience.
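A minimal sketch of that approach, assuming hypothetical paths and single-layer DVDs (adjust the chunk size and redundancy to taste):

# split a tar stream into roughly DVD-sized pieces
$ tar -cf - /data/archive | split -b 4300m - archive.tar.part-
# create ~10% par2 recovery data covering every piece
$ par2 create -r10 archive.par2 archive.tar.part-*

Burn each piece together with the matching .par2 volumes; if a disc develops bad sectors later, par2 can usually reconstruct the damaged part.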
Really, backing up 2TB of live data is a f*ing nightmare however you look at it; usually when I reach that hurdle I just build another machine, copy over the active data, and put the
zfs is new (Score:3, Informative)
Re:zfs is new (Score:2)
Possibly... (Score:4, Funny)
Just upload to bittorrent, ftp, or some other p2p system, and redownload it if you need it again!
Some small security issues may apply though...
Comment removed (Score:5, Interesting)
Re:It worked for me (Score:2)
Re:Possibly... (Score:2)
1. Encrypt the files and then tack on a header that makes them look like a DRM protected video.
2. Name them something like XXXgirlzongirlz.wmv and then drop them into your share directory.
3. ???
4. Profit!
Not Linux, but try ZFS (Score:5, Interesting)
http://www.sun.com/software/solaris/zfs.jsp
Re:Not Linux, but try ZFS (Score:2)
Ext3 or XFS. (Score:2)
Personally, I'm using XFS for the same task. Why? Because it divides the filesystem into allocation groups, allowing per-group locking instead of whole-filesystem locking, which is nice if you're writing multiple big files at once. I've never had a problem with it.
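If it helps, the number of allocation groups can be chosen when the filesystem is created; a small sketch, with a hypothetical device name:

# more allocation groups = more independent regions that can be written in parallel
$ mkfs.xfs -d agcount=16 /dev/sdb1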
If you are getting countless IO errors, have you done a `badblocks` on the disk? Have you tried a different IO card or
Re:Ext3 or XFS. (Score:2)
Stacking drives and not adequately cooling them is another possible cause of your IO errors with XFS.
Another possibility is checking your hard disks with the manufacturer's drive test utility.
Re:Ext3 or XFS. (Score:5, Informative)
While it sucks that you've lost data because of XFS, many people use it heavily every day without issue (I'm one of them). I've deployed XFS across mail, database, and web servers without issue. Your statements above are total FUD. The reason the last 'release' was in 2003 is that, not long after that, XFS was accepted into the kernel itself. Thus there was no longer a need to 'release' XFS patches for the kernel. If you look at the command packages [sgi.com], you'll see them being updated on a regular basis.
As for bugs, I think your statement that bugs aren't being fixed is incorrect as well. Check the closed bug list [sgi.com]; you'll see many being closed. And yes, the open bug list above does appear rather long, but MANY of those bugs are from users who opened a report saying 'XFS crashed on me' and then never followed up with more info. The XFS developers haven't cleaned many of those out, it seems. Bugs in the 200s date from 2003, bugs in the 300s from 2004, and the late 300s and 400s from 2005.
So I hate that you've had data loss; I wouldn't wish that on anybody (having experienced a RAID5 triple-disk failure combined with backup tape failure -- thank goodness for OnTrack!). But don't post FUD about a filesystem that has performed very well for a lot of people and continues to be improved and to innovate.
Re:Ext3 or XFS. (Score:2)
Hand-tuned ext2/ext3? (Score:5, Interesting)
Just use tune[23]fs to reduce the number of inodes significantly on the ext3 filesystem. Or look for simple filesystems that don't do tricks to optimize for speed (these usually waste disk space) and just store your files in a straightforward manner.
tune2fs won't work for that (Score:3, Insightful)
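Indeed, the inode count is fixed when the filesystem is created, so this has to be done at mkfs time rather than with tune2fs. A minimal sketch, with a hypothetical device name and inode count:

# explicitly request far fewer inodes than the default, since the volume will hold a few huge files
$ mke2fs -N 500000 /dev/sdb1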
So just poo poo all the options then ask for one (Score:5, Insightful)
journaling methods to prevent data loss due to some natural disaster while data is being shifted around
Journalling doesn't do this! Journalling helps reduce filesystem corruption in the event of a catastrophic failure while modifying the filesystem; i.e., it's possible to bring it back to the last clean state before it crashed. Journalling does not prevent data loss. You might say "well, filesystem corruption and data loss are the same", but they are not. If the filesystem is corrupted, the data is not lost; it just becomes not easily retrievable. If the data is lost, then it is entirely irretrievable.
I tried using SGI's XFS due to its promising details, but I was met with countless IO errors
Have you considered that your hardware is shit? I've been using XFS on terabytes of RAIDed disks for more years than I remember... 5 or so? I don't see any I/O errors. XFS is very reliable and I trust it with my data.
I feel that Ext3 is not optimal for this
Well not all of your post was dumb!
ReiserFS is too slow when it comes to reading large data files
How is it slow? It takes a few microseconds longer to access the first data sector because it does some extra processing first? Give me a break. Filesystem performance for journalled filesystems is mostly bound by writing speed, and this is a function of how the journal is updated. I doubt you would notice the difference in read speed unless you ran a million tests over a million different files, took some sort of average for the filesystems and quibbled over a few milliseconds.
Reiser4 isn't mature enough to entrust my digital assets to
You entrust your assets digitally? Shit, why do you trust any filesystem? They are all buggy. Give me a break.
If you don't like it, keep backups on other media; buy a tape drive and a robot and get in bed with a good archiving company to securely store the backups. Don't come on here and poo-poo all of the filesystems known to man and then ask "is there anything better?". About the only four in common use you left out were JFS (good for large databases but not much use if you have a lot of small files), FAT[12/16/32] (not much good for anything really), NTFS (see FAT, but more complex) and ISO9660. I'll concede there are others, but you want something that's in common use so you can actually retrieve your data when the world turns to shit...
Anywho!
Re:So just poo poo all the options then ask for on (Score:2)
One more common filesystem: UDF (Score:2)
Just out of curiousity (Score:2)
Re:Just out of curiousity (Score:2)
I haven't seen any unbiased comparisons of NTFS against the other filesystems, but if you want to limit your liability I'd go with something that's open. MS have a habit of breaking backwards compatibility, and Vista seems to want to do a lot of that. Who says a future version of Windows will support today's version of NTFS?
At
Re:Just out of curiousity (Score:2)
Re:Just out of curiousity (Score:2)
Re:Just out of curiousity (Score:2)
You have other issues. (Score:2, Informative)
Comparison of File Systems (Score:5, Informative)
Personally, I run two 300GB drives in RAID1 on UFS and am quite satisfied with it, but you seem to be incredibly, incredibly picky, so I'm sure you could find something wrong with it.
ND
I/O Errors??? (Score:5, Informative)
Like someone else said -- try using badblocks(8) -- or just use dd to make sure you can read the entire partition without errors.
Bad disks do happen -- even new ones. Production code in Linux is generally very stable, and (unlike with Windows) you can usually start with the presumption that things like I/O errors are caused by real hardware problems of some sort (even if it's just bad/loose cables).
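A quick sketch of both checks, with a hypothetical device name (both are read-only):

# non-destructive read scan of the whole device
$ badblocks -sv /dev/sdb
# or just stream the whole device and watch dmesg for read errors
$ dd if=/dev/sdb of=/dev/null bs=1M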
Re:I/O Errors??? (Score:2)
$ smartctl -a /dev/sda
prints out everything you need to know about a disk. smartmontools also comes with smartd, which sits in the background, monitors the disk's attributes and administers regular disk tests. It will perform an action, such as mailing you or running a script, if anything happens that you need to know about.
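For example, a single line in /etc/smartd.conf along these lines (device name and address are placeholders) monitors all attributes, runs a short self-test every night around 2am, and mails you if anything looks bad:

# /etc/smartd.conf -- a sketch, check smartd.conf(5) for your version
/dev/sda -a -s (S/../.././02) -m admin@example.com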
The one thing it is missing is some kind of long term data storage facility, from
Keep it simple. ext2 or fat32. (Score:5, Interesting)
You want a filesystem you'll be able to read at any point in the future and, should the worst happen, one which you'll have a reasonable chance of being able to recover.
ext2 and fat32 tend to write files in nice large chunks and there are lots and lots of recovery tools for damaged filesystems. Journaled filesystems like to put little pieces all over the place, and recovery of a badly damaged filesystem is next to hopeless.
There is no call for a complex filesystem just because you want to store large files. ext2 (and to some extent fat32) will do just fine, and you'll be glad for them someday in the future when something breaks.
Re:Keep it simple. ext2 or fat32. (Score:2)
ext{2,3} can handle nice, big files, and just about any version of Linux can read it. You can even get drivers for the Microsoft world to read ext2 filesystems. If you're looking at a read-mostly filesystem, then journaling won't get you much (other than making for a fast fsck if you lose power).
Just remember to specify '-T largefile' on the mke2fs command line to optimize for larger files.
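A minimal sketch with a hypothetical device; '-T largefile' makes mke2fs allocate roughly one inode per megabyte of space instead of the much denser default:

# add -j if you do want the ext3 journal
$ mke2fs -T largefile /dev/md0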
If you haven't thought about it yet, I'd also suggest raid5 rather than r
Re:Keep it simple. ext2 or fat32. (Score:4, Informative)
Beyond that, I'd say pretty much anything will work fine -- most of the optimizations found in filesystems are needed for lots of small files, not a few large files. For large files, the speed at which various filesystems can access them is not likely to vary more than a few percent unless you let the files get fragmented (which probably isn't a big concern here).
And you are right -- if something does go wrong, ext2 or ext3 will probably give you the most options for recovering it. NTFS probably has even more recovery options (and FAT even more, as mentioned), but I'm guessing the OS will be *nix. But really, if your goal is reliability, you don't want some esoteric filesystem that can recover from disk errors (because ultimately, none can, though I guess one could be designed to keep ECC codes on the same disk transparently -- I'm aware of no such filesystem). You want multiple copies of your data. Keeping 5-10% (or more) par2 files [sourceforge.net] for your archive can help a lot in recovering it if your media goes partially bad, and having md5sums or CRC32s of all archived files can help determine whether you recovered something accurately, but really there's little substitute for multiple copies of important data in multiple geographical locations. (And no -- RAID is not a substitute for backups, no matter how many mirrored drives you have. Not that I saw anybody suggest this yet, but it seems to always come up in response to questions like this, so consider this a preemptive mention.)
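A tiny sketch of the checksum-manifest side, with hypothetical paths (par2 creation looks much like the DVD example further up):

# record a checksum for every archived file, keeping the manifest outside the tree it describes
$ find /archive -type f -exec md5sum {} + > /backups/archive-manifest.md5
# later, verify nothing has silently rotted
$ md5sum -c /backups/archive-manifest.md5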
Re:Keep it simple. ext2 or fat32. (Score:2)
I did say on a badly damaged filesystem. Simple filesystems tend to write files out in larger contiguous segments, which makes them worlds easier to recover when something utterly trashes your filesystem.
Yes, for typical day-to-day power loss type filesystem damage, journaling is great, but if I'm having to try to recover data from a filesystem that's lost 50% of its bits, I want it to be ext2 or fa
Re:Keep it simple. ext2 or fat32. (Score:3, Informative)
Nothing, the programmers will tell you that themselves. Journalled filesystems aren't for protecting your files. They're for protecting your filesystems.
The point of a journal is that you can roll back to a consistent state of the filesystem easily in case of error -- not that you can roll back to a consistent
Re: ext3 works fine, did you try it? (Score:5, Informative)
Re: ext3 works fine, did you try it? (Score:3, Informative)
I/O errors? You have bigger problems than filesyst (Score:2)
Re:I/O errors? You have bigger problems than files (Score:2)
Not quite automagically -- every time I've had a bad block develop, the drive has only remapped it on writing to it. Reading just returns various types of errors (which makes sense: if you're trying to recover the block in question, it might read successfully one time in a thousand, and then you can write it back and all is well again). I'm pretty sure that SCSI can return warnings that a block was hard to read, allo
My problem (Score:2)
My plan was to migrate the small striped set to a larger set that included the old drives, either by making a RAID set of the new drives, copying the data, then adding the old drives (I would call that 'horizontal'); or by moving all the data to the tops of the drives, making a RAID set of the lower segments of all the drives, and expanding the segments ('vertical').
I'm up to using EVMS; but no useful Expand options appear to be availab
Suggestion: (Score:2)
Use LVM on top of MD/RAID. When enough of your physical drives have the requisite extra free space, you can just create a *new* MD volume from the free space and add that to your logical volume. Example: if you had a 100GB drive with one partition in the RAID and replaced it with a 150GB drive, you'd allocate 100GB to one partition and 50GB to a second partition, re-adding the first partition to the old RAID volume and waiting for it to rebuild. Do
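A rough sketch of the growth step once several drives have a spare second partition (device names, volume group "vg0", and sizes are all hypothetical):

# build a new RAID5 array out of the leftover partitions...
$ mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2
# ...then hand it to LVM and grow the logical volume and filesystem
$ pvcreate /dev/md1
$ vgextend vg0 /dev/md1
$ lvextend -L +100G /dev/vg0/data
$ resize2fs /dev/vg0/data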
Re:My problem (Score:2)
Look for swillden's post a little less than halfway down. This is the approach I'm using now.
In short: LVM on top of multiple "small" (50 GB per drive) RAID 5 partitions. LVM will let you automagically move all data off of a given "physical volume" if there is sufficient free space in the volume group. Note that in this case the "physical volume" is not actually physical, but is a RAID 5 mdX device instead. Once all data is moved off of one of th
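The evacuation step is essentially pvmove; a one-line sketch with hypothetical names:

# push every extent off the old RAID device, then drop it from the volume group
$ pvmove /dev/md2 && vgreduce vg0 /dev/md2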
Re:My problem (Score:2)
Ext2 rw,sync (Score:5, Interesting)
Ext2fs mounted with the 'sync' option.
For large sequential writes, nothing could possibly be more reliable or any faster. Your hard drive's pure IO speed will be the bottleneck unless you are writing to multiple files simultaneously, in which case fancy filesystems come in handy.
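For what it's worth, that setup is nothing exotic; a minimal sketch with a hypothetical device and mount point:

$ mke2fs /dev/sdb1
# 'sync' forces writes straight to disk instead of lingering in the page cache
$ mount -t ext2 -o sync /dev/sdb1 /mnt/archive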
If that doesn't suit your needs, you haven't described them well enough for anyone to understand.
I feel hungry.
Re:Ext2 rw,sync (Score:2)
For large sequential writes, nothing could possibly be more reliable or any faster. Your hard drive's pure IO speed will be the bottleneck unless you are writing to multiple files simultaneously, in which case fancy filesystems come in handy.
But ext2 (and ext3) store a block list that grows in proportion to the size of the file itself. That's fine for small or highly fragmented files, but it's a waste of space and I/O bandwidth when you have big, unfragmented files.
Re:Ext2 rw,sync (Score:2)
Re:Ext2 rw,sync (Score:2)
You can use a large block size and greatly reduce the amount of wasted space and overhead (which is rather small to begin with, actually). You've got to expect that kind of overhead from just about any filesystem, and something like journaling will only add more.
If you've got somet
HFS+, Journaled (Score:2)
Re:HFS+, Journaled (Score:2)
You're right about the tools though. DiskWarrior is *godlike*. I cannot believe the stuff it has pulled off... stuff that made fsck cry.
Re:HFS+, Journaled (Score:2)
WORM (Score:2)
ISO9660 (Score:2)
For all my WORM disks... (Score:3, Informative)
Yeah, I simply burn CD-Rs or DVDs. DVDs have the nice property of being easily stored off-site. And files are in nice large contiguous blocks, so even if the filesystem dies you can still recover a lot. Unlike XJFReiFS 2.3.1.5, DVDs will be readable in 50 years' time.
And if you need to burn really large files, just use, well, zip. And perhaps some par2 files.
Though, seriously, they're coming up with a UDF variant for hard drives too.
sync mounted FS, no write cache (Score:2)
As for which filesystem, I would humbly suggest the tried-and-true time-tested UFS. I don't think UDF is really what you want, beca
TAR files written to raw partitions (Score:5, Funny)
The only drawbacks are that you have to read the entire partition sequentially to find things, and you can't delete files. Both of these can be fixed with a bit of Perl. Write a program that maintains an index of offsets to the files; then you can use "dd" to skip to the correct offset and read from there. More dangerously, write a program that deletes files from the middle of an archive and shuffles everything backwards to fill in the gaps. You'll want to make sure that no one is trying to read the TAR partition while this is running.
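In the same tongue-in-cheek spirit, the read side might look like this, with a hypothetical partition and offsets you'd look up in your homemade index (tar happily reads from a pipe):

# write the archive straight onto the raw partition -- nothing else can live there
$ tar -cf /dev/sdb3 /data/to/keep
# later, seek to the member's 512-byte block offset; count should cover its length per your index
# (tar may grumble about the abrupt end of the stream, which is part of the charm)
$ dd if=/dev/sdb3 bs=512 skip=123456 count=20480 | tar -xvf -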
Who cares? (Score:2)
UDF is the correct answer (Score:3, Informative)
What you're looking for is Universal Disk Format [afterdawn.com] or UDF.
It is an open standard [osta.org] supported by all of the major OSes and manufacturers, and it is the filesystem of choice for Ultra Density Optical [plasmon.com] WORM and rewritable disks.
There are drivers for Linux, Windows and all of the major UNIXes. Here [wikipedia.org] is the obligatory Wikipedia entry.
Hard disk filesystems like XFS, JFS, Reiser, ZFS etc. are all wonderful at what they do but they are unsuitable for WORM disks.
Delete performance - large files (Score:3, Informative)
Others have said good things in general (XFS, JFS, ext3).
I looked into filesystem comparisons in setting up a MythTV box.
My issues were:
(1) efficient use of hard drive space, and
(2) performance.
Efficient use = filesystem settings have a big effect on the amount of usable space.
For ext2/3:
-m 0 = setting 'reserved space for root' to 0%. The default is 5%, which can be 10-20 GB these days, all unusable to non-root users.
-T ____ = tells ext2/3 to optimize inode count and bytes-per-inode for a given average file size: largefile versus news spools (tons of small files). Because of the way a file can be spread out and mapped across the filesystem, this has an effect on 'wasted' space, and maybe on performance (number of inode entries to look up per file).
-b, -i = set the block size and bytes-per-inode directly, for advanced control over filesystem creation (a combined example follows below).
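A hedged example combining those flags for a large-file partition (device name is hypothetical; check the mke2fs man page for your version):

# no reserved root space, 4KB blocks, one inode per 1MB of space
$ mke2fs -m 0 -b 4096 -i 1048576 /dev/hdb1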
I never got around to looking into this level of detail for XFS/JFS; they seem to have fewer such options.
Performance: I'll leave it to others to talk about filesystem performance with large files in general.
MythTV does a lot of writing and, as it turns out, deleting of large temporary files for its TV features (recording, pause, FF/RW). After some reading online, I've found that MythTV performance is drastically impacted by filesystem choice because of all the deleting.
http://www.mythtv.org/docs/mythtv-HOWTO-24.html#ss24.2 [mythtv.org]
http://www.gossamer-threads.com/lists/mythtv/users/52672 [gossamer-threads.com]
---SNIP---
> My last reply to myself. Based on a Googled reference, I was able to
> break my XFS 4G file size barrier by formatting the partition 'mkfs.xfs
> -dagsize=4g'. So, here are the complete results:
>
> Time to delete a 10G file, fastest to slowest:
>
> JFS: 0.9s, 0.9s
> XFS: 1.3s
> EXT3: 1.4s, 2.3s
> EXT2: 1.6s
> REISERFS: 6.2s
> EXT3 -T largefile4: 5.9s, 10.2s
>
> After running the XFS test, there didn't seem to be any point in
> reformatting the partition again, so I left it on XFS, but I think I
> would be happy with JFS, XFS, or EXT3 w/o '-T largefile4'.
wepprop at sbcglobal, Feb 8, 2004, 2:33 AM
Re: Changing filesystems?
Robert Kulagowski wrote:
> Interesting. If others care to weigh in, I can either re-write the
> "Advanced Partitioning" section in the HOWTO, or whack it completely.
>
> William, can you give some background on the hardware used for your
> tests? I'd be curious if this data holds up across various drive types,
> LVM, etc. (Without trying to exhaustively test all the possibilities,
> that is)
It appears, based on my personal experience alone, that file deletes are
the only system operations that can stress the hard drive enough to
produce dropped frames. Unfortunately, as others have pointed out,
recordings and deletions go together in Myth. So, unusual as it may be,
it does make at least some sense to take file deletion performance into
account when deciding which filesystem to use for a video partition,
especially for people with multiple tuners.
The really ironic result from my personal perspective is that it would
appear that using the '-T largefile4' setting for ext3, which I was so
pleased with because it gave me an extra 2G of storage, may well have
been responsible for all those recordings I had ruined by frame drops.
Assuming it works out, though, I could really get to like this XFS
filesystem because it appears to give me slightly more storage space
than ext3 w/ '-T largefile4' did and it has pretty fast deletes as well.
---SNIP---
PFS (Score:2, Funny)
Re:PFS (Score:2)
LOL (Score:3, Funny)
This is a very euphemistic way of saying:
"I download moviez, mp3 and porn via P2P all day and even though I usually don't view any movie twice, I still don't want to throw away anything, because I just can't delete anything".
How that could get an "Ask Slashdot" posting is left as an exercise for the reader.
Comment removed (Score:3, Interesting)
Which distro for XFS? (Score:2)
FMWORM (Score:4, Informative)
It's a commercial product from Siemens, which I used years ago for Sietec's large-scale imaging product.
There is probably a Linux port: we ran it on almost everything in existence (;-))
--dave
Re:ZFS (Score:2)
Re:FAT16 (Score:4, Funny)
Re:FAT16 (Score:2)
Re:Retry XFS (Score:2)
Re:Retry XFS (Score:3, Insightful)
Re:Your usage scenarios makes no sense (Score:2)
Now if he asked for a solution that was safe, encrypted and fast enough for streaming media...
Re:WORM optimized FS? (Score:2)
Re:WORM optimized FS? (Score:2, Interesting)
ISO-9660 is not the same as UDF. If you have UDF and ISO-9660 on the same volume, it is because someone mastered a hybrid filesystem structure onto the disc, which was the norm on first-generation DVDs.
ISO-9660 contains no optimizations for being a WORM filesystem; there are no linking records in ISO-9660 to allow re-writing of data into new blank spots on non-rewritable storage media. UDF supports these linking blocks.
When I wrote