Large File Problems in Modern Unices

Posted by CmdrTaco on Sun Jan 26, 2003 09:53 AM
from the stuff-to-deal-with dept.
david-currie writes "Freshmeat is running an article that talks about the problems with the support for large files under some operating systems, and possible ways of dealing with these problems. It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
  • Not really that groundbreaking... (Score:5, Interesting)

    by CoolVibe (11466) on Sunday January 26 2003, @09:59AM (#5161560) Journal
    The problem is nonexistent in the BSDs, which use the large file (64 bit) versions anyway. All it means is that you have to use a certain -D flag if your OS (like Linux) doesn't use the 64 bit versions by default. Whoopdiedoo. Not so hard. Recompile and be happy.
  • by cheekyboy (598084) on Sunday January 26 2003, @10:05AM (#5161596) Homepage Journal
    I said this to some so-called Unix 'experts' in '95, and they said, "Oh, why would you need >2 gig?"

    I can just laugh at them now...

  • by cyber_rigger (527103) on Sunday January 26 2003, @10:10AM (#5161622) Homepage Journal
    --Bill Gates
  • It will happen with time_t, too (Score:5, Informative)

    by wowbagger (69688) on Sunday January 26 2003, @10:11AM (#5161629) Homepage Journal
    We are seeing problems with off_t growing from 32 to 64 bits. We are also going to see this when we start going to a 64 bit time_t, as well (albeit not as badly - off_t is probably used more than time_t is.)

    However, the pain is coming - remember we have only about 35 years before a 64 bit time_t is a MUST.

    I'd like to see the major distro vendors just "suck it up" and say "off_t and time_t are 64 bits. Get over it."

    Sure, it will cause a great deal of disruption. So did the move from a.out to ELF, the move from libc to glibc, etc.

    Let's just get it over with.
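
A quick sketch of where the "about 35 years" figure comes from, assuming a signed 32-bit time_t that counts seconds from 1970-01-01 UTC. The names here are illustrative only and not taken from any libc:

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit time_t can hold (seconds since the epoch).
TIME_T_MAX_32 = 2**31 - 1

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable_instant(max_seconds=TIME_T_MAX_32):
    """Return the UTC instant at which a counter of max_seconds runs out."""
    return EPOCH + timedelta(seconds=max_seconds)


rollover = last_representable_instant()
print(rollover.isoformat())   # 2038-01-19T03:14:07+00:00
print(rollover.year - 2003)   # 35 (years left, counting from this thread's date)
```

One second past that instant, a 32-bit signed counter has nowhere to go, which is the deadline the comment above is pointing at.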
  • A woman's perspective . . . (Score:5, Funny)

    by pariahdecss (534450) on Sunday January 26 2003, @10:12AM (#5161633)
    So my wife says to me, "Honey, do I look fat in this filesystem ?"
    I replied, "Sweetie, I married you for your trust fund not your cluster size."
  • by httpamphibio.us (579491) on Sunday January 26 2003, @10:15AM (#5161649)
    It doesn't give a specific filesize in the article...
  • Funny...in AIX... (Score:4, Informative)

    by cshuttle (613776) on Sunday January 26 2003, @10:18AM (#5161665)
    We don't have this problem: 4 petabyte maximum file size, 1 terabyte tested at present. http://www-1.ibm.com/servers/aix/os/51spec.html
  • by alen (225700) on Sunday January 26 2003, @10:19AM (#5161672)
    On the Windows side many people like to save every message they send or receive to cover their ass just in case. This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.
  • Switch to gnu/hurd (Score:3, Funny)

    by Anonymous Coward on Sunday January 26 2003, @10:22AM (#5161690)
    It has a nice small 1 GB filesystem limit. I have partitioned my hard disk into 64 little chunks and it runs very slowly and unstably, but it's completely open source and I'm happy.
  • by bananaape (542919) on Sunday January 26 2003, @10:34AM (#5161736)
    It's how you use it.
  • I just wonder why we don't learn from past limits and remove these limits "forever". E.g. one month ago I received a question about the possibility of building a 10 TB Linux cluster (physicists are crazy ;-)).

    There surely MUST be some way to do this. I imagine some file (e.g. defined in the LSB) which would define these limits for the COMPLETE system (from kernel, filesystems and utils to network daemons). I know there are efforts toward things like this, but if we said (for example) that a distribution in 2004 won't be marked "LSB compatible" if ANY of its programs uses any other limits, I think it would create enough pressure on Linux vendors.

    Just a crazy idea ;-)
  • The O/S should do it and do it well. (Score:3, Interesting)

    by tjstork (137384) <<tbandrow> <at> <mightyware.com>> on Sunday January 26 2003, @10:41AM (#5161769) Homepage Journal
    1) Splitting up a big file turns an elegant solution into an inelegant nightmare.

    2) Instead of 10 different applications writing code to support splitting up an otherwise sound model, why not have 1 operating system with provisions for dealing with large files?

    3) You are going to need the bigger files with all those 32 bit wchar_ts and 64 bit time_ts you've got!

  • BeOS Filesystem (Score:2)

    by SixArmedJesus (513025) on Sunday January 26 2003, @10:51AM (#5161834) Homepage
    I remember reading in the BeOS Bible that the BeOS filesystem could contain files as large as 18 petabytes. Makes you wonder two things: What's the biggest filesystem that you could use with a BeOS machine? And why don't other OSs have filesystems like this, especially with those awesome extended attributes? I weep for the loss of the BeOS filesystem...
  • Somewhat cumbersome, even on Linux (Score:2, Informative)

    by topologist (644470) on Sunday January 26 2003, @10:58AM (#5161871)
    To enable LFS (Large File Support) in glibc (which not all filesystems support), you need to recompile your application with
    -D_FILE_OFFSET_BITS=64 and -D_LARGEFILE_SOURCE

    This forces all file access calls to their 64-bit variants, so off_t itself becomes 64 bits wide. Explicit types like off64_t are only needed if you use the separate -D_LARGEFILE64_SOURCE interface instead. And I believe most large file support is really available only from glibc 2.2 on.

    With that explicit interface you also need to use O_LARGEFILE with open etc. Either way, legacy applications that use glibc fs calls have to be recompiled to take advantage of this, and may need source-level changes. Won't work on older kernels either.
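
To see why the offset width matters at all, here is a small sketch of what happens to a 2 GiB offset when it is squeezed into a signed 32-bit integer versus a 64-bit one. It only illustrates the arithmetic and does not call glibc; as_off32 and as_off64 are hypothetical helper names:

```python
import ctypes

INT32_MAX = 2**31 - 1   # largest offset a signed 32-bit off_t can express


def as_off32(offset):
    """Reinterpret an offset as a signed 32-bit value (wraps silently)."""
    return ctypes.c_int32(offset).value


def as_off64(offset):
    """Reinterpret an offset as a signed 64-bit value."""
    return ctypes.c_int64(offset).value


two_gib = 2**31
print(as_off32(INT32_MAX))   # 2147483647
print(as_off32(two_gib))     # -2147483648
print(as_off64(two_gib))     # 2147483648
```

The first byte past 2 GiB - 1 simply has no positive 32-bit representation, so a program compiled with a 32-bit off_t cannot name that position, whatever the filesystem underneath supports.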

  • Error Prevention (Score:3, Interesting)

    by Veteran (203989) on Sunday January 26 2003, @11:13AM (#5161941)
    One of the ways to keep errors from creeping into programs is to put limits on things so high that you can never reach them in the practical world.

    The 31 bit limit on time_t overflows in this century - 63 bits outlasts the probable life of the Universe so it is unlikely to run into trouble.

    That is the best argument I know for a 64 bit file size; in the long run it is one less thing to worry about.
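
A rough check of those ranges, assuming one tick per second and a 365.25-day year (both simplifications; years_of_range is just an illustrative name):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # approximate year length in seconds


def years_of_range(bits):
    """Years covered by a counter with the given number of value bits."""
    return 2**bits / SECONDS_PER_YEAR


years_31 = years_of_range(31)   # roughly 68 years, i.e. 1970 + 68 = 2038
years_63 = years_of_range(63)   # roughly 2.9e11 years
print(round(years_31), f"{years_63:.1e}")
```

So 31 value bits run out within a human lifetime of the epoch, while 63 bits give on the order of hundreds of billions of years.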
  • by haggar (72771) on Sunday January 26 2003, @11:23AM (#5162003) Homepage Journal
    I had a problem with HP-UX apparently not wanting to transfer files larger than 2GB via NFS (when the NFS server is on HP-UX 11.0). I had to back up a Solaris computer's hard disk using dd across NFS. This usually worked when the NFS server was Solaris. However, last Friday it failed, when the server was set up on HP-UX. I had to resort to my little Blade 100 as the NFS server, and I had no problems with it.

    I noticed that on the SAME DAY some folks asked questions about the 2 GB filesize limit in HP-UX on comp.sys.hp.hpux!! Apparently, the default HP-UX tar and cpio don't support files over 2 GB, either. Not even in HP-UX 11i. I never thought HP-UX stank this bad...

    How does Linux on x86 stack up? I decided not to use it for this backup, since I had my Blade 100, but would it have worked? Oh, btw, does Linux finally have a command like "share" (it exists in Solaris) to share directories via NFS, or do I still need to edit /etc/exports and then restart the NFS daemon (or send SIGHUP)?
  • What the hell? (Score:1)

    by White_Lightning (638657) on Sunday January 26 2003, @12:38PM (#5162394)
    Why'd they even mention DOS? All DOS programs are statically linked. There are no DLLs or anything like them (except overlays). The only thing close would be DOS extenders. So, what does DOS have to do with it?
  • by constantnormal (512494) on Sunday January 26 2003, @12:41PM (#5162407)
    ... 64-bit addressing before thinking this through. I couldn't see the significant advantage for more than a very tiny fraction of apps in being able to address more than a few gigabytes.

    Now I can't wait for OS X to have 64-bit support for the IBM 970 processors (I do realize that it will take several releases before default 64-bit operation is practical).

    When compared to clustered 32-bit filesystems, I would think that a "pure" 64-bit filesystem would have a number of very practical advantages.

    I could easily see the journalled filesystem becoming one of the first 64-bit subsystems in OS X, right after VM.
  • by mauriceh (3721) <maurice@harddata.com> on Sunday January 26 2003, @12:45PM (#5162431) Homepage
    A much bigger problem is that Linux filesystems have a capacity limit of 2TB.
    Many servers now have the physical capacity of over 2TB on a filesystem storage device.
    Unfortunately this is still a very significant limitation.
    This problem is much more commonly encountered than file size limitations.
  • I miss BeFS... (Score:2)

    by jonr (1130) on Sunday January 26 2003, @01:01PM (#5162529) Homepage Journal
    18 EXAbyte file sizes, real journals, live queries...
    *SOB*
    J.
  • The "l" in lseek() (Score:4, Informative)

    by edhall (10025) <slashdot@weirdnoise.com> on Sunday January 26 2003, @02:26PM (#5163004) Homepage

    Once upon a time (prior to 1978) there was no lseek() call in Unix. The value for the offset was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls, one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)

    When Research Edition 7 Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).

    -Ed
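
The two-call dance described above boils down to a divmod. A minimal sketch of the arithmetic, not a reconstruction of the historical seek() call; split_offset is a made-up name and the 2**15 block count is inferred from the 16,777,216 figure quoted:

```python
BLOCK_SIZE = 512   # bytes per block, as in the comment above


def split_offset(byte_offset):
    """Split an absolute offset into (block number, byte within block)."""
    return divmod(byte_offset, BLOCK_SIZE)


# First seek: go to the block. Second seek: move within the block.
blocks, remainder = split_offset(1_000_000)
print(blocks, remainder)     # 1953 64

# The quoted maximum corresponds to 2**15 blocks of 512 bytes.
print(2**15 * BLOCK_SIZE)    # 16777216
```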
  • obvious (Score:1)

    by larsl (30423) on Sunday January 26 2003, @04:52PM (#5163669) Homepage
    I would have snapped up puppy.mil in an instant.
  • by jsimon12 (207119) <slashdot@xemu.org> on Sunday January 26 2003, @10:57PM (#5165096) Homepage
    Old news, Solaris 2.6 and 7. Solaris 8 is 64-bit by default. I hope they are not still developing for 2.6 :)
  • by dfgdfgdfg (577386) on Monday January 27 2003, @09:41AM (#5167153)
    Why does it make a difference for open() whether the size of off_t is 32 or 64 bits? Shouldn't only lseek() be affected?

    I thought only a few programs used lseek(), e.g. databases. Wouldn't most programs read files sequentially, without using off_t at all?

  • by Ripsaw (216357) on Tuesday January 28 2003, @03:51PM (#5176794) Homepage
    Way back in January of 1995 a group called the Large File Summit [sas.com] was formed to standardize large file access in Unix systems.

    This group produced three notable results:

    • A specification, which was ultimately submitted to X/Open,
    • A declaration that 2**64 bytes is a "bubbabyte", and
    • A really cool T-shirt.

    I still have my T-shirt -- how about you?

  • by Flamesplash (469287) on Saturday February 01 2003, @02:47PM (#5205228) Homepage Journal
    I used to be a student admin for Clemson's College of Engr. and Science [clemson.edu]. We had several CAD tools that the Engr. students would use. There was this one tool where you could specify how long the simulation was supposed to last; if the field was left blank, it would run forever. Besides that little bit of badness, the field was blank by default, so many an unsuspecting student would run their simulations and they would run forever, creating these huge output files, which the students also didn't know about.

    The killer here, is that if you quit the program the wrong way ( something like Close instead of Quit ) the program would keep going, even after the student would log out.

    So now you have N students who are all generating infinite files. However, the files would hit the 2GB limit and stop eating up space. ( Thank You )

    The only other nastiness here is that once we found the file, if you simply removed it, the program (still running after logout) was finally able to add more data. So you had to track down where the program was running and kill it first.

    I was in charge of backups, and man oh man was this annoying for them.
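
The "remove the file and it keeps going" behaviour is ordinary Unix unlink semantics: a process that already has the file open keeps a working descriptor after the name is gone. A minimal sketch, assuming a POSIX system; the file contents are made up:

```python
import os
import tempfile

# Create a file and keep the descriptor open, like the runaway simulation.
fd, path = tempfile.mkstemp()
os.write(fd, b"simulation output\n")

# An admin removes the file by name...
os.unlink(path)
still_listed = os.path.exists(path)

# ...but the process can carry on writing through its open descriptor.
os.write(fd, b"more output\n")
size = os.fstat(fd).st_size
os.close(fd)   # only now can the disk space actually be reclaimed

print(still_listed, size)   # False 30
```

That is why the process has to be found and killed first: deleting the name alone does not stop the writer.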
  • Re:Why large files (Score:3, Funny)

    by mr.henry (618818) on Sunday January 26 2003, @09:59AM (#5161558) Journal
    Who needs more than 512k of RAM??
    [ Parent ]
  • by xintegerx (557455) on Sunday January 26 2003, @09:59AM (#5161563) Homepage
    Question answered, move along, nothing to see here :)
    [ Parent ]
  • Re:Why large files (Score:1)

    by tgeerts (556153) on Sunday January 26 2003, @10:00AM (#5161565)
    Video + Audio >= 2GB
    [ Parent ]
    • Re:Why large files by amigaluvr (Score:1) Sunday January 26 2003, @10:06AM
    • Re:Why large files by AvitarX (Score:2) Sunday January 26 2003, @10:07AM
    • Re:Why large files (Score:5, Interesting)

      by CoolVibe (11466) on Sunday January 26 2003, @10:13AM (#5161636) Journal
      Raw video can easily exceed 2 GB in size. Why raw video? Because (like others said) it's easier to edit. Then you encode to MPEG2, which will shrink the size somewhat (usually still bigger than 2 GB; ever dumped a DVD to disk?), so it'll be "small" enough to burn onto a DVD or some such. Oh, editing 3 hours of raw wave data also chews away at the disk. Also, since you need to READ the data from the media to see if it looks nice, you need to have support for those big files as well. Right, now why don't we need files bigger than 2 GB again? Well?

      Oh, you're still not convinced? Well, see it this way: when in the future will you ever need to burn a DVD?

      Well? A typical single-sided DVD-R holds around 4 GB of data (somewhat more); if you use both sides, you can get more than 8 GB of data on it. That's way bigger than 2 GB, no? Now, how big must your image be before you burn it on there? Well?

      Right...

      [ Parent ]
    • Wrong. by I Am The Owl (Score:2) Sunday January 26 2003, @10:32AM
  • Re:Why large files (Score:3, Informative)

    by voodoopriestess (569912) on Sunday January 26 2003, @10:01AM (#5161570) Homepage
    Databases, Movie files, Backup files (think dumps to tapes). Animations, 3D modelling.... Lots of things need a > 2GB file size. Iain
    [ Parent ]
  • Re:Why large files (Score:5, Insightful)

    by Big Mark (575945) <m_t_douglas&hotmail,com> on Sunday January 26 2003, @10:01AM (#5161571)
    Video. Raw, uncompressed, high-quality video with a sound channel is fucking HUGE. Look how big DivX files are, and they're compressed many, many times over.

    And compressing video on-the-fly isn't feasible if you're going to be tweaking with it, so that's why people use raw video.

    -Mark
    [ Parent ]
    • Re:Why large files by gbitten (Score:1) Sunday January 26 2003, @10:51AM
    • Yep... by Kjella (Score:3) Sunday January 26 2003, @11:06AM
      • Re:Yep... by kasperd (Score:1) Sunday January 26 2003, @11:10AM
        • PAL & NTSC by Kjella (Score:3) Sunday January 26 2003, @11:30AM
    • Re:Why large files by Admiral Burrito (Score:2) Sunday January 26 2003, @08:17PM
  • Re:Why large files (Score:2, Insightful)

    by Ogion (541941) on Sunday January 26 2003, @10:01AM (#5161572)
    Ever heard of something like movie-editing? You can get huge files really fast.
    [ Parent ]
  • Re:Why large files (Score:5, Interesting)

    by Anonymous Coward on Sunday January 26 2003, @10:02AM (#5161574)
    Real analytical work can easily produce files this large. Output for analyses of structures with more than half a million elements and several million degrees of freedom can EASILY produce output of over two gigs. Yes, these results can and should be split, but sometimes it makes sense to keep them together as a matter of convenience. Plus, there IS a small performance hit when dealing with multiple files on most of the major FEA packages.
    [ Parent ]
  • Re:Why large files (Score:4, Informative)

    by hbackert (45117) on Sunday January 26 2003, @10:02AM (#5161586)

    vmware uses files as virtual disks. 2GB would be a really, really small disk. UML does the same, using the loop device feature of Linux. Again, a filesystem in a file. Again, 2GB is not much. Simulating 20GB would need 10 files.

    Feels like 64kbyte segments somehow...and I really don't want to have those back.

    [ Parent ]
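
The "20GB would need 10 files" arithmetic is a ceiling division. A small sketch, with chunks_needed as an illustrative name; the round figures use decimal gigabytes, and the last line shows that the exact 2**31 - 1 byte limit costs one extra file:

```python
def chunks_needed(total_bytes, limit_bytes):
    """Number of files needed when no single file may exceed limit_bytes."""
    return -(-total_bytes // limit_bytes)   # ceiling division on integers


GB = 10**9
print(chunks_needed(20 * GB, 2 * GB))        # 10
print(chunks_needed(320 * GB, 2 * GB))       # 160

# With binary units and a hard limit of 2**31 - 1 bytes per file:
print(chunks_needed(20 * 2**30, 2**31 - 1))  # 11
```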
  • Re:Why large files (Score:1)

    by Timesprout (579035) on Sunday January 26 2003, @10:05AM (#5161597)
    For when Jaron Lanier [sun.com] decides to update his website with 10,000,000 lines of script
    [ Parent ]
  • Re:Why large files (Score:3, Insightful)

    by Idaho (12907) on Sunday January 26 2003, @10:08AM (#5161612)
    Can anyone give a good reason for needing files larger than 2gb?

    I can think of some:

    • A/V streaming/timeshifting
    • Backups of large filesystems (since there exist 320 GB harddisks now, I don't think I should have to create 160 .tgz files just to back one up, should I?)
    • Large databases. E.g. the slashdot posts table will be easily >2 GB, or so I'd guess. Should the DB cut it in two (or more) files, just...because the OS doesn't understand files >2 GB? I don't think so...

    And that's just without thinking twice...there are probably many more reasons why people would want files >2 GB.

    [ Parent ]
  • Re:Why large files (Score:2)

    by chrisbolt (11273) on Sunday January 26 2003, @10:08AM (#5161617) Homepage
    Database servers?
    Web server log files?
    tarballs?

    Take your pick.
    [ Parent ]
  • by edox. (467593) on Sunday January 26 2003, @10:13AM (#5161639)
    Dont be the good old fox .)
    [ Parent ]
  • Re:Unices? (Score:3, Informative)

    by moonbender (547943) <moonbender&gmail,com> on Sunday January 26 2003, @10:13AM (#5161640)
    Yes. Just like "matrices" is the plural of "matrix". Not that the words have a similar etymology - according to dictionary.com [reference.com] it's, in the authors' words, "A weak pun on Multics".
    [ Parent ]
  • Re:Wrong point of view. (Score:1, Interesting)

    by Anonymous Coward on Sunday January 26 2003, @10:16AM (#5161657)
    As others have noted, there are plenty of good reasons to have files greater than two gigs including video editing and scientific research. The file size limits aren't there for a very good reason at all. Someone years ago had to weigh whether to make small files take up a huge amount of room by using 64 bit addresses that would allow multi-terabyte files to exist against using 32 bit addresses that would make small files smaller and create a 2 gb file limit. At the time, it made perfect sense because nobody was using files anywhere near 2 gb... But now they are.
    [ Parent ]
  • Re:Wrong point of view. (Score:5, Insightful)

    by KDan (90353) on Sunday January 26 2003, @10:17AM (#5161660) Homepage
    Two words:

    Video Editing

    Daniel
    [ Parent ]
    • Cripes! by Hubert_Shrump (Score:2) Sunday January 26 2003, @11:25AM
    • three words by Nick Mitchell (Score:2) Sunday January 26 2003, @12:20PM
  • by Anonymous Coward on Sunday January 26 2003, @10:17AM (#5161661)
    While almost all the examples given are good, I don't think anyone has mentioned complete disk images. I recently had to do this in order to recover from a hardware issue (a drive cable failure resulted in loss of the MBR, nasty) and on a TiVo unit that had a bad drive.

    I have most all of my older system images available to inspect. The loopback devices under Linux are tailor made for this type of thing.


    I am puzzled as to why you mention the seek times. Surely you would agree that the seek time should be only inversely geometrically related to size, the particular factors depending on the filesystem. Any deviation from the theoretical ideal is the fault of a particular OS's implementation. My experience is that this is not significant.

    (user dmanny on wife's machine, ergo posting as AC)

    [ Parent ]
  • by N1KO (13435) <nico.bonadaNO@SPAMgmail.com> on Sunday January 26 2003, @10:20AM (#5161678)
    In a couple of years, will today's large files be considered large? Ten years ago having hundreds of 4MB files on a PC would've been considered crazy. Now everyone with an mp3 player is used to it.
    [ Parent ]
  • Re:Why large files (Score:3, Interesting)

    by bourne (539955) on Sunday January 26 2003, @10:21AM (#5161679)

    Can anyone give a good reason for needing files larger than 2gb?

    Forensic analysis of disk images. And yes, from experience I can tell you that half the file tools on RedHat (like, say, Perl) aren't compiled to support >2GB files.

    [ Parent ]
  • Re:huh? (Score:1)

    by KDan (90353) on Sunday January 26 2003, @10:21AM (#5161681) Homepage
    It's certainly something that George Orwell would have frowned upon [mtholyoke.edu], but it's not incorrect sentence construction per se.

    PS: Read that Orwell article if you haven't yet, it's really very good
    [ Parent ]
  • Re:huh? (Score:2, Informative)

    by JanneM (7445) on Sunday January 26 2003, @10:21AM (#5161685) Homepage
    Because the sentences mean different things.

    "It is an interesting problem that some distro-compilers have to face."

    talks about the problem facing distro compilers, whereas

    "It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."

    talks about the article addressing these problems. /Janne
    [ Parent ]
  • Re:Wrong point of view. (Score:5, Funny)

    by heby (256691) on Sunday January 26 2003, @10:22AM (#5161692) Homepage
    "oh yes, those were the days." - misty eyed smile - "when i was young and filesizes were small. you should have seen it. today's youth is so spoiled that they don't even learn assembly language any more. i tell you, you're all going to die because of your large files, yes, die!" - madly waves his cane in the air - "2gb, that's more than anybody will ever need and you are greedy for even more! the holy bit will punish you for this, it will!" - dies of a heart attack.
    [ Parent ]
  • Re:Unices? (Score:1)

    by Looke (260398) on Sunday January 26 2003, @10:23AM (#5161695)
    Geeks seem to have a weird fascination for strange spellings. "-ces" is the traditional plural ending of Latin words ending in "x". Obviously, "Unix" does not originate from Latin, and "Unices" is thus nothing but a (bad) joke. (The same applies to "emacsen", and there are a few others around as well.)
    [ Parent ]
    • Re:Unices? by david-currie (Score:1) Sunday January 26 2003, @10:36AM
    • Re:Unices? by N1KO (Score:1) Sunday January 26 2003, @10:47AM
      • Re:Unices? by yuri benjamin (Score:1) Monday January 27 2003, @01:20AM
  • Re:huh? (Score:1)

    by david-currie (104829) on Sunday January 26 2003, @10:28AM (#5161711)
    Because it's not an interesting problem. It's a fucking boring problem if _you_ have to deal with it. But it's interesting to read about because it's the kind of thing you probably haven't thought about if you don't compile distributions. I meant what I wrote.
    [ Parent ]
  • Umm, scientific computing (Score:1, Insightful)

    by Anonymous Coward on Sunday January 26 2003, @10:29AM (#5161713)
    Many large-scale computing projects easily generate hundreds of gigabytes and even terabytes of data. They are writing to RAID systems and even parallel file systems to improve their IO.

    Think beyond the little toy that you use. These projects are using Unix (Solaris, Linux, BSD and even MacOSX) on clusters of hundreds or thousands of nodes.
    [ Parent ]
  • Re:Wrong point of view. (Score:1, Insightful)

    by Anonymous Coward on Sunday January 26 2003, @10:30AM (#5161720)
    the use of large files tempts users to store all kinds of redundant, reducible, linear and irrelevant data wasting storage space and I/O time

    As opposed to a million 4k files that are each 1k of header?
    [ Parent ]
  • Re:Wrong point of view. (Score:5, Insightful)

    by cvande (150483) <craig.vandeputteNO@SPAMgmail.com> on Sunday January 26 2003, @10:30AM (#5161722)
    In a perfect world everything is small and manageable. Unfortunately, some databases need tables BIGGER than 2gb. Even splitting that table into multiple files still leaves you with files larger than 2gb. Try adding more tables? OK. Now they've grown to over 2gb, and the more tables, the more complicated everything gets. I still need to back these suckers up, and a backup vendor that I won't name can't help me because their software wasn't large-file ready (for Linux). So let's get into the game with this and make it the default so we don't need to worry about these problems in the future. Linux IS an enterprise solution.....(my $.02)
    [ Parent ]
  • Re:Why large files (Score:2, Insightful)

    by benevold (589793) on Sunday January 26 2003, @10:31AM (#5161723) Homepage Journal
    We use a Unidata database here for an ERP system. Each database is more than 2gb apiece (more like 20 gb) of relatively small files; when the directories are tarred for backup they are usually over 2gb, which means that gzip won't compress them. Unless I'm missing something, I don't see an alternative to files larger than 2gb in this case. Sure, on the personal computing level the closest thing you probably get is ripping DVDs, but there are other things out there, and I realize this is tiny in comparison to some places.
    [ Parent ]
  • Re:Why large files (Score:2)

    by Veteran (203989) on Sunday January 26 2003, @10:35AM (#5161740)
    I have run into problems trying to compress a tar archive of my home directory which has been around since 1995 when I switched to Linux. The two gig limit runs into trouble here.
    [ Parent ]
  • Re:Why large files (Score:4, Insightful)

    by kasperd (592156) on Sunday January 26 2003, @10:35AM (#5161742) Homepage Journal
    The seek times alone within these files must be huge

    Who modded that as Insightful? Sure, if you are using a filesystem designed for floppy disks, it might not work well with 2GB files. In the old days, when the metadata could fit in 5KB, a linked list of disk blocks could be acceptable. But any modern filesystem uses tree structures, which make a seek faster than it would be to open another file. Such a tree isn't complicated; even the minix filesystem has one.

    If you are still using FAT... bad luck for you. AFAIK Microsoft was stupid enough to keep using linked lists in FAT32, which certainly did not improve the seek time.
    [ Parent ]
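
A back-of-the-envelope model of the linked-list versus tree point, under stated assumptions: 4 KB blocks and a hypothetical index fanout of 1024 pointers per block, with no caching. Real FAT drivers and real filesystems are more sophisticated; this only compares the shape of the two lookups:

```python
import math


def fat_chain_steps(offset, cluster_size=4096):
    """Links followed to reach the cluster holding offset in a linked-list layout."""
    return offset // cluster_size


def tree_levels(file_size, block_size=4096, fanout=1024):
    """Index levels needed to address every block with a tree of the given fanout."""
    blocks = max(1, math.ceil(file_size / block_size))
    levels = 1
    while fanout ** levels < blocks:
        levels += 1
    return levels


two_gib = 2**31
print(fat_chain_steps(two_gib - 1))   # 524287 links to reach the last cluster
print(tree_levels(two_gib))           # 2 index levels cover the whole file
```

In this model the chain walk grows linearly with the offset, while the tree depth grows only logarithmically with file size.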
  • Re:Why large files (Score:1)

    by Martin Schröder (21036) <martinNO@SPAMoneiros.de> on Sunday January 26 2003, @10:37AM (#5161750) Homepage
    Bitmap files for image setters can easily become huge. Think of 500x100(cm)x1000x1000(pixels).
    [ Parent ]
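
Working that example through, reading it as a 500 cm x 100 cm sheet at 1000 pixels per cm in each direction and assuming 1 bit per pixel (the bit depth is an assumption, not stated in the comment; bitmap_bytes is an illustrative name):

```python
def bitmap_bytes(width_cm, height_cm, pixels_per_cm, bits_per_pixel=1):
    """Uncompressed bitmap size in bytes for a sheet of the given dimensions."""
    pixels = (width_cm * pixels_per_cm) * (height_cm * pixels_per_cm)
    return pixels * bits_per_pixel // 8


size = bitmap_bytes(500, 100, 1000)
print(size)               # 6250000000
print(size > 2**31 - 1)   # True
```

Even at one bit per pixel the file is about three times the 2 GB limit; any greater bit depth only makes it worse.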
  • Re:Wrong point of view. (Score:5, Insightful)

    by costas (38724) on Sunday January 26 2003, @10:42AM (#5161774) Homepage
    Maybe in your problem domain that's true. I work with retailer data mines and we've hit the 2GB file limit, oh, 4-5 yrs ago? We've been forced to partition databases causing maintenance issues, scalability issues, and the like, just because of the size of a B-tree index.

    True, it looks like the optimal solution is lower-level partitioning, rather than expanding the index to 64 bits (tests showed that the latter is slower), but that still means that the practical limit of 1.5-1.7 GB per file (because you have to have some safety margin) is far too constraining. I know installations that could have 200GB files tomorrow if the tech was there (which it isn't, even with large file support).

    I am also guessing that numerical simulations and bioinformatics apps can probably produce output files (which would then need to be crunched down to something more meaningful to mere humans) in the TB range.

    Computing power will never be enough: there will always be problems that will be just feasible with today's tech that will only improve with better, faster technology.
    [ Parent ]
  • Re:huh? (Score:2, Interesting)

    by RumpRoast (635348) on Sunday January 26 2003, @10:44AM (#5161792)
    Actually you changed the meaning of that sentence. I think really we object to:
    "It's an interesting look into some
    of the kinds of less obvious problems that distro-compilers have to face."

    "of the kinds" really adds nothing to the meaning here, nor does "have to"

    Thus we have:

    "It's an interesting look into some of the less obvious problems that distro-compilers face."

    The same sentence, but much cleaner!

    Thanks! I'll be here all week.

    [ Parent ]
    • Re:huh? by david-currie (Score:1) Sunday January 26 2003, @10:52AM
  • by Q Who (588741) on Sunday January 26 2003, @10:47AM (#5161812)

    Lmao...

    Your other trolls are nice too, but this one is hilarious... "entropy pollution", hehe :)

    "Linux of Windows XP bootloader", this one is amazing. I wonder whether it's a typo, or intentional...

    [ Parent ]
  • Re:Why large files (Score:1)

    by Perl-Pusher (555592) on Sunday January 26 2003, @10:48AM (#5161817)
    Science Data usually consist of huge multidimensional arrays. I have seen satellite data in huge netcdf files that are very close if not slightly larger than that.
    [ Parent ]
  • Re:Why large files (Score:1)

    by markz (448024) on Sunday January 26 2003, @10:49AM (#5161820) Homepage
    database dumps - one of our smaller database dumps is 2.3 GB compressed. The dumps are the easiest method of backup and distribution - locally and (very) remotely.
    [ Parent ]
  • by SoSueMe (263478) on Sunday January 26 2003, @10:51AM (#5161832) Homepage
    Who are we to tell them what they have to accommodate?
    Don't like the way a particular *NIX works? Don't use it.
    Try something else.
    [ Parent ]
  • Re:Why large files (Score:2)

    by joto (134244) on Sunday January 26 2003, @10:51AM (#5161835)
    Can anyone give a good reason for needing files larger than 2gb?

    Yes. Sometimes you need to store a lot of data. Even DVDs have 4.3 GB of data these days. But that's not even much compared to the amount of data we handle in seismic research. I would believe astronomers, particle physicists and lots of other people also routinely handle ridiculous amounts of data.

    By the way, in producing the DVD, you would naturally work with uncompressed data. How would you handle that?

    The seek times alone within these files must be huge, and it smacks a bit of inefficiency

    And because it is inefficient, we should not support it? As a matter of fact, any file larger than one disk-block is inefficient. Maybe we should stop supporting that as well?

    sure its just as bad to have an app use hundreds of say 4kb files or so, but two GIGABYTES???

    As I've said, it's not really that much, depending on the application.

    [ Parent ]
  • Re:Wrong point of view. (Score:5, Interesting)

    by Yokaze (70883) on Sunday January 26 2003, @10:52AM (#5161843)
    I'm not a specialist on this matter, so maybe you can enlighten me as to where I am wrong or have misunderstood you.

    > fragmentation: large files increase the fragmentation of most file systems
    What kind of fragmentation?

    Small files lead to more internal fragmentation.
    Large files are more likely to consist of more fragments, but when splitting this data into small files, those files are fragments of the same data.
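
    To put a rough number on the internal fragmentation point, here is a minimal sketch. It assumes a file system that hands out space in whole 4096-byte blocks; the block size, the 10 MB payload and the 1000-byte file size are all made-up illustration values, not measurements from any real system.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical)

def allocated_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes reserved on disk when space is handed out in whole blocks."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size

def slack_bytes(file_sizes, block_size=BLOCK_SIZE):
    """Unused space in the last block of each file (internal fragmentation)."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)

payload = 10_000_000                            # hypothetical 10 MB of data
one_big_file = [payload]                        # stored as a single file
many_small_files = [1000] * (payload // 1000)   # same data in 1000-byte files

print("slack, one big file:    ", slack_bytes(one_big_file))
print("slack, many small files:", slack_bytes(many_small_files))
```

    Under these assumptions the single file wastes only part of its last block, while every small file wastes most of a block of its own.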

    >entropy pollution
    What kind of entropy? Are you speaking of compression algorithms?

    Compression ratios are actually better with large files than small files, because similarities between files across file boundaries can be found. That is why gzip (or bzip2) is applied to a single large tar file. (A simple test: zip many files normally, then zip the same files without compression and compress the resulting archive, and compare the two sizes.)
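
    The same test can be sketched in a few lines of Python with the standard zlib module. The 200 small "files" below are invented sample data that share most of their content, so treat the result as an illustration of the effect rather than a benchmark.

```python
import zlib

# Hypothetical data: 200 small "files" that differ only in a record number.
files = [
    ("record %04d: status=OK host=example.org path=/var/log/app\n" % i).encode() * 5
    for i in range(200)
]

# Compress every file on its own, the way zip treats each archive member.
separate = sum(len(zlib.compress(data)) for data in files)

# Concatenate first and compress once, the way tar followed by gzip works.
combined = len(zlib.compress(b"".join(files)))

print("compressed separately:", separate, "bytes")
print("compressed as one blob:", combined, "bytes")
```

    With data like this the single stream should come out clearly smaller, because the compressor can reuse matches from earlier files.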

    >data pollution
    How should limiting file size improve that situation? Then, people tend to store data in a lot of small files. What a success. People will waste space, whether there is a file size limit or not.

    >These limits are there for very good reasons and in my opinion they are even much too big.

    Actually, they are there for historical reasons.
    And should a DB spread all its tables over thousands of files instead of having only one table in one file and mmapping this single file into memory? Should a raw video stream be fragmented into several files to circumvent a file limit?

    >[...] original K&R Unix [...] was much faster than modern systems

    Faster? In what respect?
    [ Parent ]
  • Re:Why large files (Score:3, Interesting)

    by Zathrus (232140) on Sunday January 26 2003, @10:53AM (#5161846) Homepage
    In my previous job we regularly processed credit data files >2 GB. All the data is processed serially (as someone else mentioned), so seek time is not an issue (nor is it an issue in a binary data file - seek to 1.4GB. Done. Next.).

    The real issue we ran up against was compression... we wanted to have the original and interim data files available on-disk for a while in case of reprocessing. The processing would generally take up 10x as much space as the original data file, so you compressed everything. Except that gzip can't handle files >2GB (at the time an alpha could, but we didn't want to touch it). Nor can zip. So we had to use compress. Yay. (bzip could handle it, but was decided against by the powers that be).

    Compression of large files is still an issue, unless you want to split them up. Unless you download a beta version, gzip still can't handle it. As I understand it zip won't ever be able to do it. There are some fringe compressors that can handle large files, but, well, they're fringe.
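
    For anyone wondering where the 2GB wall comes from: assuming the tool keeps file offsets in a signed 32-bit integer (the 32-bit off_t case discussed elsewhere in this thread, not something verified here for any particular gzip or zip build), the arithmetic looks like this. The helper name as_int32 is made up for the illustration.

```python
import ctypes

MAX_32BIT_OFFSET = 2**31 - 1  # largest value a signed 32-bit integer can hold

def as_int32(offset):
    """Value a signed 32-bit variable would end up holding for this offset."""
    return ctypes.c_int32(offset).value  # ctypes wraps silently, like C does

two_gib = 2 * 1024**3

print("largest 32-bit offset:", MAX_32BIT_OFFSET)
print("offset at 1.4 GB:     ", as_int32(1_400_000_000))  # still fits
print("offset at 2 GiB - 1:  ", as_int32(two_gib - 1))    # last valid byte
print("offset at 2 GiB:      ", as_int32(two_gib))        # wraps negative
```

    A seek to 1.4GB fits comfortably, but one byte past 2 GiB the offset turns negative, which is why such programs have to be rebuilt with 64-bit offsets.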
    [ Parent ]
  • Re:Why large files (Score:1)

    by imnoteddy (568836) on Sunday January 26 2003, @10:55AM (#5161858)
    Databases.

    The computer-aided design databases for an automobile, when you have 3D models for the parts, the tooling, plant layout, etc., are in the low terabyte range [baselinemag.com]. As another example, Boeing dedicates about 14 terabytes [mcadcafe.com] to commercial airplane geometry data storage.

    Or Astronomy. A planning document [pparc.ac.uk] talks about a project generating 300 terabytes per year.

    [ Parent ]
  • Re:Why large files (Score:2)

    by Markus Landgren (50350) on Sunday January 26 2003, @10:56AM (#5161865) Homepage
    Last time I wrote a 7 gig file it was an image of a hard disk. Lots of other stuff (video) can get large too. Anyway, there is an error in the headline. 2 gigs is not a limit in modern unices, only in ancient or otherwise really crappy unices.
    [ Parent ]
  • Re:Wrong point of view. (Score:3, Interesting)

    by kasperd (592156) on Sunday January 26 2003, @11:00AM (#5161881) Homepage Journal
    I sure hope that was a joke. Because otherwise it would be one of the most clueless comments I have seen.

    Sure, splitting data into a lot of smaller files is going to reduce the fragmentation slightly, but it is not going to improve your performance, because the price of accessing different files is going to be higher than the price of the fragmentation.

    In the next two arguments you managed to make two opposite statements both incorrect. That is actually quite impressive.

    First you say large files increase the entropy of the data stored on the disk. Which is wrong as long as you compare to the same data stored in different files. Of course, if the number of files on the disk is constant, smaller files will lead to less entropy, but most people actually want to store some data on their disks.

    Then you say large files are highly redundant, which is the opposite of having a large entropy as claimed in your previous argument. And in reality the redundancy does not tend to increase with filesize, but might of course depend on the format of the file.

    All in all you are saying that people shouldn't store much data on their disks, and the little data they do store should be as compact as possible, while still allowing it to be compressed even further when doing backups. You might as well have said people shouldn't use their disks at all.

    Finally, claiming older Unix versions were faster is ridiculous. First of all, they ran on different hardware, and surely on that hardware they were slower than today's systems. And even if you managed to port an ancient Unix version to modern hardware, I'm sure it wouldn't beat modern systems in today's tasks. Which DVD player would you suggest for K&R Unix?

    [ Parent ]
  • Re:Why large files (Score:2)

    by wideBlueSkies (618979) on Sunday January 26 2003, @11:17AM (#5161963) Journal
    That tarball of 2002 stock quotes used to feed your stock research system.

    The database files themselves, in the system.

    [ Parent ]
  • A few more words: (Score:1)

    by JohnnyBigodes (609498) <morphine@d i g i t almente.net> on Sunday January 26 2003, @11:18AM (#5161971)
    - Backups to a single file (no, I don't want to copy a fscking whole directory structure, thank you very much).
    - Video editing.
    - Large sound editing (multi-channel).
    - Ever tried to create a DVD ISO image? there you go...
    - Speaking of DVD's, *you* try dumping one to your harddisk with 2GB files.
    - Disk images (ever had to Ghost around a boot-disk or boot-DVD with a disk image?)
    - 3D animation files (probably included in the "video editing" section).

    want me to go on? the list is bigger...
    [ Parent ]
  • by smoondog (85133) on Sunday January 26 2003, @11:32AM (#5162065)
    There is not a problem with support of large files in Unix system, there is a problem with incompetent people using too large files in Unix systems.

    You are a troll. It is not up to administrators to decide how big a file needs to be. I do scientific research and deal regularly with datasets larger than 300GB. Single files often in the range of 2GB-10GB. For me to split up my data would create an enormous headache, and would be very slow.

    -Sean
    [ Parent ]
  • Re:Why large files (Score:1)

    by Proneax (609988) on Sunday January 26 2003, @11:32AM (#5162066)
    I remember like 4 or 5 years ago talking to my friend's dad, who works at Kodak, and he would fill an entire 2 GB Jaz drive with one picture.
    [ Parent ]
  • Re:Why large files (Score:1)

    by addps4cat (216499) on Sunday January 26 2003, @12:18PM (#5162300) Homepage
    Hey everyone, let's keep beating a dead horse and telling him the million and one ways that you need files greater than 2gb. Half of these posts just say "movies" anyway. So stop repeating yourselves.
    [ Parent ]
  • Re:Why large files (Score:2)

    by Sayjack (181286) on Sunday January 26 2003, @01:11PM (#5162588) Homepage
    Backup files, exporting a huge Oracle database to a file. And, when I record DivX quality video through my ATI card I can go through the GB like crazy.

    A better question is, Who doesn't need largefile support?

    As for the seek time...not everything is accessed like a random access file. I imagine that the backup data will be read in sequentially. The video file would mostly be handled sequentially, other than when jumping to a chapter, fast forwarding or reversing.
    [ Parent ]
  • Re:Why large files (Score:2)

    by AJWM (19027) on Sunday January 26 2003, @01:22PM (#5162661) Homepage
    Can anyone give a good reason for needing files larger than 2gb?

    Video/movie files, for one thing. Even compressed (eg DV or MPEG) those things are huge. A 2 GB file at professional DV compression (50 Mb/sec) is about 4 minutes worth. (DV is similar to MJPEG, so it's still lossy.) Uncompressed or losslessly compressed video (critical for machine vision or image analysis apps) chews even more space.
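
    As a rough sanity check on that running time, here is a back-of-the-envelope sketch. It counts only the stated 50 Mb/sec and treats a megabit as 1,000,000 bits; audio and container overhead are not modelled and would shorten the result, so the "about 4 minutes" above is the right order of magnitude. The function name is made up for the illustration.

```python
def minutes_of_video(file_bytes, megabits_per_second):
    """Playing time in minutes for a file at a constant bit rate."""
    bits = file_bytes * 8
    seconds = bits / (megabits_per_second * 1_000_000)
    return seconds / 60

two_gib = 2 * 1024**3  # the 2 GB limit, taken as 2 GiB

print(round(minutes_of_video(two_gib, 50), 1), "minutes at 50 Mb/sec")
```

    Either way, a 2 GB cap means a single file holds only a handful of minutes of footage at that rate.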

    I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.

    Other fields also use huge data sets - seismic data analysis for example. Filesystems designed for supercomputer clusters (eg PVFS) have unlimited size on the total filesystem (tens of terabytes is not unusual) although the individual file size may still be limited by the underlying OS or hardware word size.

    Then there's creating a .zip or .tgz of a collection of big files. Or creating the equivalent of an ISO image of a DVD. And so on.
    [ Parent ]