Linux/Mac/Windows File Name Friction
lessthan0 writes "In 1995, Microsoft added long file name support to Windows, allowing more descriptive names than the limited 8.3 DOS format. Mac users scoffed, having had long file names for a decade, and because Windows still stored a DOS file name in the background. Linux was born with long file name support, four years before it showed up in Windows. Today, long file names are well supported by all three operating systems, though key differences remain."
Damn this is exciting! (Score:1, Insightful)
I RTFA (Score:5, Insightful)
For those of you who haven't read it, here it is: Windows, Linux and Mac OS X all support long file names, albeit differently. Linux is case-sensitive; the others are not.
Tada! Two sentences. I imagine, were I a Perl coder, I could have done it in half of one, but there you go.
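If you want to check which behavior you're getting on a given volume, here is a minimal Python sketch (the temp-file prefix is arbitrary, and it only probes the directory it runs in):

```python
import os
import tempfile

def fs_is_case_sensitive(path="."):
    """Probe case sensitivity: create a lowercase-prefixed file and
    see whether its uppercase twin appears to exist."""
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=path) as f:
        upper = f.name.replace("casetest_", "CASETEST_")
        # On a case-insensitive FS (NTFS, default HFS+), the uppercase
        # name resolves to the file we just created.
        return not os.path.exists(upper)

print("case-sensitive" if fs_is_case_sensitive() else "case-insensitive")
```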
Re:spaces bad, special chars bad (Score:5, Insightful)
If this had been fark... (Score:1, Insightful)
That aside, isn't Slashdot "News for Nerds"?
How about posting more info about why WinFS was scrapped, rather than how Windows/Linux/Mac have handled file names for the past X years?
Bring on the modding, my karma can take it.
Bah - OS Vendor support of long filenames (Score:5, Insightful)
The way I look at it, the day I look at something like "d3d8.dll" or whatever drek is fermenting in \WINDOWS32\ and it is actually given a descriptive filename, then that OS will truly support long filenames.
Not sure how the Linux crowd compares, but OS X is getting better with each revision. Classic Mac OS had this one down (mostly) cold.
Article is incorrect (Score:5, Insightful)
"Though the file system supports paths up to ca. 32,000 Unicode characters with each path component (directory or filename) up to 255 characters long, certain names are unusable, since NTFS stores its metadata in regular (albeit hidden and for the most part inaccessible) files; accordingly, user files cannot use these names."
The article incorrectly states "Windows file names can be up to 255 characters, but that includes the full path. A lot of characters are wasted if the default storage location is used: "C:\Documents and Settings\USER\My Documents\"." I will grant that this may have been a limitation in the past, but XP has had NTFS from the start, and NTFS is by far the most common Windows FS today.
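To be precise, the ~260-character ceiling the article describes is the Win32 MAX_PATH limit, not an NTFS one; NTFS paths can run to roughly 32,000 characters if you use the \\?\ extended-length prefix. A Windows-only Python sketch (the base directory and component names are invented):

```python
import os

# MAX_PATH (260 chars) is a Win32 API limit, not an NTFS limit.
# An absolute path prefixed with \\?\ bypasses the legacy parsing
# and may be up to roughly 32,000 characters on NTFS.
base = r"C:\Temp"                                       # hypothetical writable dir
deep = os.path.join(base, *["longcomponentname"] * 30)  # well past 260 chars

long_path = "\\\\?\\" + deep    # extended-length form: \\?\C:\Temp\longcomp...
os.makedirs(long_path, exist_ok=True)
with open(os.path.join(long_path, "note.txt"), "w") as f:
    f.write("created past MAX_PATH via the extended-length prefix\n")
```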
Amiga (Score:3, Insightful)
Dan East
Re:c:\progra~1\Micros~1\Powerp~1 (Score:5, Insightful)
Re:NTFS WTF? (Score:2, Insightful)
Re:Long filename horror story (Score:5, Insightful)
Re:Any way to turn off Joliet support in Windows X (Score:3, Insightful)
Re:spaces bad, special chars bad (Score:5, Insightful)
Seems, yes. Is? No way in hell.
The problem is that extensions are part of the filename, i.e. they are arbitrary. Mapping arbitrary data to meta-information is stupid at best and usually dangerous, and in combination with hidden extensions and automatic execution it is a blatant disregard of even the most basic security procedures.
aka "lookhereiamcertainlynotavirus.jpg.exe"
rsync (Score:4, Insightful)
I just hit the file name issue trying to sync some stuff between Unix and Windows XP using rsync.
The case insensitivity was annoying, and the limited character set on XP was no good.
Again, you would think they would have fixed this in XP.
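One workaround is to scan the source tree for names that will collide case-insensitively before letting rsync near a Windows box. A rough Python sketch (the source directory is hypothetical):

```python
import os
from collections import defaultdict

def case_collisions(root):
    """Find entries in a case-sensitive tree whose names would
    collide on a case-insensitive target like NTFS."""
    seen = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            seen[path.lower()].append(path)
    return [paths for paths in seen.values() if len(paths) > 1]

for group in case_collisions("/home/user/to_sync"):   # hypothetical source dir
    print("would collide on Windows:", group)
```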
Re:Unusual characters in filenames (Score:5, Insightful)
rm
Re:Any way to turn off Joliet support in Windows X (Score:1, Insightful)
Use IsoBuster [isobuster.com]. (The program is shareware, but has a free mode with fewer functions. The free mode is enough to access the 8.3 files.)
colon in Mac OS X file names (Score:4, Insightful)
In Terminal.app, you can create file names containing a colon, but that character is displayed as a forward slash when seen in the Finder. Conversely, you can use a forward slash in the Finder, and it is mapped to a colon on the command line.
Historically, Mac OSes have used the colon to separate folder names in a path.
There is a subtle restriction in HFS+. All files on HFS+ have their names stored as normalized Unicode [unicode.org], and in order to be normalized in the first place, file names must be valid UTF-8. You cannot use a random byte string as a file name.
There is no such restriction for UFS on Mac OS X. I think UFS supports roughly the same characters as BSD, Linux and the other Unices. If you're transferring files from Linux with names in a legacy encoding, you can create a UFS disk image and convert the file names to UTF-8 before copying them to HFS+.
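If anyone wants to script that conversion, here's a rough Python sketch. It assumes the legacy encoding is EUC-JP (substitute whatever the old names actually use) and relies on HFS+ doing its own decomposed-Unicode normalization once the names are valid UTF-8:

```python
import os

LEGACY = "euc_jp"   # assumption: whatever encoding the old names are in

def rename_to_utf8(root):
    """On Linux, file names are raw bytes. Decode them from the
    legacy encoding and rename to UTF-8 so HFS+ will accept them.
    topdown=False renames leaves before their parent directories."""
    for dirpath, dirnames, filenames in os.walk(root.encode(), topdown=False):
        for name in filenames + dirnames:
            try:
                fixed = name.decode(LEGACY).encode("utf-8")
            except UnicodeDecodeError:
                continue  # already UTF-8, or junk we can't interpret
            # NB: just a sketch -- a name valid in both encodings
            # will be "converted" whether it needed it or not.
            if fixed != name:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, fixed))

rename_to_utf8("/mnt/old_disk")   # hypothetical mount point
```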
International support (Score:3, Insightful)
There's a whole new dimension of fun when your file names include non-Roman characters, such as Japanese.
First of all, there is the matter of which encoding the file names are in. Lots of Japanese Windows installs and their utilities still use Shift-JIS for file names. OS X, on the other hand, uses Unicode, and typically expects UTF-8 file names from programs. In fact, it not only expects it, it enforces it, returning an error when you attempt to use a file name that is not valid UTF-8.
Many command-line utilities that deal with archive files utterly fail on OS X when given archives using Shift-JIS file names, and many others improperly translate them as 8-bit ISO Latin 1. A few (such as the command-line RAR archiver) are actually smart enough to make a system call to translate the file names from Shift-JIS to UTF-8.
And then there is the issue of Shift-JIS MP3 tags. If you open those with iTunes, not only do they get interpreted as ISO Latin 1, but irreversibly so if you do something that writes them back to the .mp3 file. (They get written back as a UTF-8 representation of the ISO Latin.) With much work, I've had luck in the past using a hex editor and SimpleText in Classic to convert them, but I'm not sure what I'll do with the new Intel Macs that don't support Classic.
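When the tags haven't been rewritten yet, the misread is mechanically reversible: re-encode the garbage through the codec the reader wrongly assumed, recovering the original bytes, then decode those bytes as Shift-JIS. A Python sketch (if the reader actually assumed Windows-1252 or MacRoman rather than plain Latin-1, substitute that codec):

```python
def fix_mojibake(tag_text):
    """Undo a Shift-JIS tag misread as Latin-1: Latin-1 maps bytes
    0-255 one-to-one, so re-encoding recovers the original bytes."""
    return tag_text.encode("latin-1").decode("shift_jis")

# "テスト" ("test") is 83 65 83 58 83 67 in Shift-JIS;
# misread as Latin-1 it turns into control-character soup:
garbled = b"\x83\x65\x83\x58\x83\x67".decode("latin-1")
print(fix_mojibake(garbled))   # -> テスト
```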
Think about it... (Score:3, Insightful)
Your way of thinking.
As it turns out, having multiple "files" composing a "document" is easily mapped in a hierarchical layout. As a simple idea, put all the files into a node and call that node the name of the document.
The "OS" should not impose upon the applications, but should provide ready services that map well into what the application(s) want.
Unix further provides "hard" and "soft" links to allow you to do (for example) sharing. As an example: you have a boilerplate logo image. It can be hard-linked into your documents.
"Random" (I do not think you really want random) can be accomplished with soft links.
Content searching? Either "find" or "grep" will do (OK, for up to several hundred megabytes of content -- and if you have hit THAT limit, let me know; it's a separate discussion).
You will have noticed that I have (so far) eschewed GUI tools in this discussion. The blatant omission of find/grep and other such tools from GUIs has mystified me. Either it is hard to do (semantic mapping of symbolic language to pictures) -- which is true -- or the GUI designers are deliberately dumbing things down.
It would be nice to have an "assembler" in file open boxes: I would like to be able to say "Please open a file containing 'project' in the name, whose contents include 'September 10', which was last modified in 2002, of type ASCII text".
Now, all the tools to do this are included in the "CLI" interface: find, grep, ls, file. But when we hit the GUI, these tools vanish. "NO SYMBOLIC REASONING FOR YOU - STICK TO THE CONCRETE" is the slogan.
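For what it's worth, that exact query is a dozen lines of scripting over the same primitives find/grep/file use; a Python sketch (searching under the home directory as an example):

```python
import os, time

def matches(path):
    """'project' in the name, contents mention 'September 10',
    last modified in 2002, and plausibly ASCII text."""
    if "project" not in os.path.basename(path).lower():
        return False
    if time.localtime(os.path.getmtime(path)).tm_year != 2002:
        return False
    try:
        with open(path, encoding="ascii") as f:   # "of type ASCII text"
            return "September 10" in f.read()
    except (UnicodeDecodeError, OSError):
        return False                              # binary file, or unreadable

hits = [os.path.join(d, n)
        for d, _, names in os.walk(os.path.expanduser("~"))
        for n in names
        if matches(os.path.join(d, n))]
```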
Since the "classic" Unix GUI is basically X supporting xterm, which in turn launches applications, the CLI is still there. But in modern Linux, Unix and OS X environments, most users are never exposed to it, and in turn call for more "paradigms" to be created.
And, this is HARD. Witness Microsoft's failure with "WinFS". Witness that the largest jump has been Plan 9, which extends the Unix way (by putting more stuff under this control). Witness the success of mapping things like "/proc". There really HASN'T been a new "paradigm" that offers more.
The problem is that trying to utilize the file system gets lost in the GUI translation. Apple indexes files by content for GUI consumption. This is NOT a new breakthrough -- Unix has had "locate", which is most of the way there, for ages. Indexing by content? Again, not new. Merging these ideas is fair, and I wish that Apple had based the kit on the CLI for maximum portability.
Ratboy
Re:c:\progra~1\Micros~1\Powerp~1 (Score:2, Insightful)
So "Program Files" becomes progra~1 and "Program Access" becomes progra~2.
"Program Access" gets backed up first with "Program Files" backed up next.
Upon restore, "Program Access" is first, and thus becomes progra~1, then "Program Files" which becomes progra~2.
Oops!
And many, MANY applications still use 8.3 names to locate files, including many MS applications.
So if you are going to do file backups, you should really do disk images, as file-by-file backups may cause serious system failure upon restore.
Disclaimer: AFAIK, this was the situation up to Win2K. It may have been resolved since then.
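The order dependence is easy to demonstrate; a toy Python simulation of 8.3 generation (the real NTFS algorithm switches to a hash-based suffix after the fourth collision, which this ignores):

```python
def short_name(long_name, taken):
    """Toy DOS-style 8.3 name generation: first six legal
    characters plus ~1, ~2, ... until the name is unique."""
    stem = "".join(c for c in long_name.upper() if c.isalnum())[:6]
    n = 1
    while f"{stem}~{n}" in taken:
        n += 1
    name = f"{stem}~{n}"
    taken.add(name)
    return name

taken = set()
# Original install order:
print(short_name("Program Files", taken))    # PROGRA~1
print(short_name("Program Access", taken))   # PROGRA~2

taken = set()
# Restore order reversed -- the short names swap owners:
print(short_name("Program Access", taken))   # PROGRA~1
print(short_name("Program Files", taken))    # PROGRA~2
```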
Re:spaces bad, special chars bad (Score:3, Insightful)
Have you ever looked at the first 4 bytes of a Java class file?
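(It's the magic number 0xCAFEBABE.) Content-based identification is exactly what file(1) does; a tiny Python sketch with a few well-known signatures:

```python
# A few well-known magic numbers (file signatures).
MAGIC = {
    b"\xca\xfe\xba\xbe": "Java class file",
    b"\x89PNG":          "PNG image",
    b"%PDF":             "PDF document",
    b"PK\x03\x04":       "ZIP archive (also .jar, .docx, ...)",
}

def sniff(path):
    """Identify a file by its leading bytes, ignoring its extension."""
    with open(path, "rb") as f:
        head = f.read(4)
    return MAGIC.get(head, "unknown")
```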
All Mac files (until 2001, with the introduction of OS X) had "type and creator codes" alongside a "resource fork". Now, if you just flatten the resource fork into a data-header format, you suddenly have a standardized file header with type and creator metadata in it. But what if your FS doesn't support "resource forks"?
Do things the "Lisa" way. Apparently, the Apple Lisa had some sort of database-like file system (in 1983 - WinFS, eat your heart out) that would assign a file ID but display a filename, and track a dozen or so metadata fields per file. So in the UI, in any given container, you could have 10 files named identically, but they wouldn't conflict with each other. They would also have a full complement of needed metadata maintained within their file-system wrapper (essentially, their file headers). This was arguably the "next level of file system" after the Unix-style hierarchy (and its timing was appropriate in 1983).

It's similar to how MP3 files have ID3 tags embedded in them. Those ID3 tags are just metadata values. If every file in the file system had a minimum "simple set" of metadata tags in its header information, this would work beautifully.

Someone should make a general standard for this sort of thing and write support for it into Linux. Apple could probably be persuaded to support it (especially if you allowed them to put their name on the standardization effort). Then MS would probably jump on the bandwagon and say they invented it. Let them (who cares? All we want is decent file metadata).
Your argument about file extensions, though, is not only naive, it's also incorrect. Extensions are part of the filename. I, as a user, can meddle with them in the same manner as the rest of a filename. They are completely arbitrary and can be removed entirely if I so choose. They are not metadata. And if I wanted to pack up a binary file with its own headers or signature and send it over a network, it would work perfectly fine. And if I were to design a file system that would work over a network, I would make the file header format a standard. And it would be every bit as reliable as any other system, except when data got to the recipient it would be guaranteed to have its metadata, rather than an arbitrarily-modified filename that may have lost its file type in the transfer.
I will agree with you about MIME types. MIME types, as you note, are not reliable because they aren't stored with the file. They're more of a way for a server to tell a client what it is downloading before it downloads it. They work well for that, but are certainly not a good way of defining file metadata.
Wrong about utf-8 (Score:3, Insightful)
UTF-16 as originally designed handled code points up to 0xffff.
Because that was not enough characters, UTF-16 was modified to have "surrogate pairs". It is usually claimed to handle 0x10ffff characters now, but in fact that count fails to subtract the 0x800 surrogate half-characters. The change also killed the only plausible claim that UTF-16 was better than UTF-8, namely that all characters were the same number of bytes long. (It is in fact now worse: the variable-length characters are much rarer, so bugs in handling them are much less likely to be detected and thus more catastrophic when they do happen.)
UTF-8 as originally designed handled code points up to 0x7fffffff.
Because of the UTF-16 braindamage, the UTF-8 standard was modified to say that all encodings above 0x10ffff are illegal; UTF-8 was literally downgraded to match UTF-16. It is still false to say that you can losslessly translate from UTF-8 to UTF-16, due to the surrogate pairs, so the two are not equivalent even with this limitation.
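The arithmetic, for the record: 0x110000 code points minus the 0x800 surrogate slots leaves 0x10F800 encodable characters. And the surrogate construction itself is only a few lines; a Python sketch:

```python
def to_surrogates(cp):
    """Split a code point above 0xFFFF into a UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    cp -= 0x10000
    return 0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)

cp = 0x1D11E                              # MUSICAL SYMBOL G CLEF
hi, lo = to_surrogates(cp)
print(hex(hi), hex(lo))                   # 0xd834 0xdd1e
print(len(chr(cp).encode("utf-8")))       # 4 bytes in UTF-8
print(len(chr(cp).encode("utf-16-le")))   # 4 bytes in UTF-16 as well

# The downgrade in action: Python's strict UTF-8 codec now rejects
# anything above 0x10FFFF, and lone surrogates (0xD800-0xDFFF) too.
```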
The one positive benefit of the "Unix wars" was that they stopped a whole lot of politically correct idiots from forcing "wide characters" on everybody, and thus Plan 9's UTF-8 could take hold. Unfortunately, Microsoft completely ignored all the evidence that wide characters were a very bad idea and went and did it themselves in Windows. Still, not as bad as if Unix had done it too...
well..... (Score:2, Insightful)