Data Storage

Journal ObsessiveMathsFreak's Journal: Metadata Will Not A Good Filesystem Make

With WinFS closing in upon us and GNOME's recent move to spatial browsing, it seems that the whole world is moving towards an SQL-style, search-based filesystem, in which we use queries to locate our files and deeply nested folders become a thing of the past. Using metadata on files, we will quickly and easily be able to find the ones we want amid the heaps of data that now reside on our hard drives. Or will we?

First off, what are the reasons behind the switch? Isn't our current filesystem good enough as it is? Apparently not, according to this OS news article, in which the author argues: "I have seen, over and over again, that novice users ... don't get the concept of a file hierarchy. ... 80-90% of the computer users do not need more than 5-7 folders where they put their documents."

I think this is a valid point, even if the figures are a little exaggerated. Most new home users will typically not know where to place files initially, or how to navigate a filesystem. They may well be confused by directories and trees, and by up and back buttons. Novices also have to face the real issue of simply deciding where to place their files. Microsoft, to their credit, have attempted to solve this problem by giving users the 'My Documents' and 'My Pictures' type folders. Opened by default by various programs, these give users the option of simply saving their files to a predetermined folder. However, this runs into difficulty when hundreds, if not thousands, of files reside in only one folder. The user is then overwhelmed by the sheer amount of data presented to them.
So what are we to do? Is the current storage ethos all wrong? How can we better cater to novice users? Is a metadata/spatial/query-based filesystem the answer?

A filesystem is, at its most basic level, a method for storing files. To do so, it must supply answers to a user's two questions:
1) Where are my files?
2) Where do I put my files?

Microsoft and others have proposed a query-based filesystem running on metadata. Metadata can certainly be very powerful, as Google has shown us. But our computer disk drives are not the web. Google relies on the fact that web pages link to one another. Files typically do not reference other files. They are self-contained at the lowest level.
But at least with a query-based filesystem, novice users can simply click save and not worry about filesystems or where exactly the file has gone. Also, once a file is saved, users can call it back with a simple query. This would seem to answer both questions in one fell swoop. But does it?

The example often used with WinFS and other query filesystems is that of a user saving pictures and then retrieving them. This example is probably used because this is exactly who the system is catered towards: home users saving their pictures, videos and some documents. Such a user will typically not have a huge volume of data, and even if they do, they can use previews to see the data they want.
But what about the serious users? The accountant with multiple Excel sheets, the programmer with hundreds of source files, the secretary with thousands of Word documents. Even the pre-teen with half a dozen games might run into trouble with this system. Why? It has to do with Question One: where are my files?

Take everyone's favourite computer relative, Aunt Tillie. How will Aunt Tillie use WinFS? Most likely she will just type 'pictures' or 'photos' into her query, perhaps with a 'last august' as well. She will find the pictures from her holidays last August and be happy. When she takes her Christmas pictures, she will just save them, later type in 'my christmas pictures', and find them. Novice bliss.

What about Joe? Joe's an accountant who's just bought a new computer at work with Longhorn. He has upwards of 1000 Excel sheets with customers' data. How will his query be structured? 'Report for sales to Toyota in august'. Dozens of files could spring up, all with very similar titles: ones from previous years, gross sales, sales returns, files linking to that file. The file he wants may not show up at all, as some other accountant worked on it and saved it just last week.

What about Max? Max is a programmer with hundreds of source, config, init and version files. How will his query be structured? 'Main.c for database project'. Every main.c on the computer could show up, along with every source file on the database project. He may be left to search through quite a lot of files.
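Max's problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue, the paths and the `project` tag are all hypothetical, and nothing here reflects how WinFS actually stores or queries files. It simply contrasts a metadata query, which can return many candidates, with a path lookup, which returns at most one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str      # full hierarchical path, unique per file
    name: str      # file name only
    project: str   # hypothetical metadata tag


# Hypothetical catalogue for illustration; none of these paths come from a real system.
CATALOGUE = [
    FileRecord("/home/max/database/src/main.c", "main.c", "database"),
    FileRecord("/home/max/database/tests/main.c", "main.c", "database"),
    FileRecord("/home/max/database/tools/import/main.c", "main.c", "database"),
    FileRecord("/home/max/editor/src/main.c", "main.c", "editor"),
    FileRecord("/home/max/database/src/schema.c", "schema.c", "database"),
]


def query(records, **criteria):
    """Return every record whose metadata fields equal all given criteria."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


def lookup(records, path):
    """Return the single record stored at an exact path, or None."""
    by_path = {r.path: r for r in records}
    return by_path.get(path)


# 'Main.c for database project' as a metadata query: several candidates.
matches = query(CATALOGUE, name="main.c", project="database")
# The same request as a path: exactly one file.
exact = lookup(CATALOGUE, "/home/max/database/src/main.c")

print(len(matches), "candidates from the query")
print("path lookup found:", exact.path)
```

The path does the disambiguating work that the query leaves to Max: it encodes which of the identically named files he means.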

Query filesystem proponents will argue that Joe and Max's problems stem from their bad metadata, or that their queries are not detailed enough. But whose job is it to properly form this metadata? The program will most likely fail in this regard, as most of the automatically generated metatags (author, type, size, name) will be quite similar from file to file. So are Joe and Max expected to fill in metadata themselves? Are they expected to go to the trouble of typing in more detailed queries for data they may be unsure about anyway? Last August or September? Is this easier or harder than creating a deeply nested folder structure, as Joe and Max have been doing for years?
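To make the point about similar metatags concrete, here is a small Python sketch. The metadata tuples are invented for illustration (a hypothetical author, type and title for each of Joe's spreadsheets); the sketch just counts how many files end up with identical automatically generated metadata, and therefore cannot be told apart by a query on those fields alone.

```python
from collections import Counter

# Hypothetical automatically generated metadata: (author, type, title).
AUTO_METADATA = [
    ("joe", "spreadsheet", "Toyota sales August"),
    ("joe", "spreadsheet", "Toyota sales August"),     # previous year's report
    ("joe", "spreadsheet", "Toyota sales August"),     # sales returns
    ("joe", "spreadsheet", "Toyota sales September"),
    ("ann", "spreadsheet", "Toyota sales August"),     # a colleague's copy
]


def ambiguous_groups(metadata):
    """Return metadata tuples shared by more than one file, with their counts."""
    counts = Counter(metadata)
    return {key: n for key, n in counts.items() if n > 1}


groups = ambiguous_groups(AUTO_METADATA)
for key, n in groups.items():
    print(n, "files share", key)
```

Any group reported here can only be separated by extra metadata that somebody has to type in, which is the effort the paragraph above asks about.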

In short, a query/metadata-based filesystem assumes that the user does not know where their files are. This might be true for novices, but it is certainly not true for anyone who uses a computer regularly.

Query filesystems will make things easier for messy users but harder for tidy users. Is this right? Metadata will be hugely useful, and there is no reason not to incorporate it into our existing filesystems. But to abandon directories and deeply nested folders, as some would argue we should, is a road to overloaded failure.
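One way to picture metadata living alongside directories, rather than replacing them, is a query that is scoped to a folder. The Python sketch below is an assumption-laden illustration, not a description of any existing filesystem: the index, its paths, its years and its tags are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical index mapping each path to its metadata tags.
INDEX = {
    "/work/clients/toyota/2004/august/sales.xls": {"client": "toyota", "kind": "sales"},
    "/work/clients/toyota/2003/august/sales.xls": {"client": "toyota", "kind": "sales"},
    "/work/clients/toyota/2004/august/returns.xls": {"client": "toyota", "kind": "returns"},
    "/work/clients/honda/2004/august/sales.xls": {"client": "honda", "kind": "sales"},
}


def scoped_query(index, root, **tags):
    """Return sorted paths below `root` whose metadata matches every given tag."""
    root_path = PurePosixPath(root)
    hits = []
    for path, meta in index.items():
        # A file is inside `root` if `root` is one of its ancestor directories.
        inside = root_path in PurePosixPath(path).parents
        if inside and all(meta.get(key) == value for key, value in tags.items()):
            hits.append(path)
    return sorted(hits)


# Querying the whole disk returns every sales sheet.
everywhere = scoped_query(INDEX, "/", kind="sales")
# Starting from a folder the tidy user already knows narrows it to one file.
narrowed = scoped_query(INDEX, "/work/clients/toyota/2004", kind="sales")

print(len(everywhere), "hits across the whole disk")
print(narrowed)
```

Here the folder structure supplies the context and the metadata supplies the filter, so the tidy user loses nothing and the messy user can still search from the root.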

Such is the cocky, uninformed, ignorant and Luddite tone of my first journal entry anyway. Please comment if you agree or disagree, or indeed if you have read this at all.
