1) Guideposts? - by TopShelf
Having obtained financing for the project, how does that impact the future direction of development? How do you balance the interests of developers, users and sponsors to choose which updates to pursue?
There will always be more features that I would like to implement than I can implement. Users and sponsors for the most part want good things to be added, and it is really not so bad to first implement the features that someone is willing to pay for, and hope that in a few years someone else will pay for other features. I take a 30 year perspective on the project, and I have as my final objective the elimination of reasons to not use the filesystem as the unifying namespace of the operating system. That makes for quite a lot of features that I am willing to add.;-)
There is a common situation though where pressure from sponsors is severely negative, and that is when they want a quick hack that meets their needs but lacks elegance. It is not usually the features they want that are wrong, it is the timeframe they want them in and the shortcuts they expect to be made to meet that timeframe. This happened with ACLs and extended attributes for instance. I turned away 4 different sponsors for ACLs before I was lucky enough to find DARPA.
All of the commercial sponsors wanted some quick hack that would not be consistent with the semantics I am evolving ReiserFS towards, and would leave us with unwanted additional primitives. To DARPA I proposed that filesystem designers were not providing security researchers with the right infrastructure that they needed, and this lack of the right infrastructure was leading to inelegant hacks. For instance, I argued that there was no need for extended attributes, that instead there was a need for *files* that
1. were efficiently accessed many at a time (a new system call that reads and writes many files in one call)
2. were accessed atomically (transactions infrastructure)
3. had constraints on their allowed values, not just on who can write to them
4. inherited some contents/metadata from other files/directories (note that streams share common metadata)
5. could be made invisible to readdir
6. could be both files and directories depending on whether you accessed them as files or directories
7. could be implemented via a rich plugin infrastructure that allowed one to compose new plugins by selecting methods from already existing plugins, and only writing from scratch that which was truly unique to the plugin
8. were stored efficiently even when small (V3 suffered a performance loss when tails were turned on, that is, when small files are packed several to a block; V4 cures this)
You can probably imagine small appliance vendors not being willing to wait 18 months and spend lots of money when all they want is for ACLs to work well enough that they can sell a Samba server, yes?
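Item 7 above, composing a new plugin by borrowing methods from existing plugins and writing only what is truly new, can be illustrated with a minimal Python sketch. It assumes a plugin is simply a table mapping method names to functions; the names `compose`, `PLAIN_FILE`, `READ_ONLY`, and `COMPRESSED_RO_FILE` are hypothetical and are not the Reiser4 plugin API, which is written in C.

```python
import zlib
from typing import Callable, Dict, Mapping

# Hypothetical model: a plugin is a mapping from method name to function.
Plugin = Dict[str, Callable]


def compose(*bases: Mapping[str, Callable], **overrides: Callable) -> Plugin:
    """Build a new plugin by copying methods from the bases (later bases win),
    then applying overrides, which hold only what is unique to the new plugin."""
    plugin: Plugin = {}
    for base in bases:
        plugin.update(base)
    plugin.update(overrides)
    return plugin


# Hypothetical base plugin for ordinary files: bytes are stored unchanged.
PLAIN_FILE: Plugin = {
    "encode": lambda data: data,      # transform applied on write
    "decode": lambda stored: stored,  # transform applied on read
    "size": lambda data: len(data),   # logical (uncompressed) size
}

# Hypothetical permission plugin that only knows one method.
READ_ONLY: Plugin = {
    "may_write": lambda user: False,
}

# A compressed, read-only file: "size" is reused from PLAIN_FILE and
# "may_write" from READ_ONLY; only the codec is new code.
COMPRESSED_RO_FILE = compose(
    PLAIN_FILE,
    READ_ONLY,
    encode=zlib.compress,
    decode=zlib.decompress,
)
```

Here only the encode/decode pair is written from scratch; everything else is selected from plugins that already exist. A kernel implementation would more likely use structs of function pointers, but the composition idea is the same.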
With DARPA, if you advance the field of security research by developing along an angle the proposal reviewers thought was interesting, it is enough to get the funding. It is easy to think that private industry is sufficiently well motivated to fund long-term research, but unfortunately long-term research tends to benefit a lot more than just its creators and funders, and only the government funds work whose benefits will be mostly diffused throughout society. Or at least that is one theory to explain what happens. The observed reality is that venture capital does not fund long term research, persons wanting to do long term research mostly must pursue government funding, and the readers are encouraged to suggest other explanations of that if they have them.
I would be curious to know if privately held companies without plans to go public (such as Namesys) generally tend towards more long term research. Surely society needs long term research.;-)
DARPA is really quite excellent to work for. Having the right customer is very important to the development of a product, and I learned an enormous amount about serious approaches to security that I would never have learned without participating in their environment. Their accounting requirements are exacting, but it all has a reason, and for each requirement you can easily imagine the taxpayer being abused by someone without it. Actually I was a bit reassured as a taxpayer about how the DoD spends our money by seeing them in action, and being a small company we learned a lot about generally good professional accounting practices by adopting their requirements.
2) Good business planning - by mao che minh
Did you embark on this project in hopes of making a profitable business? It certainly seems that way, considering that you went looking for sponsorship and even planned pay-per-incident support, showing that you were prepared to work the whole "support revenue" angle.
Now you just need to hire someone to design a modern, more "commercially pleasing" website. =)
We make enough from pay per incident at www.namesys.com to support 1/4 of a programmer, and monitoring all requests to ensure that a professional response is always given consumes some of my time. This revenue is growing steadily, but since we make so little money from it we don't bother trying to charge a lot ($25 currently); instead we just hope that word of mouth will make it continue to grow, and maybe in a few years it will become something more significant. I think we were key to deflating some of the excessive per incident support rates that were out there for Linux, and this is good, because who can afford $250 to be told how to make X Windows work?
Where I had originally planned to make money was by using Linux as a market sampling methodology while selling to OS vendors. That a bunch of hackers who had written software I used myself would get to use it for free was ok with me, and a product that was used by the OS vendor's engineers at home would be an easier sell I thought.
Unfortunately, paying money just to catch up to Linux does not seem to excite OS vendors, even if in reality it is very important that they do so. It is theoretically irrational but reality real that the proprietary OS vendors have not yet bought a license.
Instead, I have sold to storage appliance startups who need a code base to start from, and this has been slightly more than half of our income.
We have only sold one priority support contract (this is where you have the right to wake us up and you pay in proportion to the amount of hardware you have), but we sold it to Lycos, and they keep doubling their usage of ReiserFS every year and increasing the contract accordingly. They have been very happy with the support we provided. This one contract is much larger than all our pay per incident fees combined, and thanks much to Lycos for being a customer.
It would be nice if I could sell some government a nice ReiserFS support contract....
Many users don't realize it, but we don't make a lot of money here at Namesys, the programmers are very much overdue for pay raises, and there are many good people I can't hire because we have no funds for it. Hopefully this will change though as ReiserFS increases its technical lead over other filesystems with time. Reiser4 is going to give us some very compelling performance and security advantages, and we will be the easiest filesystem to hack on thanks to the plugin infrastructure. That infrastructure is really going to accelerate development by a lot, and provide us with a compelling competitive advantage. It is interesting to watch Nikita casually toss together a couple of plugins in an afternoon when it suits his whimsy. We can already see how much easier it is going to make doing the semantic enhancements we have planned, and then once we have the semantic enhancements out there and in use, performance will no longer be the primary decision point for users.
As for the website, by "commercially pleasing" I assume you mean using bland and uninteresting graphics, and corporately styled, with lots of insincerity everywhere.;-) Forgive me if I read too much into your comment, but you would not be alone if I read you correctly.
It is important to be true to oneself. You should maybe understand that some years ago I put the suit my mother bought me into the fireplace along with all my ties. If a restaurant requires a suit and tie, I just don't go there, it is not for the likes of me. If you need some corporately commercial justification for our website design, then let it be that I am willing to be cool and appealing to the younger generation, etc., instead of bland. If a pedagogical justification will work for you, then let me point out that military manuals with their cartoon based approach are far more effective in engaging the reader than the pedagogical techniques employed by most college textbooks. The military is more advanced in its pedagogical technique than the university system, which is really rather amusing, and I think it is due to the greater pretentiousness of universities in this matter.
3) Versioning - by tjansen
Besides finding and organizing files, the biggest problem for desktop users today is probably that changes on the file system are not recoverable. It is easy to accidentally overwrite a file and lose your work, and the only sane way to solve these kinds of problems would be to make it possible to revert changes.
Several research systems have been created, like the Elephant File System, but none of them made it into the mainstream free and commercial operating systems. Are there any specific reasons why nobody offers recovery (high complexity in implementation, very bad effect on performance, etc) or is it just because FS designers don't see the need for it?
Actually, there is a version control filesystem called Clearcase that costs thousands of dollars per seat. (If I wanted to make more money, I could return to working as a Clearcase sysadmin --- there are some jobs that pay very well because nobody wants to do them.;-) ) Clearcase was written a bit too quickly, and its performance sucks as a result (though I am told that has improved since I left that field). If it used Reiser4 as a backend it would be a lot quicker.
Version control definitely belongs in the filesystem. Clearcase may have a lot of implementation uglies, but as a concept it clearly works. Filesystems manage files, and that should include managing file versions. Our support for transactions and compression should make it easier to implement version control in Reiser4. As soon as someone offers to sponsor it, we will do it.
Larry McVoy makes a lot of arguments about how the economics of version control means it has to be expensive. I think this would not be true if three things changed simultaneously: 1) it was integrated into the filesystem, 2) it was free, 3) it was easy to understand for average users. For 3) to be true it has to become as easy as, say, .snapshot directories on a NetApp are, for the average user. It should also be as well integrated into apps as version control is in MS Word. I predict that in 20 years version control in filesystems will be standard and expected by all users as a basic feature.
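As a rough illustration of the user-facing semantics argued for here (every write keeps the old state, and getting it back is as simple as browsing a `.snapshot` directory), here is a toy Python model. `VersionedFile` is a hypothetical name, not a Reiser4 or ClearCase interface. It keeps a full copy of every version purely for clarity, whereas a real filesystem would be expected to share unchanged data between versions (for example with copy-on-write) rather than duplicate it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedFile:
    """Toy model: each write appends a new version instead of overwriting,
    so any earlier state of the file can be recovered."""

    # Version 0 is the empty file that exists before the first write.
    versions: List[bytes] = field(default_factory=lambda: [b""])

    def write(self, data: bytes) -> int:
        """Store data as a new version and return its version number."""
        self.versions.append(data)
        return len(self.versions) - 1

    def read(self, version: int = -1) -> bytes:
        """Read the latest version by default, or any older one by number."""
        return self.versions[version]

    def revert(self, version: int) -> int:
        """Undo by re-writing an old version as the newest one.
        Nothing is discarded, so the revert itself can be undone."""
        return self.write(self.versions[version])
```

Making revert a new write, rather than truncating history, means an accidental revert is just as recoverable as an accidental overwrite.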
4) Filesystems and metadata - by androse
In your Future Vision white paper, last modified in January 2001, you outline several very interesting ideas about metadata.
Several developments have taken place since: the extensible attributes of BeFS have been buried with BeOS, the database-like metadata of Longhorn (aka Yukon) may actually be a separate layer from the filesystem altogether, and Apple is also moving all metadata out of the filesystem to XML files shared between applications (see iLife package).
My question: What is your current take on the metadata debate? Do you still think the filesystem is the right place to handle metadata? Any predictions?
Reiser4 required some fundamental breakthroughs in tree balancing technology before small files could be combined into one block without adding additional seeks for typical usage patterns. In particular it required discarding the BLOB paradigm, recognizing that BLOBs unbalance the tree, and creating a new more height balanced tree. (See www.namesys.com/v4/v4.html) It does not surprise me that the other filesystems failed to find these techniques; until we had benchmarks no one but me on the team thought this new stuff would work. I think you can reasonably assume that MS abandoned its efforts to put the database into the filesystem because their algorithms failed to deliver good performance. Without these techniques, extensible metadata causes performance problems that constitute a market entry barrier.
I should be careful in my phrasing here: Reiser4 does not support attributes, it merely supports files that you can choose to use for metadata, and its files have all the functionality you need for doing the usual metadata tasks should you choose to use them that way.
Oh, and BeFS is probably not all that dead. Dominic Giampaolo, its author, is working for Apple now, and since he is very bright and capable there are likely to be interesting things coming out of Apple in the future (probably not called BeFS though).
5) Researching filesystems - by ProteusQ
I'm going back to school this fall, and in a year I hope to be admitted into a Masters of Computer Science program. I'd like my main research focus to be on filesystems.
I'm preparing by reading everything I can find: I'm working on Tanenbaum & Woodhull's "OS Design & Implementation"; I've read "Design and Implementation of the Second Extended Filesystem"; Steve Pate's "UNIX Filesystems" is waiting on my shelf; and of course, there's the FAQ and ReiserFS v.3 Whitepaper at www.namesys.com. Specific questions: what branches of math are useful in this line of research? Any books, articles, etc., that I haven't listed that are a 'must read' or 'should read'? Those who have succeeded in building a better filesystem: what have they done that I should also do? Any mistakes I should avoid? Anything that no one told you about filesystems that you wish you had known up front? And are there any special tricks (above and beyond mastering your subject) to getting hired in this field once a degree is in hand?
I was never able to get hired in this field, so I am probably not the one to ask about how to get hired.;-) Hmmm. Oh I know one! Don't tell your potential employer that you are working on your own file system nights and weekends, and you will retain all rights to it, and you won't stop work on it once they hire you.;-)
You should probably read about Plan 9, and about namespaces generally. The literature on namespaces seems to be just about hierarchical namespaces, but the notion present in that literature that they should be unified is a good one. I rather liked Gerard Salton's book on automatic text processing. Ted Nelson's Xanadu project was interesting reading, and you'll want to read Codd and Date about databases. Mikhail Gilula's book about set theoretic databases is a good one.
In regards to math, study the design of new mathematical models. Study closure, and its importance to various models ranging from algebra to relational algebra. Understand why mathematical models were designed to have the structure they have rather than learning what those structures are, so that you can learn to construct your own models. I don't know of any courses that teach that, but it is what is important to learn.
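To make the point about closure concrete: in relational algebra every operator takes relations and returns a relation, which is exactly what lets queries nest. The following minimal Python sketch assumes relations are modeled as frozensets of rows, with each row a frozenset of (attribute, value) pairs; the function names are illustrative and not taken from any particular database library.

```python
from typing import Callable, FrozenSet, Tuple

# A row is a set of (attribute, value) pairs, so attribute order is irrelevant
# and rows are hashable; a relation is a set of rows (no duplicates).
Row = FrozenSet[Tuple[str, object]]
Relation = FrozenSet[Row]


def relation(*dicts: dict) -> Relation:
    """Convenience constructor: build a relation from plain dicts."""
    return frozenset(frozenset(d.items()) for d in dicts)


def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """Keep the rows satisfying pred. Relation in, relation out."""
    return frozenset(row for row in r if pred(dict(row)))


def project(r: Relation, *attrs: str) -> Relation:
    """Keep only the named attributes. Relation in, relation out."""
    return frozenset(
        frozenset((a, v) for a, v in row if a in attrs) for row in r
    )


def natural_join(r: Relation, s: Relation) -> Relation:
    """Combine rows that agree on all shared attributes. Relations in,
    relation out."""
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            common = dx.keys() & dy.keys()
            if all(dx[k] == dy[k] for k in common):
                out.add(frozenset({**dx, **dy}.items()))
    return frozenset(out)
```

Because `natural_join`, `select`, and `project` all return the same type they accept, an expression such as `project(select(natural_join(emp, dept), pred), "name")` is itself a relation and can be fed into further operators. An operator that returned, say, a list of strings would break that closure, and with it the ability to compose queries freely.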
Are you sure that it wouldn't be better to hang out in cafes and bookstores for 4 years, and at the end of it write some piece of a filesystem? Cafes, bookstores, and attending random seminars will educate you better, and writing some piece of a filesystem will employ you better.
If only one could get student loans for the purpose of hanging out in cafes and bookstores for 4 years I would have been so happy....
6) where next? - by wfmcwalter
Reiser FS is already a pretty mature, stable, usable product. Once V4 is done, is there really much work left to be done on ReiserFS proper? Do you have a giant to-do list that'll keep you and the guys occupied for years, or do you intend to work in a different direction (SAN, networkFS, databases, etc.)?
(or perhaps you'll just retire to Portugal and play lots and lots of golf)
V4 is a local host storage layer. V5 will make it a distributed storage layer. V6 will enhance the semantics to the point where one can do semi-structured data queries. Whether V5 or V6 comes first depends on funding.;-)
A new model for doing semi-structured data queries was my original goal, and it remains my primary goal, and the storage layer was just a necessary prerequisite to getting there. I describe the enhanced semantics I plan at www.namesys.com/whitepaper.html.
7) Starting Large Free Software Projects - by unsinged int
When you began a file system project as a free software project, you must have known that (assuming it worked) it had the potential to turn into a big project. How did you determine how long to work on it as your own project before making the first release? I imagine there must have been a strong temptation to just get it "out there" knowing its potential, yet certainly releasing too soon would make it look unprofessional and thrown together.
When something becomes stable enough for you to use it yourself without it crashing, and you don't know how to make it crash, you should release it (with lots of warnings). Fortunately, there are people who like to play with technology, and they will help you find its bugs while understanding that it is still experimental code. Each order of magnitude increase in the number of users will find as many bugs as the previous order of magnitude. After some number of years, if you are kind to your users and only do new features on a new branch, the stable branch will get to where months go by and there are no bug reports at all even though there are millions of users. It was this way with V3, and it will soon start to happen with V4.
8) Raising Awareness - by blinder
One question I always have with regard to successful projects (meaning funded, wide acceptance, large user/developer community, etc.) is: how did you raise the awareness of your project to get it from just a side project to what it is today?
Did you use traditional PR techniques, or just a community of connections?
There are two things that work, and not much else does:
1) Word of mouth from users brings people from seeming nowhere.
2) Being open and eager to make a bit of effort to work with people makes you friends or at least allies.
Heinz wasn't able to get the time of day from the ext2 team, and he needed a resizer for LVM to happen. We said yes, sure, we'd be happy to do it. He introduced us to SuSE, and SuSE paid us $5k to write a resizer. That led to SuSE becoming a sponsor, and then once we were stable they made reiserfs the default filesystem.
Word of mouth from the users is the most powerful tool free software has. If you combine that with being open, willing to work with other people, and not being an exclusivist, things happen randomly out of nowhere that keep your business alive.
The only times I am not eager to work with people are when I think their technical direction is wrong (e.g. extended attributes), and this is sometimes the significant short term price paid for having clean design.
9) Rules of thumb - by realnowhereman
In your future visions paper early on you talk about Reiser's Rule of Thumb #2. However, I can't find Reiser's Rule of Thumb #1 -- what is it? Is it a secret? Does it contain the sum of all human knowledge?
I think it was this, and when I write a book in a few years, I think I had better go looking through some of my old versions of that paper, because it used to have a lot more design principles in it, and I cut them in some ill-conceived effort to reduce the length for some reason I no longer clearly remember. For the convenience of readers I have included the entire context of it.
The Little Inconveniences Dominate What We Do
Do the small inconveniences caused by fragmentation of name spaces really add up to something that should dominate the design concerns for the name spaces of an OS?
Information Diffusion Rule of Thumb: The extent of information diffusion is divided by the effort required to navigate the name space.
Information owners tend to think of the cost of access as only subtracting from the value of their information, but it does much worse: it divides it. The economics are little different from what they would be if money rather than time were the cost, and since we all know that halving the monetary cost of a silicon chip more than doubles its usage, it should seem reasonable to the reader that halving the time cost of a piece of information would double its usage. Three seconds rather than 1/3 second of access time means that the same information will spread to an order of magnitude fewer people, be used by them an order of magnitude fewer times, and be an order of magnitude less useful to the organization as a whole. A common mistake by authors of information is to not realize that most of the total utility of their piece of information will be felt by those for whom its utility is either rather small or merely speculative at the moment they consider accessing it. The other common mistake is to not realize or care how much harm will be caused by others expending the time cost of accessing their information only to find it irrelevant. Since we all have limited lifespans in which to do our research, time spent accessing rather than reading information detracts from our ability to wander speculatively after information that might be useful. The quality of the name space design determines these costs.
Example of the evils of name space fragmentation at work: Every employee must create a job description, and then store it in the dreaded Hyped Document Storage System for easy access by all. Mr. B. Bizy just enters HDSS, types Bizy duties, and it pops onto his screen inside a full-blown Hyped Editor with lots of features. Easy. Unfortunately the only editor he knows how to use is emacs. Emacs doesn't know how to navigate or edit Hyped Indexes. He takes a moment to swear at the paucity of features possessed by emacs, and then he spends fifteen minutes paging through the Hyped manual. Finally he figures out how to put the document into a file. He edits it using emacs masterfully to achieve a new level of job description ambiguity, and puts it back in HDSS.
His boss comes by, sees him working on his job description, and tells him to compare the list of employees, as entered into the REGRES payroll relational database, to the list of those who have entered a job description into HDSS, to make sure everyone is complying with the "describe your job" directive recently issued. Unfortunately, HDSS is a keyword system and REGRES is a relational system. Neither HDSS nor REGRES can use each other's indexes, and while he could write a program to compare the output from the two applications, he can eyeball the output from the two faster than he could write the program. His eyeballs grow tired.
In general, whenever Mr. B. Bizy wants to act on information namable within one application and operate on it with another application, he must extract it from the first application into a file (at best a pipe), sometimes he must hand massage it into a form that the other application can enter into its own database, then he must put it into the second application, do his work, re-extract it, possibly massage it some more, and finally put it back into its original application indexes. It is never a thoughtless read or save, though it is always tedious. Mr. B. Bizy would prefer to spend his thoughts on other activities.
This is why most of the time the employees of Mr. Bizy's company store most of their data in flat files in the semantically impoverished filesystem: the greater connectivity pulls them there.
10) On being one of those "outspoken" people - by salmo
Mr. Reiser, first off I have no complaints about ReiserFS (which is a high compliment). I use it on almost all my machines, except a couple that are running EXT3 because they're not heavily used and I'm lazy at times. But that's neither here nor there.
You fall into an interesting subcategory of project managers or whatever you want to call them. I'll call it the "outspoken genius" category (even though the first word might be understated and the last is probably hyperbole). Basically your work is technically interesting, applicable, etc. That's a given. But there are quite a few people who have personal issues with you and your manner and usually cite some exchange or another. Sometimes this is the basis of an argument to reject the use of your work, which I think is somewhat silly. You're not the only one, and certainly not the first to be interviewed here.
So what do you think about this? ie. Do you think you made interpersonal mistakes that landed you here or do you think you've been misunderstood? Does it bother you? Why do you think people enjoy egging on folks such as yourself and then citing the moment you get annoyed with them? Do you think this question ever has a prayer of being moderated higher than someone following the method of the previous question?
Jeeze, I realize I just wrote an essay question in the style of one of my old Philosophy professors. You know the kind, here's a statement now write some stuff (I guess I'll give you a few ideas of where to go).
I am not a genius, I am just never satisfied and very very persistent. I approach science like a blind man with a stick who is determined to fully understand what is going on. The difference between me and my competition is that I poke more than they do. I observe, find something to be unsatisfied with, try something to fix it, most of the time it fails and I try again. You don't see the failures because they don't get released. Why haven't other people already fixed the traditional balanced tree algorithms and made them effective enough for storing files? Because it was too much work, and they were smart enough to avoid the work, that is all. We simply rewrite more times and more deeply than others do, and that is how we get our results in our admittedly obscure field.
Now if you think about it, who wants to be around a blind man with a stick, someone who keeps insisting things aren't good enough and they need rewriting?
There is yet another way of looking at it though.
Linux is an ecosystem, and in this ecosystem there is fast growth vegetation and slow growth vegetation. The fast growth vegetation are the people who took what had already been done by Unix, and without changing its design they copied it while making coding improvements.
Then there are those who look at Unix, err, Linux, and see something just barely begun that needs a complete overhaul. These are the slow growth vegetation. Namesys is slow growth vegetation that got started a long time ago.
Now it is human nature that however a human being is, he is inclined to think that is the right way to be. There are those who think that design does not matter, and one should just make incremental coding improvements. There are also those who think that just coding without introducing fundamental new ideas is unimportant. Both of these sets of persons are fools. To say that one approach is better than another is like saying that grass is better than trees, or trees are better than grass. For Linux to prosper as an ecosystem it must have both.
Unfortunately the fast growth vegetation is actually developing a culture of exclusion, kind of like grass working to strangle the tree seedlings. Linux is developing more and more of an insider circle. Those who cannot code well enough to survive on the merits must politick to exclude the threats....
A sad thing about this is that the most talented young security researcher I know doesn't want to develop for Linux because of the attitude of the inner circle to new people, and I can't really blame him, it is why I didn't develop ReiserFS for BSD back when BSD was....
Almost certainly he is not alone....
It is all very fine to discuss the sociology of herd formation, exclusion, and prejudice in the abstract, but one should never say that particular persons are making particular decisions on the basis of their herd instincts unless one wants to be truly hated by them and all of their numerous friends, and this was my mistake over and above the choice of what sub-herd to be part of.
I don't think anyone "eggs me on" though. I press released benchmarks of ReiserFS vs. ext3 the day ext3 was formally released at a conference, before ReiserFS had been included, and is it a surprise one of them was pissed at me? My competitors didn't and don't want ReiserFS in the kernel, and I wanted and want it in, and the result has all the dignity of a food fight. Filesystems that are less threatening nobody cares enough about to seek to exclude them. Many thanks to Linus, who chooses to allow healthy competition among the filesystems in his kernel.
If only the largest distro was so permissive....
You do all understand that while the GPL doesn't permit tying by license, distros have now moved to using threats of invalidating support contracts to achieve the market leverage they need to exclude competitors, yes? By doing this they can exclude mainstream official kernels from being used, exclude rival filesystems, exclude whatever might lead to less customer lock-in.....
This is why you should try to avoid buying support contracts from distros and only buy support from those who agree to support you the customer doing whatever you choose to do, even if it is something fringe like using a kernel from Linus.;-)
They will tell you all this nonsense about how they can't support whatever software you choose to use. Buy better support from an independent and you won't hear this nonsense (www.Namesys.com/support.html is $25 a question and there are plenty of others). Most independents will support you using whatever distro you want, using whatever configuration you want, and they have the skill to cope with that. Sure, they will tell you that such and such gcc release on such and such distro was a lemon, or maybe even that the only reasonable fix for your bug is to upgrade to a recent release, but your support provider should never be telling you that you can only use what they sold you.
I am trying to convince the GSA that they should avoid procuring free software support that constrains the government's choice of what software to use, and they are at least considering the point of view. Why bother to have the GPL if you accept this loss of freedom?
Ummh, maybe these sorts of statements are why I am not so popular....;-) Well, glad to have answered your question!
We should all keep in mind though that there aren't any hard core greedy evil people in our industry. They are all basically good hearted people who chose trying to create a better society as their life's work at a substantial cost in personal income. Petty, bickering, overly impressed with ourselves, flaming, yes that describes most of us Linux kernel developers, but there isn't enough money floating around to attract any genuinely bad folks into our industry.