2.6 Linux Kernel in Need of an Overhaul? 512

Posted by Zonk
from the get-it-right-this-time dept.
toadlife writes "ZDNet UK reports that Andrew Morton, the head maintainer of the Linux production kernel, is concerned about the number of bugs in the 2.6 kernel. He is considering the possibility of dedicating an entire release cycle to fixing long-standing bugs." From the article: "One problem is that few developers are motivated to work on bugs, according to Morton. This is particularly a problem for bugs that affect old computers or peripherals, as kernel developers working for corporations don't tend to care about out-of-date hardware, he said. Nowadays, many kernel developers are employed by IT companies, such as hardware manufacturers, which can cause problems as they can mainly be motivated by self-interest."
This discussion has been archived. No new comments can be posted.

  • by eldavojohn (898314) * <eldavojohn.gmail@com> on Saturday May 06, 2006 @08:58AM (#15276440) Journal
    A lot of times, the old debate of Windows Vs Linux covers how often the OS fails miserably. Yes, we all know the famous "blue screen of death" and I think that that single concept connected with Windows makes it unappealing. I believe that Linux has the ability to handle internal errors more elegantly but that's only because I've only seen it fail from hardware errors. Granted, I don't know enough about the inner workings of Windows or Linux but let's face it, Win95 & Win98 first editions would crash if you looked at them wrong.

    Here's a possible horror story:

    While the debate rages on, Linux gets more complex. Linux gains more bugs. Linux begins to aim for more end-user features. Developers get sick of maintaining other developers' code and focus on making new features (asked for or un-asked for) because it gives them pride to make something new. The Linux kernel hits the same pitfalls as the Windows kernel.

    If it takes an entire development cycle to simply improve the current version's bugs, I'd gladly accept and encourage that.
    • by MOtisBeard (693145) <atomdebris@@@gmail...com> on Saturday May 06, 2006 @09:12AM (#15276479)
      "If it takes an entire development cycle to simply improve the current version's bugs, I'd gladly accept and encourage that."

      Hear, hear. The real pitfall for any technical production process, from software to space shuttles, is the ascendancy of a businesslike concern with the product's image to the point that it begins to dictate release deadlines. It's all well and good to worry about image, but when that worry becomes such a focus that it dictates the way that technical work gets handled, suddenly your product or process has become an example of form over function... and unless your product is tuxedoes for corpses or something similar, SCREW form over function!

    • If a free software developer doesn't want to do something, like fix old bugs, they won't. If this is made to be the only way they can contribute, they probably won't contribute. It's better to get some shiny, unasked for features than nothing at all in my opinion, even if it is not as good as added stability.
      • by eldavojohn (898314) * <eldavojohn.gmail@com> on Saturday May 06, 2006 @09:24AM (#15276509) Journal
        Man, it's crazy but we have this thing where I work. Uh, what do you call those things again?

        They are very good at convincing people to do things regardless of what they get out of it ... I think they're called 'leaders.'

        If Andrew Morton doesn't have leadership skills, I suggest he step down and let another manager step up.

        If I were in his position, I'd get everyone who's even mildly important in a room (or, failing that, an e-mail) and:

        "Guys, remember back to the reason you first joined in the contribution to develop a free operating system. Now, think of all the hard work you've put into it and other people have put into it. Now, that's all in jeopardy and here's why..."

        Spend some time reasoning with them and pointing out the bugs that are really really hurting the kernel. In the end, wrap up with:

        "Look, I know this sucks and you're going to have to tangle with a lot of bugs that aren't even your own. But what have we got if we haven't got a stable operating system? We've got another Windows, that's what. You just don't have to pay for our piece of malware. Just see this one development cycle through, I promise we'll make it as quick and painless as possible and after all is said and done, we'll have another meeting like this where anyone can suggest any crazy-ass feature they want to add. Once we pick out what we want, we'll spend the next development cycle letting our imaginations run wild. We'll make a kernel so unstable that the user'll have to re-flash their BIOS when it crashes! Then maybe we'll work on solidifying that. Right now, we just owe it to ourselves and our fans to give them something that's 100% stable and reliable."

        If you can't reason with them like that, maybe you just have to accept they can't be persuaded and let them do what they want but prune their work if it detracts from your goal end system.
        • by chrismear (535657) on Saturday May 06, 2006 @09:38AM (#15276559) Homepage

          Man, it's crazy but we have this thing where I work. Uh, what do you call those things again?

          Paychecks?

        • Mr Morton is a very, very wise and informed gentleman, and "leadership" skills aren't the only useful ones -- in fact, they can easily become crippling handicaps as every rational response is knifed in favour of a justifiable "leadership response", effective or not.

          If you see a need for a leadership character, please engage them in addition rather than in place, else Linux will overall lose even if a relative genius is so employed. There is much comment in this post WRT Linux vs Microsoft development model
      • I don't think that's universally true. Maybe for some people, yes, but lots of free software developers are very proud of their creations, and will strive to make their products as bug-free as possible. Also, the kernel development has leadership, and if they say "Look, features are all very good, but we have so many bugs here that we just have to get them fixed before we can add anything else", then people who want to get their feature included as soon as possible will work to get rid of the bugs so additi
      • by Anonymous Coward
        It's better to get some shiny, unasked for features than nothing at all in my opinion, even if it is not as good as added stability.

        That had better be sarcasm. Slashdot attempts to "sell me" Linux based on its lack of bugs and the many hands theory of bugfixing. If Linux doesn't have that, then it's useless. Linux certainly isn't as easy to use as Windows, and OpenOffice certainly isn't as good as Microsoft Office. If the community can't provide a product that is at least less flawed than Microsoft's, t
    • If it takes an entire development cycle to simply improve the current version's bugs, I'd gladly accept and encourage that.

      This is just treating the symptoms rather than the cause: bad development with focus on features and performance instead of quality and correct code.

      • by TheRaven64 (641858) on Saturday May 06, 2006 @09:31AM (#15276529) Journal
        Linux got me using *NIX. BSD showed me how *NIX is meant to work. I currently use OpenBSD and FreeBSD, and this is exactly the kind of reason why I switched.

        In FreeBSD, there are three branches, -STABLE, -CURRENT, and -RELEASE. Any new features are put into -CURRENT. Here, they undergo testing. The only people who should be running -CURRENT are those who are developing or actively bug-hunting. Once a feature is stabilised, it migrates into -STABLE. Here, it receives more general testing. A lot of people use -STABLE, and file bug reports. Finally, a -RELEASE branch is created from -STABLE. This undergoes even more testing and is then shipped (usually after several betas and RCs). The -RELEASE branch is maintained in the tree, but only bug fixes are allowed to go in it. If you want a stable system, you stick with a -RELEASE branch. For a slightly less-critical system, you might want -STABLE for the features (my ThinkPad runs -STABLE, and I have never yet had it crash).

        The direction of the OS development is driven by the core team. These are elected annually by the developers.

        In the OpenBSD world, there is a code review process. Every piece of code in the base system is audited on a regular basis. When a new category of bug is discovered (e.g. the multiply overflow that caused a security hole in OpenSSH), the entire source tree is searched for occurrences of that bug. These are then fixed.

        Both of these development processes give high-quality, stable systems.
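The tree-wide search described above (find one new class of bug, then look for every other occurrence) can be sketched in a few lines. This is only an illustration, not OpenBSD's actual audit tooling: the function name `find_unchecked_multiplies` and the regular expression are hypothetical, and the heuristic only flags `malloc()` calls whose size argument contains a bare multiplication, which is one place where a multiply overflow can hide. Every hit would still need a human to review it.

```python
import re

# Hypothetical heuristic: a malloc() call whose argument multiplies two
# operands (e.g. "n * sizeof(x)"). A pointer dereference such as
# "sizeof(*p)" is not flagged because nothing precedes the '*'.
UNCHECKED_MUL = re.compile(r"\bmalloc\s*\([^;]*[\w)\]]\s*\*\s*[\w(][^;]*\)")


def find_unchecked_multiplies(source):
    """Return 1-based line numbers of suspicious malloc() calls in C source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if UNCHECKED_MUL.search(line):
            hits.append(number)
    return hits


# Small made-up C fragment used to exercise the heuristic.
SAMPLE_C = """\
p = malloc(sizeof(*p));
buf = malloc(count * sizeof(char *));
q = calloc(n, sizeof(*q));
"""

print(find_unchecked_multiplies(SAMPLE_C))
```

A real audit would walk every file in the tree and would also need to catch multiplications done on a separate line before the allocation, which a line-based pattern like this one cannot see.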

    • by gmack (197796) <{ten.erifrenni} {ta} {kcamg}> on Saturday May 06, 2006 @09:14AM (#15276484) Homepage Journal
      As someone who is responsible for many of those bug reports I can tell you it's not the features that break things. It's things like driver API cleanups that don't get all of the older drivers.

      The result is that if you have reasonably common hardware the kernel is getting much more stable but for things like my non-PCI SPARC (compile problem with some options) or my 21 ethernet port firewall (needs special options to boot or it crashes) it has gotten more buggy.

      I'm not sure a freeze will do much to fix it as a large part of the problem is that all these somewhat rare things need testing.

      I still find these things get fixed rather quickly when I report them even without the freeze.
    • by Rosco P. Coltrane (209368) on Saturday May 06, 2006 @09:15AM (#15276489)
      Yes, we all know the famous "blue screen of death" and I think that that single concept connected with Windows makes it unappealing. [...] Win95 & Win98 first editions would crash if you looked at them wrong.

      Er.. I hate Windows as much as the next guy, but really, when was the last time you saw Windows bluescreen? Perhaps you could make your point by comparing Windows and Linux versions that aren't 11 years apart.

      I believe that Linux has the ability to handle internal errors more elegantly but that's only because I've only seen it fail from hardware errors.

      Yes but it handles hardware errors gracefully too: for example, the hard disk in one of my 24/7 machines died last week. I came back and found out that I couldn't write anything to it at all. A quick look at the console showed a message saying "root filesystem, too many errors, remounting read-only" or something like that. The result is that data corruption was minimal *AND* the machine didn't hang. How's that for graceful? You wouldn't dream of having that in Windows.
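For anyone wanting to spot the "remounted read-only" state described above without watching the console, here is a minimal sketch. It assumes a Linux system where `/proc/mounts` lists one filesystem per line as device, mount point, type and comma-separated options. The function name `read_only_mounts` and the sample text are made up for illustration. On ext2/ext3 this behaviour is usually selected with the `errors=remount-ro` mount option, though whether the machine above was configured that way is an assumption.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Made-up sample in /proc/mounts format: the root filesystem has been
# remounted read-only, /home is still writable.
SAMPLE_MOUNTS = """\
/dev/hda1 / ext3 ro,errors=remount-ro 0 0
/dev/hda2 /home ext3 rw 0 0
proc /proc proc rw 0 0
"""

print(read_only_mounts(SAMPLE_MOUNTS))
```

On a live machine the same function could be fed the contents of `/proc/mounts` from a cron job, so that a failing disk raises an alert instead of waiting for someone to notice.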
      • by Anonymous Coward
        The last time I saw windows repeatedly bluescreen due to a software error was Word XP on Windows 2000:

        Insert floppy #1 with foo.doc
        Open my computer
        Double click foo.doc to open it in word.
        Remove floppy #1, insert floppy #2.
        Press save.

        What happens next:
        1) Nearly every single time, the original foo.doc file on floppy #1 will be rendered unreadable. (This actually happened the moment they pulled floppy #1 out)
        2) The vast majority of the time the computer will BSOD
        3) Occasionally, the computer will corrupt flopp
      • I had an XP SP2 bluescreen a few times in the past few months. I wasn't quite sure what caused it any time.
      • when was the last time you saw Windows bluescreen

        I use Windows once in a blue moon. The last time was last week. New landlady's Win 2K, she turned it on, we waited for it to finish booting, she closed her MS chat client, I clicked on control panel, and it froze. I waited several minutes, gave it the three finger salute, got a blue screen and it was totally hung. Had to hard reset.

        Not XP I admit, but it's a pretty sad state of affairs when control panel hangs a freshly booted system.

      • but really, when was the last time you saw Windows bluescreen?

        May 4th.

        It wasn't on server 2003 - it was on the recent home computer OS that Microsoft released after win2k, but it was definitely a blue screen.

        As for the linux side - sometimes gnome revisits the windows lockups, particularly running remote applications on X can hang up the entire session if something goes wrong. A single user non-network aware environment doesn't belong on *nix but unfortunately gnome started that way and hasn't entirely outg

      • really, when was the last time you saw Windows bluescreen?

        Let me see... The last time I remember was October last year, when I got my Philips 200W6 monitor and tried to install it in XP. I went through ten or so blue screens, after following the instructions in the included CD. Having no success, I did a few driver downloads, the closest I could get was a 1600x1024 resolution. I gave up and never tried to boot in XP anymore.

        In the Linux side, it took me about two minutes to add "1680x1050" in /etc/X11/xorg

        • I can't tell you how many times the exact opposite has happened to me. I have had to spend hours getting X to work with the hardware that is on my machine at the time. Some hardware is easier than others, but you have to admit, getting video up and running is much easier in Windows than Linux.
      • I hate Windows as much as the next guy, but really, when was the last time you saw Windows bluescreen?

        I got called LAST NIGHT to help with a BSOD in Windows XP for a family member that would happen 20-30% of the time within a few minutes of booting. I just had them copy their "My Documents" folder to a DVD and then reinstall. Problem seems to have been solved...
      • I see an XP blue screen every day on my machine. I have tested the hardware and it's perfect. It only happens when I run a particular software over an extended period of time. I decline to name it, but it's not a display driver or such and it's not even a resource hog. It basically writes to the registry a LOT and the registry size becomes 200+MB. After which I might see an error or find the blue screen.
        In Windows there are NO logs, no clues, NOTHING to indicate what the problem might be. You're completely bl
        • by Tim C (15259) on Saturday May 06, 2006 @01:41PM (#15277640)
          In Windows there are NO logs, no clues, NOTHING to indicate what the problem might be.

          When Windows crashes, it writes an error log and a memory dump to the disk. Under XP, check the Startup and Recovery settings (My Computer -> Properties -> Advanced -> Startup and Recovery settings)

          Also, Windows logs a lot of information to the event logs. The event log viewer is in the Administrative Tools. By default, when Windows crashes, it logs information about the crash there (in the System log).

          No offence, but the fact that you appear not to know about the crash dump or event logs suggests that either you don't know enough to correctly diagnose the problem, or you're running one of the 9x series of Windows, in which case frankly your only sensible option is to throw it away and install something based on NT; 9x is a joke.
      • Bluescreen? Sometime last year, don't recall why.

        Spontaneous reboot? A couple of weeks ago.

        Inexplicably corrupt registry after running Windows Update? Last week.

        Throttling the CPU down to 600MHz and then reporting that as the maximum speed? A couple of days ago.
    • If it takes an entire development cycle to simply improve the current version's bugs, I'd gladly accept and encourage that.

      I don't think one mere development cycle is going to be enough. Code improvement is a continuous process. The Linux kernel programmers could (and should) learn a lot from how the OpenBSD team works [openbsd.org].

      I've written a Linux kernel driver in the 2.2 days, and at least back then the kernel source was rather messy (I've heard it's been much improved since then). One problem the Linux kernel ha

      • What would really help the code quality of the Linux kernel is to start refactoring subsystem code and throwing out the old stuff that oughtn't be used anymore. Less code means less space to hide bugs in.

        Wow! What an insight... You just described exactly what is happening. The problem is that in this refactoring, device drivers develop bugs that need to be fixed, and the kernel developers can't possibly test every driver, so they have to wait for bug reports.

    • by tacocat (527354) <tallison1NO@SPAMtwmi.rr.com> on Saturday May 06, 2006 @10:44AM (#15276853)

      I used to work for a guy who very nicely summed up software development as: Ready, Shoot, Aim. This is similar to the notion of most hobbyists with the time will eventually rewrite something they get working because now that they understand what it is they really want to do, they can get it right the second time.

      Seems to me that the olde school development model of the Linux Kernel had a valid point of doing an odd numbered release (2.5) for feature development and then an even numbered release (2.6) for refinements before we start the next wave of features (2.7). I was a little dismayed that the decision was made to drop this practice as it seemed to be one of the most intelligent things I've heard of in software development.

      I'm getting ahead of myself here, but I do hope that the commercial investment in the Linux Kernel doesn't start pressing the Linux Kernel to be developed in the same manner as commercial OS. I'm not talking specifically about Microsoft, but any company software development project. They tend to go for the features before they spend the time to fix anything. And then it's an uphill battle to get anything fixed. If the Linux Kernel development takes the same path then there shouldn't be any surprises if they start to fall down and hurt themselves more frequently. And eventually, Linux will be surpassed by someone who has better practices in development than what they have adopted.

      I don't expect people to get it right the first time, but I would appreciate it if people would get it right rather than just ignoring it. If it becomes too difficult to get it right, then they need to establish a sunset limit on the age of hardware they are willing to support, or simply not allow any (buggy) support to enter in the first place. As extremes they probably don't want to keep supporting the MCA bus but it might be a bit much to drop EIDE, ISA, COM, and PS/2 and only work on FireWire, SATA, SAS, USB2.0 support.

  • It's about time this was recognized by the Linux developers. Every time I've tried to upgrade from 2.4.26 over the past few years, my system has become unstable and I've ended up reverting. Hopefully I'll be able to upgrade at last.
    • Re:About time (Score:4, Interesting)

      by Blue Booger (223698) on Saturday May 06, 2006 @09:19AM (#15276496)
      Agreed. I have been forced to upgrade to 2.6 on a few computers for features that I needed that are only in the 2.6 series, but every time it has been a problem. All of our production machines are still built with 2.4 and we purposely use hardware that is supported by the 2.4 series.

      Linux has caused Microsoft to improve their products, and I have found myself removing Linux servers to replace them with Windows 2003 Server of late. On the desktop, it is not even close. I sit next to a guy who runs 2.6 on his Ubuntu machine and I laugh every time he has to reboot. My Windows XP box only goes down rarely for updates and it does it at night when I am not there. Last time, I had over 100 days of uptime (this is a desktop machine). I rarely ever see the BSOD anymore and if I do it is almost always caused by a hardware problem. That is what I *USED* to be able to count on with Linux - if it crashed, there was a hardware issue. Now, with 2.6, I've lost that.

      There are coworkers of mine who would have fainted three years ago if they heard me say something like this, but Linux just isn't the lean, reliable operating system it used to be.

      • Re:About time (Score:3, Interesting)

        by Homology (639438)
        My Windows XP box only goes down rarely for updates and it does it at night when I am not there. Last time, I had over 100 days of uptime (this is a desktop machine).

        You don't do Windows Update very often, do you?

        There are coworkers of mine who would have fainted three years ago if they heard me say something like this, but Linux just isn't the lean, reliable operating system it used to be.

        Use something that cares more about quality than new features, like the *BSD.

  • Duh Factor (Score:5, Insightful)

    by Spazmania (174582) on Saturday May 06, 2006 @09:10AM (#15276469) Homepage
    One problem is that few developers are motivated to work on bugs

    Yeah, this is one for the "no shit sherlock" column. What did you expect to happen when you eliminated the stable/unstable cycle? At a minimum the individual parts of the kernel would achieve stability at different times so that the kernel as a whole was never stable.

    This frustrates me immensely at work. I hung on to 2.4 as long as I could. Hardware compatibility pushed me to 2.6 and it just isn't as reliable.
  • by borgheron (172546) on Saturday May 06, 2006 @09:10AM (#15276470) Homepage Journal
    This may look like flamebait, but I'm actually serious. Microkernels are more reliable because of drivers running on userspace. If a driver crashes, it can't take down the whole system. Also, given that some microkernels are only about 3500-6000 lines of code (as opposed to Linux's million or so) it's relatively easy to make certain that the code is bug free (given that the average number of bugs is 16 bugs per 1000 lines of code according to some recent studies).

    So, if the kernel needs an overhaul, then why not do it right this time? Now some may say that microkernels have a performance hit, but today's machines are certainly fast enough to render any performance hit negligible.

    GJC
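The arithmetic behind the comment above is easy to check. The sketch below simply applies the poster's own figure of 16 bugs per 1000 lines to the line counts quoted in the comment. The function name `expected_bugs` is made up, and the density is the poster's claim, not a measured value.

```python
def expected_bugs(lines_of_code, bugs_per_kloc=16):
    """Estimate the bug count from code size and an assumed bug density."""
    return lines_of_code * bugs_per_kloc / 1000


# Line counts taken from the comment above.
for label, size in [("small microkernel", 3500),
                    ("large microkernel", 6000),
                    ("Linux (a million or so)", 1_000_000)]:
    print(f"{label}: about {expected_bugs(size):.0f} bugs")
```

At that density a 3500 to 6000 line microkernel would still be expected to carry roughly 56 to 96 bugs, against roughly 16000 for a million lines, so the smaller code base is easier to audit rather than automatically bug free.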
    • Or use Darwin/x86. It's a microkernel that's already ready for prime time.
    • The crux of this article/interview is Linux 2.6 needs improving with old and/or less common hardware rigs. Your proposed extended solution is to radically refactor the entire kernel because, hey, modern hardware that the majority of us have can cope with it.

      How the smeg did you pull that off, get modded insightful, and not get modded somewhat offtopic?

      In any case moving to a new fundamental architecture is like turning grape juice into wine. There are some good wines and some bad wines, and while it's certa
    • How about just moving the most userspace-friendly bits to userspace? FUSE has allowed the development of a ton of kernel-level features (eg, read and write wikipedia entries using any program you like, by editing .txt files in ~/wiki/), while leaving the kernel itself as stable as ever; and without needing a complete rewrite.
    • Ahhh, but a good developer doesn't use increased CPU speed as an excuse to write slow code. Ideally; the faster the CPU, the faster the code runs. NOT: The faster the CPU, the "less slow" the code runs. In terms of CPU-hungry code, whatever "CPU-hungry" is defined as, depends on the task at hand.
    • Rewrite it as a microkernel
      That's silly because it would mean starting again very close to the beginning - but what wouldn't be silly is to write a microkernel as a new system.
      today's machines are certainly fast enough to render any performance hit negligible.
      Many scientific applications run for days or weeks on clusters - in which case performance hits could add up to times that are not trivial.
    • MkLinux has been around for years. I once ran it on a PowerMac 6100 until the monolithic kernel was ported to the architecture with driver support for the hardware in the machine.

      Granted, the hardware was older, but the performance hit was massive compared to the monolithic kernel that followed. I'd hate to think of losing that many cycles regardless of the speed of my CPU.
    • by diegocgteleline.es (653730) on Saturday May 06, 2006 @11:27AM (#15277026)
      given that some microkernels are only about 3500-6000 lines of code (as opposed to Linux's million or so)

      Oh yes, but the microkernel implements almost no user-visible functionality - TCP/IP stack, VFS, filesystems, USB, random devices....

      You know, the linux core kernel is also quite stable. It's the drivers that hit more bugs. A microkernel itself can be perfect, but the userspace daemons implementing functionality will also have bugs, and those daemons will take more or less the million lines that linux takes. IOW: microkernels don't magically fix bugs.
    • No, just use nooks (Score:5, Insightful)

      by meese (9260) on Saturday May 06, 2006 @11:28AM (#15277029)
      Or you could use nooks [washington.edu]. Nooks will protect the OS from driver crashes and restart failed drivers transparently.
    • by g2devi (898503) on Saturday May 06, 2006 @12:11PM (#15277228)
      A few things to consider:

      * Remember what happened when Netscape 4.7 decided to do a complete rewrite instead of incremental improvements over a longer period of time? Netscape went from 90% market share to 1%. A complete rewrite would be just as damaging to Linux.

      * Bugs are bugs, no matter where they are. Most of Linux's "million lines of code" are drivers. If no-one is doing bug fixes in the kernel drivers, moving them out of the kernel wouldn't help.

      * Linux *has* moved most things to modules and the core is pretty well understood and not a likely source of bugs because it has the most eyeballs on it. So the added modularity of Microkernels wouldn't buy you anything.

      * Linux scales from super computers to the Linux watch using the same code base. Supercomputers might not care about the added microkernel layer but low resource environments definitely *would*.

      * Buffer overflows are generally not the reason most well designed kernels go down, it's hidden race conditions, starvation, and other NP complete problems that go hidden for years. Moving these problems to user space wouldn't solve them. In fact, it may aggravate them unless intimate knowledge of global state is available to user space (which is in itself a security risk and thus a source of bugs)
      • Remember what happened when Netscape 4.7 decided to do a complete rewrite instead of incremental improvements over a longer period of time? Netscape went from 90% market share to 1%. A complete rewrite would be just as damaging to Linux.

        Why would that magically happen to Linux just because Netscape died? Netscape didn't lose its marketshare because of a rewrite. The rewrite happened after it had already lost to IE. And besides that, it didn't go from "90% market share to 1%." You're totally pulling numb
    • Yes, yes, "borgheron", we all know it's you Mr Tanenbaum, you don't need to hide behind a silly moniker. You're not *still* pissed at Torvalds after all these years are you? :-)

  • Drawing the line (Score:4, Insightful)

    by Digital Dharma (673185) <max@zTEAenplatypus.com minus caffeine> on Saturday May 06, 2006 @09:11AM (#15276472)
    I think at some point you need to draw the line regarding support for older hardware and peripherals. I mean, excessive backwards compatibility has retarded advancement of the industry IMHO.
    • I think at some point you need to draw the line regarding support for older hardware and peripherals. I mean, excessive backwards compatibility has retarded advancement of the industry IMHO.

      In free software, the line is drawn when needed and it never retards anything. ALSA, for example, has OSS driver support so lots of really crusty old sound cards still work and work well. That has not kept people from making or working on newer cards. Old free binaries that no one maintained work about as well or bet

  • Anyone who's been playing with the newest kernels might have noticed that the option to compile drivers not expected to compile cleanly has been removed from the kernel

    This is from memory, hehe, I think that's the right option.

    A friend and I noticed this and at first thought it was a bug in the kernel (2.6.16?) but apparently Linus has "hidden" it so that only the devs can use it, as he believes they need cleaning up more.

    I'm wondering if the two are connected...
  • by udippel (562132) on Saturday May 06, 2006 @09:21AM (#15276500)
    So, there are two relevant aspects to it. Probably more.
    The 2.6 Kernel has been plagued by bad bugs. On the other hand, one way or another you need it for a multimedia-enabled desktop on more modern hardware (compared to 2.4). From that point of view, the proposal is fantastic. Otherwise we see the quality of the kernel of our beloved OS going down.
    2.6 has never seen a phase of consolidation, really. Therefore, the proposal is almost overdue.

    It would be badly short-sighted to think of quick ROI (as the IT companies usually aspire), since the troubles only multiply with further advances.

    Yes, please, Andrew, get stability back into 2.6 - Though I have no single word of say in this, I thrust up both hands in favour !

    Maybe there are some thumb-screws needed for the contributors: As long as the bug level stands above a certain threshold, no enhancements will be accepted.

    There is also a political aspect to it: we have always argued about re-use of legacy hardware. This becomes even more important with Vista on the horizon. The kernel must not lose the 'caring' attitude. It must be trustworthy and trusted by the general public to care for more than greedy hardware manufacturers and their sick quest to replace functional hardware with most recent hardware.

  • by sammyo (166904)
    "Developers motivated by self interest"...? Isn't it
    amazing what radical subversive thought can slip
    through the open source ff (man -k filosofy filter).

    What is the need for backwards compatibility anyway?
    The Dosification of Linux?

    Anyway, why not have a rarely updated, minimal branch
    for ancient hardware, like anything over 3 years old?

  • by nblender (741424) on Saturday May 06, 2006 @09:27AM (#15276519)
    Linux got off the ground and started incorporating everything anyone contributed... grabbing features and drivers like there was no tomorrow. NetBSD was rejecting stuff because it wasn't written right. So it took ages for NetBSD to get audio until someone did it right; while everyone else went with OSS. Over and Over this happened. NetBSD was criticized for being useless because it didn't support all the stuff Linux/FreeBSD did.

    Nice house. Did you build it yourself?

  • by Anonymous Coward
    In a way Linux as a whole (the kernel) is now suffering from the same problems as Debian stable once was, at least from my perspective. Do you guys remember the previous Debian stable? It remained stable for such a long time that eventually you simply needed websites like Backports [backports.org] to be able to run some current software since everything included with Debian was way ancient. Naturally you could run Unstable but it wasn't exactly the best approach for servers. I eventually ended up running Testing and keepi
  • by Opportunist (166417) on Saturday May 06, 2006 @09:35AM (#15276548)
    Of course it's more rewarding to create a new feature. First of all, no coder enjoys working on foreign code. It just doesn't "look right", doesn't "feel right", simply because everyone has his own style.

    And don't forget bragging rights. Hey, I invented some feature. Sure, some guy debugged it, but I get to slap the label to it. I might even name it after me (Hello Mr. Reiser, if you should read this...). The guy who debugs it gets ... zip.

    This has to change first if we want people to put in time to hack through other people's code. Appreciate the work done to get it fixed. After all, appreciation, bragging rights and "making a name" is everything you get from writing free software.

    Few people do it out of generosity or because it "feels good". They want to be known. Linus might not have gotten much out of writing that Kernel, but he sure as hell has a killer paying job now. I doubt the people who wrote the original implementation of iptables/ipchains are worse off. But the debuggers? Lot of work, no name.

    Pull the debuggers in front of the curtain, and you'll see people debug. If we only appreciate the people who wrote a feature in the first place, even if that feature doesn't work 100%, we won't see people debug.
    • Of course it's more rewarding to create a new feature. First of all, no coder enjoys working on foreign code. It just doesn't "look right", doesn't "feel right", simply because everyone has his own style.

      Of course, every developer has their own way of doing things, but an appropriate coding standard makes the code easier to read.

      Few people do it out of generosity or because it "feels good".

      Oh boy are you wrong, but then these must be alien concepts to you.

    • by martyb (196687) on Saturday May 06, 2006 @10:10AM (#15276698)
      Pull the debuggers in front of the curtain, and you'll see people debug. If we only appreciate the people who wrote a feature in the first place, even if that feature doesn't work 100%, we won't see people debug.

      Hear, hear! SETI@home had a gazillion(tm) people contributing cycles to the effort (many times in teams) to see who could place highest on the list of contributors.

      How about a BFoD - Best Fix Of the Day? Each day, post the name of the submitter and some details about the item debugged and fixed:

      1. Name Recognition: Not just to see your name in "lights", but also gain something you could add to your resume.
      2. New Code (preference to bug fixers): Make a policy that you will give top priority to bug fixes... if you attach your new feature to a bug fix, it will get preferential treatment. Those without a bug fix fall to the bottom of the queue.
      3. Share / Educate: Share debugging techniques and tools. Make it easier to fix bugs by sharing best practices with the community.
      4. Scratch an Itch: It may not be fun, but if you develop new code, you also get to spend time debugging... learning from the preceding item will speed the development process and you'll be able to complete your Next New Thing(tm) even faster and better!
      5. Competition: Have contests for the Best Fix of the Month (BFoM) and Best Fix Of the Year (BFoY). To be chosen from the winners of the BFoDs.

      This could be further improved by posting a Bug Of the Day (BoD) where there is a daily bug that is to be fixed. The first fixer gets recognized as well as anyone who provides an especially elegant solution. Award bonus points for fixing related bugs in the area so as to promote more complete fixing in that area.

      Post these prominently for all to see and I'd be willing to bet that there would be a groundswell of support.

      This is just off the top of my head - please post any suggestions for enhancements or (gasp) any problems you see in it!

      • by Opportunist (166417) on Saturday May 06, 2006 @10:34AM (#15276812)
        Add a "highscore list" and it's already hitting home.

        No, don't mod me funny. I mean it. Make it a page every halfway important person in the OS-community wants to read, make it the place to go looking if you're headhunting for a person with fixing skills.

        Today, you rarely if ever get to start a new project. Most of the time, you're hired for a project that's been running for ages. And there, you don't need a coder who can pull fast algos out of his rear, you need people who can deal with alien code, understand it quickly and debug it. And there you'd have those people, listed. The top debuggers of the world.

        Just make sure HR gets to read it and they know their applicant list.
  • by Shivetya (243324) on Saturday May 06, 2006 @09:39AM (#15276563) Homepage Journal
    The painful truth is that very few developers, in open source or otherwise, like fixing old code or old bugs. This is very true if the bug fix isn't going to be noticed by a great number of people. Face it, most of us like to write new code or improve on something that isn't working the way we want it even if it is working right.

    This is what separates professional developers from the rest. We work on it regardless of how much it benefits us. We might gripe a bit but in the end we do what is asked. Sure that backend has flaws and is going to be replaced down the road but it does not excuse us from making it work now.

    When you go look at some of the bugs listed in even current applications you start to see the age some have accrued. Some are rightly passed over as 1 in a million occurrences but too many are skipped because they just don't have any allure. Note, I am not singling out people who work on Open Source, I am pointing out that the article fails to touch an area that exists but most don't want to acknowledge.
  • by njdj (458173) on Saturday May 06, 2006 @09:41AM (#15276567)
    entire release cycle to fixing long standing bugs

    Yes, it's a good idea.

    But don't waste time on bugs that only affect legacy hardware.

    It would also be a good idea for some effort to be spent on consolidating, correcting, and updating the various lists of "Hardware supported by Linux". There are lots of such lists on the web, for example:

    - not to mention the distro-specific compatible hardware lists maintained for Redhat [redhat.com], Mandriva [mandriva.com], Suse [hardwaredb.suse.de], and others.

    We need one correct, maintained list, not dozens of nearly-correct, usually out-of-date lists. And it seems to me that the list should depend only on the kernel version, not on the distro.

  • by prestwich (123353) on Saturday May 06, 2006 @09:50AM (#15276596) Homepage
    My experience is that stability is dropping, even on modern hardware. You can no longer take the latest '2.6' stable kernel and expect it to keep your server running stably.

    Now, you can take a Redhat or SuSE packaged kernel and find those are pretty stable.
    But there is a problem; if you report a bug in a Redhat/SuSE kernel on the lk.ml you get a
    'that's Redhat/SuSE problem - speak to them'.

    As the 2.6.x stable tree becomes less stable, fewer people use it on production servers and instead
    use packaged kernels. As fewer people use it, it gets tested less - and fewer bugs are actually reported for it.

    It is also not just a case of old hardware; in the last few kernels I've had leaks that make
    a simple firewall die repeatedly after a few months, I've got a machine with a slow RAM kernel leak
    that makes a simple DHCP server fall over every few months, and I've had a 2.6.1x kernel that couldn't
    run an NFS server for 24 hours without falling over.

    It ain't nice - but these are my experiences.

    Dave
  • But it runs Faster!! (Score:5, Interesting)

    by giorgosts (920092) on Saturday May 06, 2006 @09:50AM (#15276598)
    I follow Ubuntu with the latest kernel updates and I tell you, with every update performance increases. When I booted Windows I used to feel the difference, but not anymore. I think the quality of the kernel is fine. There are other people that need to improve in quality, e.g. the rest of the free apps, esp. packagers who have to make the thing just work. What will I do with stability if nothing works? Am I going to just look at the computer while it's all stable doing nothing?
  • by vijayiyer (728590) on Saturday May 06, 2006 @09:50AM (#15276600)
    Some of the above posts say "I don't notice any problems". I'm guessing some of the bugs nobody has fixed are somewhat obscure. There is a well known bug when Linux mounts large XFS file systems via NFS that bothered me regularly - large directories could not be searched, deleted, etc. Now I have a Mac working with that flawlessly. These are the types of bugs - annoying, but non-fatal - that few people want to fix.
  • by Zarhan (415465) on Saturday May 06, 2006 @09:55AM (#15276624)
    At least so I thought, i.e. once 2.6.17 is out, there will be a separate branch based on 2.6.16 (2.6.16.y, continuing past the current series) that would constitute a "stable" branch where no new features would be added, and focus would be on fixing bugs and stability.

    So, is the problem already solved?
  • Overhaulin' (Score:2, Funny)

    by FrankDrebin (238464)
    Day 1 - Andrew's kernel is "stolen" under strange circumstances.
    Day 2 - Chip Foose draws up a design for the kernel, the coding commences.
    Day 3 - Andrew receives a call from a "kernel repo man". Apparently his kernel was taken by mistake.
    Day 4 - Coding like mad. Fake repo man stalls Andrew. The crew is running out of Doritos.
    Day 5 - The pressure mounts. Token pretty face "helps out" by typing "make" in a staged moment of tension.
    Day 6 - We're never gonna get the kernel done in time!!!! Oops and pa
  • A big part of what made OSS get off the ground outside of its core area was the belief that it'd lead to better bug fixes, delivered faster because of all of the eyeballs looking at the code. Well, that's a little hard to argue if 90% of those eyeballs are dedicated to looking at new things, not fixing outstanding issues.
  • Whew! (Score:5, Funny)

    by cciRRus (889392) on Saturday May 06, 2006 @10:16AM (#15276721)
    So, there are lots of bugs in Linux! Good thing I'm using Windows.
  • by Anonymous Coward
    The Linux Maintenance team lead was quoted in a recent CNET article as saying that he thought 2.6 was getting buggier. What was even more disturbing was that he based this on an impression of more bug reports coming in, and that he didn't have stats.

    No Stats!!!

    How can you manage a project like Linux and not have a system with solid bug stats for tracking trends? It's easy to release software on schedule if you are not tracking the quality of the release.

    A software project without regular bug trend reports is a disaster
  • by buddyglass (925859) on Saturday May 06, 2006 @10:33AM (#15276800)
    As an application developer, it really irks me that I have to release software that I *know* has bugs, choosing instead to complete whatever features were supposed to be in the release. As a consumer of applications, sometimes I wish that instead of adding all the new wizbang stuff, someone would devote an entire release to fixing *all* known bugs and improving performance. Maybe this will finally happen w/ the kernel.
  • by Stalyn (662) on Saturday May 06, 2006 @11:01AM (#15276933) Homepage Journal
    We used to have two trees being worked on concurrently. Where (x.y.z) if y was even it was the stable branch (just bug fixes and occasional new code for important hardware) and if y was odd it was the unstable/development branch. It might be a good idea to return to this development process.
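
    The even/odd rule described above is simple enough to write down. A minimal sketch, assuming version strings of the form x.y or x.y.z; the function name `branch_kind` is made up for illustration and is not part of any kernel tooling:

```python
def branch_kind(version: str) -> str:
    """Classify a kernel version under the old even/odd convention.

    Hypothetical helper: an even y in x.y.z means the stable branch,
    an odd y means the unstable/development branch.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least x.y, got {version!r}")
    minor = int(parts[1])  # the y in x.y.z
    return "stable" if minor % 2 == 0 else "development"
```

    Under that rule a 2.4.z or 2.6.z release counts as stable and a 2.5.z release as development.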

    Actually I'm in favor of just forking the unstable/development Linux tree into separate trees maintained by different people. This is somewhat being done now as huge patchsets. Have Linus work on the super-tree or the official tree. Then maybe have Alan Cox have his own tree, Morton have his own tree, Kolivas have his own tree, etc. With git [wikipedia.org] this is actually pretty easy.

  • by chri1753 (854560) on Saturday May 06, 2006 @12:00PM (#15277186)
    C doesn't offer enough abstraction to deal with new levels of complexity. It is now far from the best language available for systems programming: bitc and prescheme are especially worth looking at if you haven't heard of them.
    • by TheRaven64 (641858) on Saturday May 06, 2006 @01:32PM (#15277591) Journal
      While I agree in general (C is not a good language if you are programming something that's not a PDP-11), it is possible to get a lot of abstraction using C. Compare, for example, Linux and DragonFly BSD. In Linux, they use an explicit threading / locking model. The developer has to make sure that they acquire and release all of the locks they need, and in the right order. In DragonFly, they use a message-passing model. Once they have the message-passing semantics working once, they can keep re-applying it. It is then much easier to reason about the code (using tools such as CSP) and to prove that it is deadlock-free. Simple things, like non-branching sequences of processes that don't send messages backwards can be implemented in a way that guarantees that there are no corner cases that will break it. Consider the following operations involved with sending some data over a TCP link (a rather contrived example):
      1. Split the buffer into segments that can fit in a packet.
      2. Add TCP headers.
      3. Add IP headers.
      4. Add Ethernet headers
      5. Send to the hardware's output buffer
      On DragonFly, each of these steps could be performed by a separate thread, passing the processed buffer between them, and if it worked at all then you could guarantee that it was correct. On Linux, the amount of locking required would mean that a system this complex would be likely to create locking bugs. Looking through the Linux bug database, it seems that a significant number of bugs are currently lock-related.
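
      To make the shape of that pipeline concrete, here is a toy sketch in Python rather than kernel C. It is not DragonFly's actual messaging API; the names `stage`, `segment`, `add_header` and `send`, the 4-byte segment size, and the placeholder header strings are all assumptions for illustration. Each stage is a thread that only receives from one queue and sends forward to the next, so the stage code itself never takes a lock (the queues do their own synchronization internally).

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(transform, inbox, outbox):
    """Run one pipeline stage: receive, transform, pass forward only."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            return
        for result in transform(item):
            outbox.put(result)


def segment(buf, size=4):
    # Step 1: split the buffer into packet-sized pieces (toy size).
    return [buf[i:i + size] for i in range(0, len(buf), size)]


def add_header(tag):
    # Steps 2-4: return a transform that prepends a placeholder header.
    return lambda payload: [tag + payload]


def send(data):
    """Push one buffer through the stages and collect the output frames."""
    transforms = [segment, add_header(b"TCP|"),
                  add_header(b"IP|"), add_header(b"ETH|")]
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=stage, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for th in threads:
        th.start()
    queues[0].put(data)
    queues[0].put(STOP)
    frames = []  # step 5: stands in for the hardware's output buffer
    while True:
        frame = queues[-1].get()
        if frame is STOP:
            break
        frames.append(frame)
    for th in threads:
        th.join()
    return frames
```

      Because messages only ever flow forward through single-consumer FIFO queues, frames come out in the order the segments went in, and there is no lock ordering to get wrong in the stage code.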
  • by Wolfier (94144) on Saturday May 06, 2006 @12:46PM (#15277396)
    If you put resources into making the newest kernel compatible with old peripherals that resource could not be used for bugfixes and new features.

    The new kernel probably will not bring anything new to the old hardware, either. So why not just use the stable 2.4 kernel with security patches?
  • by Tracy Reed (3563) <treed&ultraviolet,org> on Saturday May 06, 2006 @01:35PM (#15277608) Homepage
    I see a lot of hand waving about how buggy 2.6 is but I do not see any references to bug databases or particular reproducible bugs. How about some data?

    So far 2.6 has been just as solid for me as previous kernel versions but I try really hard to avoid using bizarro hardware and drivers that probably do not get much testing, and rightly so.

    I think we need to distinguish between bugs in the core kernel (code that everyone runs) and bugs in drivers. The vast majority of the Linux kernel code is drivers.
  • by Malor (3658) on Saturday May 06, 2006 @02:08PM (#15277750) Journal
    I've said this, here and elsewhere, over and over and over. Quality is something that has to be in software FROM THE START. It's not something you can retrofit.

    As soon as the kernel dev team decided that Linus' kernel didn't need to be stable anymore, as soon as they started waving their hands in the air and expecting 'the distros' to magically fix their problems, OF COURSE quality took a dive. One of the kernel devs said that it was okay for only one out of three 'stable' kernels to actually be stable! Stability takes a long time... they now refuse to support a given kernel for more than a couple of months. The 2.4 kernel still has a few problems, and it's been around for, what, six years now? Supporting a given kernel release for only a couple of months is impossibly stupid from a stability perspective.

    They're doing it this way because they're tired of doing the painful, annoying, tedious task of making sure the kernel always works. And the 2.6 kernel has, as a result, been a steaming pile of crap. Features don't matter if the fucking kernel doesn't stay up. No kernel since about 2.6.8 has worked in APIC mode on my ASUS KT333 board. 2.6.15 crashes my Intel 865 chipset servers randomly; they rarely stay up more than an hour or so. 2.6.14 broke traceroute. And with the constant stream of patches to their security fuckups, my system uptimes rarely exceed two weeks. Remember being proud of your kernel uptimes?

    The social contract with Linux for many years was essentially: "The official kernel tree is as stable as we know how to make it. You can trust this code." And that is what got Linux as far as it has gotten... the fact that you could TRUST IT. It NEVER fell over. The 2.2 kernel was one of the finest pieces of software I've ever run. 2.4 took a huge dive in terms of stability, and was a total mess until Linus branched off to 2.5 and let the poor harried 2.4 maintainer, Marcelo Tosatti, take it over. He finally whipped it into shape. He has done an outstanding job.

    What Linus et al need to do is GO PLAY IN THEIR SANDBOX IN 2.7. Let 2.6 fucking stabilize. They're shoving new features down our throats so fast that it's a part-time job just keeping up with the new stuff... and obviously NOBODY understands the security implications of moving this fast, or we wouldn't have so many goddamn security patches. We're gonna be having those security patches for YEARS because of this bullshit. The number of possible interactions in a system goes up exponentially with the number of features... so adding features should slow down over time, not speed up.
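
    How fast "possible interactions" grows depends on what you count, so here is a rough sketch of the arithmetic, assuming every feature can potentially interact with every other one (the function names are made up for illustration). Counting only pairs gives n(n-1)/2, which is quadratic; counting every combination of two or more features gives 2^n - n - 1, which is exponential.

```python
from math import comb


def pairwise_interactions(n: int) -> int:
    """Number of distinct feature pairs: n choose 2."""
    return comb(n, 2)


def feature_combinations(n: int) -> int:
    """Number of feature subsets with at least two members."""
    return 2 ** n - n - 1  # all subsets minus the empty set and singletons
```

    With 10 features that is 45 pairs but 1013 combinations; either way, each added feature costs more to test than the one before it.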

    Go BACK TO THE OLD SYSTEM. People crying about 'too slow release schedules' is a HELL of a lot better than people crying about Linux being unstable. Linux *owned* the word stability for many years, and it's in very real danger of losing it, right at its height of popularity. The old system worked. It got Linux where it is today.

    A simple 'bugfix release' won't do shit... it's the process that's broken. It'll fix some of today's bugs, but what about next week?
    • So true, this combined with the fact that by the time you get to test a "new" kernel release, there is already a newer release. Then everyone starts bitching about how no one is testing each other's patches. I'm with you, so I'll throw in my vote for the old system. In fact, again, this should be officially addressed. The old system was indeed melded for stability. This new "stable" really just means it compiled on MOST platforms, not all.

      Initially I knew it would eventually devolve into this but everyone was s
  • by iminplaya (723125) <iminplaya.gmail@com> on Saturday May 06, 2006 @02:41PM (#15277899) Journal
    If I have old hardware that doesn't run 2.6, I can and do drop back to an older kernel. Hell, 2.0.40 [kernel.org] came out in 2004. And note the size! That kernel boots as fast on my 133MHz machine as 2.6 does on my 1GHz frankenstein. New features on a new kernel mean nothing on hardware that can't use them. If you want to keep running a new kernel on old hardware, obviously you're going to suffer plenty of bloat, as evidenced in the Windows world. And speaking of that, if MS had kept their old versions on the market, they could have slimmed down the new versions considerably. Of course, most of us know that older versions are MS's biggest competition, so that's why the lockdown, all made possible by our gracious IP overlords. So be it. I don't need them anymore. Even Apple put up their old old versions for free. But it doesn't run on new hardware. And their new software doesn't run on old hardware. And furthermore, wouldn't it be easier to troubleshoot and fix bugs in the older, smaller kernels? My general rule is to use a kernel that is approximately 6 months to a year newer than the hardware it's running on. We shouldn't try to make a single kernel to run on all hardware. We have lots of them, one for each specific time period. This also applies to the distros. The older ones are still available for your old hardware.

    FTA: Nowadays, many kernel developers are employed by IT companies, such as hardware manufacturers, which can cause problems as they can mainly be motivated by self-interest.

    Am I supposed to be surprised by this? Even the most altruistic of us are generally motivated by self interest. We all want some kind of return for our efforts....even if it's a simple "Thank you".
    • Why you don't want to run an older kernel even on old hardware:
      - You want to run apps with new features or with the most recent bug fixes;
      - Newer versions of applications don't run on older kernel versions;
      - There are no distributions with recent apps and an old kernel.

      You say: "My general rule is to use a kernel that is approximately 6 months to a year newer than the hardware it's running on." Suppose I buy a new laptop. Now I have to wait "6 months to a year" before starting to use it?
