
RH7 Crashes In Three Weeks (But Fixed) 301

Herz writes: "I got this email today from Red Hat. RH7 will crash out of the box in 3 weeks! The new Update Agent provided with Red Hat Linux 7.0 contains a daemon, rhnsd, which periodically polls Red Hat Network for updates. This daemon leaks file descriptors. On a default installation, all available file descriptors will be used by rhnsd in approximately three weeks, making the system unusable." The Red Hat folks have also provided a fix, though -- updated packages for those who want to use their update network, and the two-line method of disabling per machine for those who don't. After all, everyone wants uptime > 3 weeks, eh? And you don't need to wait for a "service pack," either.
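
To make the failure mode concrete, here is a minimal sketch of how a polling loop can leak file descriptors. It is not Red Hat's rhnsd code; the function names (`count_open_fds`, `poll_once`, `run_polls`) are hypothetical, and it assumes a POSIX system where a temporary file can be reopened by name.

```python
import os
import tempfile


def count_open_fds(limit=1024):
    """Count open descriptors by probing each number below `limit` with fstat.

    Assumption: the descriptors of interest are numbered below `limit`.
    """
    open_count = 0
    for fd in range(limit):
        try:
            os.fstat(fd)
            open_count += 1
        except OSError:
            pass  # EBADF: this descriptor number is not in use
    return open_count


def poll_once(path, leak):
    """One simulated poll: open a file and read it; skip the close if leak is True."""
    fd = os.open(path, os.O_RDONLY)
    os.read(fd, 16)
    if not leak:
        os.close(fd)
    return fd


def run_polls(n, leak):
    """Run n polls and return how many descriptors they left open."""
    with tempfile.NamedTemporaryFile() as tmp:
        before = count_open_fds()
        fds = [poll_once(tmp.name, leak) for _ in range(n)]
        after = count_open_fds()
        if leak:
            for fd in fds:  # close them now so the demo itself stays clean
                os.close(fd)
    return after - before


if __name__ == "__main__":
    print("leaky polls left open:", run_polls(10, leak=True))
    print("careful polls left open:", run_polls(10, leak=False))
```

Each leaky poll leaves one descriptor behind, so a daemon that polls on a timer and never closes will eventually run out, however slowly.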
This discussion has been archived. No new comments can be posted.

  • If only the fix had been released earlier, perhaps the Update Agent could have repaired itself before anyone knew anything was amiss.

  • L33t-est is clearly BSD. You are a la/\/\3r.

  • Actually, that's not the problem. The problem is that probably none of the beta testers would have bothered to leave this particular service enabled since there wouldn't BE any updates to check for prior to release. Sure it's an oversight, but it's not like it reformats your hard drive or allows doubleclick.net to view your persiankitty.com cookies or opens your box to a root exploit.
  • RH4.2 was the most stable release in my experience. It has been going downhill ever since: loads of buggy little 'system tools' that do the Wrong Thing 90% of the time, messy package dependencies that cause unnecessary bloat (there was a time when you could do a 'base' install in under 30MB, try that now!), packages built with all the wrong options, etc etc etc...

    Unfortunately most other distributions I've tried (yes, that includes debian) are guilty of the same sins, to varying degrees.

    Linux has been dumbed down *way* too much lately. Yes, it works (mostly) fine out of the box if you just want a desktop box and never want to install any additional software or integrate it into a slightly complex network. If you want to do any real work (rather than looking at the pretty buttons in gnome/kde/whatever) you end up removing half the installed system and rebuilding from source :-(

    Anyone care to suggest a low-LL linux distribution? I would switch to *BSD if that didn't mean that half my hardware would stop working :-(
  • By all means try FreeBSD, but don't think all Linux distributions are as bad as Red Hat. Red Hat's well known for releasing buggy x.0 software. If you simply must have Red Hat, wait for the point release. Otherwise, might I suggest taking a look at Debian, Slackware or SuSE?
  • What?!?!?! you mean it crashes in three weeks? THREE weeks???? this has got to be a major breakthrough! I use win2000, and it's absolutely great, it can stay on and up for, like, four days and i was very impressed the first time i saw this, me coming from nt4 and all, but THREE weeks? i gotta get me one of those!...

    i'd heard that linux was good but damn!.
  • one word my friend. Slackware
  • Like the other poster said, it was real. It was pretty much a 'don't care' bug though -- whoever heard of a 98 box staying up that long anyway?

    I'm lucky to get 48 hours, much less 48 days! :-)

  • no doubt... i can just see it now... all these redhat systems are running along fine (after the patch to the updating system) and they auto-update a "bug-free" package with a new version that has some bugs and BOOM! all redhat systems go down at once. heh, that would be pretty funny for everyone who wasn't affected. MS would definitely incorporate that one into their marketing.
  • Contrariwise, I would prefer the system to bog down and require user intervention in that case, rather than just randomly reset. At least then the user could see the need for the reset, save important files, sync disks, etc. This is in the case of a personal workstation or a server. If it really becomes a pain the user can easily script something to reset the machine at predictable intervals.

    In some cases in the telecom industry it's better to reset quickly and come back up, but I'm not sure that RH7 is being used for those sorts of things...

  • Stop talking out of your ass before posting. First the 49 day uptime bug affects win95 and win95a not NT4 as you suggested. The file that causes this comes with the default install and is easily fixed with an upgrade.

    Now according to your FUD every NT box must be rebooted every 49 days? I don't think so. We have windows boxes at work that get many months of uptime.
  • My NT Workstation's uptime:
    1. \\NTMACHINE has been up for: 36 day(s), 23 hour(s), 53 minute(s), 37 second(s)
    Of course, this beats our production server's 6 day(s), 1 hour(s), 47 minute(s), 47 second(s)
  • I do tech support for an Internet company, and it is amazing the number of people who call me amazed that their dial-up networking won't work after their Win9x machine has been on for more than a few hours. Reboot, and everything magically starts working again.

    Windoze is horrible at letting go of resources once it has used them.

  • Red Hat 7.0 - $29.95

    CD/RW burner - $229.50

    10 pack of CDs - $49.95

    Look on luser's face when the server drops - Priceless

  • Used to, but thank god I will never have to do that again.

    I can't really think of much that sucks more than installing and admining Windows NT on 50+ servers.

    Specially when you have to install ~10 things that each require a reboot and the server takes a few mins to reboot because of how much ram it has.

    Never again I tell ya.

    Oh and installing updates off their website? woohoo another reboot (for each server!) bah! I guess it's good job security for anybody who likes being an admin though.

    And you say they release hot fixes -all- the time? ugh!!

    And don't get me started on having to format and start over because you messed up installing a product, or you installed the products out of order.. Jesus!!

    hahaha....
    -----------------------
    Jeremy 'PeelBoy' Amberg
  • it might be a server-class OS, but no one running a production server is going to rely on an "auto-update" daemon to do their work, at least no one who is worthwhile as an admin.

    This is a feature for the DESKTOP, to compete with MS's same named feature, and will be most used by people who think a three week uptime is spectacular regardless.

    Besides that, the lesson to be learned again is that X.0 software (esp. OSes) is buggy. /.'s editors take cheap shots wherever they can get 'em, that's what's called "editorial independence," and I like it.

    --
  • (offtopic)
    The first debate was the typical podium one. The one tonight (Wed, Oct 11) will be a round-table, similar to the VP debate, where Gore and Bush sit at a table with Lehrer, and will be given a bit more time to 'talk' to each other like the VP's did. The final debate next week will be town-hall system, which IIRC has a bit more audience participation in it.

    Each type of debate has strengths and weaknesses, and each candidate is better at some than others, so they chose multiple styles when they made the deal on the debates back in August. They also got the one that is most scripted out of the way, so that the later ones will probably require more off the cuff answers and questions.

  • ... have probably already figured this out. I kept seeing bizarro stuff in my log files from rhnsd. After looking up /etc/init.d/rhnsd I saw that it was not something I needed (I always download for free, so I doubt they are going to be giving me any service).

    At least it was putting nice messages into the log file.

    For those who need it:

    • (as root)

    • chkconfig --level 345 rhnsd off (turns off the startup)
      /etc/init.d/rhnsd stop (kills the already running server)
  • Not to quibble, but isn't crond a "long-running daemon"? Granted most of these sorts of problems have been thrashed out of cron a long time ago.

  • If you leave a lock box closed too long, with government money in it, eventually it will be full of fuzzy math.

    And very little money.
    ___

  • So, I guess now the Gold Standard is:

    We will ship no distro before it's tested - oh, wait, gotta crank it out before the quarterlies on the street are updated.

    Never mind ...

  • I like having reasonably bleeding-edge versions of everything (other than unwanted update daemons), and I understand that's not what Debian is about.

    Bleeding edge isn't what Debian stable is about but that is definitely what Debian unstable is for.

    You asked for recommendations... There's mine

  • by Ron Harwood ( 136613 ) <harwoodr@nOspaM.linux.ca> on Wednesday October 11, 2000 @07:04AM (#714421) Homepage Journal
    ...the win95 "43 day" bug... where it would crash exactly after 43 days...

    They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.
  • I'm running X 4.0.1 from Rawhide on my 6.2 box.

    You'll have to update a few other packages to get it to install cleanly (initscripts, among others), but it can be done.

    BTW, you have to be willing to recompile from SRPMs - precompiled RPMs won't work. But here's how you do it:

    Recompile the X RPMs.

    Try to install them, find out what needs to be updated.

    Get those packages, rebuild them from their SRPMs and install.

    After that, the hardest thing is updating your XF86Config file...
  • You are right, people reading slashdot generally like anything non-M$ over Micro$oft products.

    Has been like that since Slashdot started. However, what you are saying about hotmail switching to Win2K has been covered here [slashdot.org]. Again, it may be biased (take the title of that article, for example :) but it certainly is covered.

    So please check first before making statements about /. next time.
  • I never heard of that bug. Are you sure that wasn't FUD?
  • They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.

    Why was that? I personally like to leave my computer on; it's better for the electrical connections within the machine and parts due to thermal expansion/contraction.
  • by Troed ( 102527 ) on Wednesday October 11, 2000 @07:05AM (#714431) Homepage Journal
    ... soon it's unstable enough to take over the desktop market!
  • Depending on your device drivers and possibly applications. I've had NT workstations (4.0, SP4 or higher) go over 49.7 days several times (the key is to not actually use it :-) and while they continue to run, they start acting totally weird in some ways. Mostly in the GUI, AFAIcouldT, but I didn't wait around for something bigger to show up. All in all it handled it better than the Linux 2.0 workstation across the room I eventually rolled over a couple years ago.

    Of course, almost all NT stability depends on your device drivers, and not knowing that is the #1 cause of unstable NT installs done by non-pros.
  • According to this comment [slashdot.org], "the leak is in the rhnsd daemon which is installed and running by default after installation. Even people who never start the update agent will get bitten by this, unless they disabled the daemon after installation."
  • "It would appear to an outside observer who might read /. for the first time that RH is junk."

    Whoops, sorry, outside observers. Rob, please change the headline to read "Another RedHat Feature Discovered".

    If I were running an OS that came with an incompatible (and buggy to boot) compiler, a 3 week uptime limit and countless other "issues" I would call it junk. If RedHat is distributing a version of Linux with these problems, then RedHat Linux is junk. Forget what it looks like to "outside observers"--that's just propaganda. Many of us chose Linux because of its reputation for technical excellence--if RedHat can't stand the heat, they need to leave the kitchen.
    --
    An abstained vote is a vote for Bush and Gore.
  • by v4mpyr ( 185039 ) on Wednesday October 11, 2000 @08:57AM (#714449)
    That's on a Unix machine running VMWare, right? ;-)

    That was supposed to be funny, laugh dammit.

    --
  • Presumably cron has addressed all the issues involved in running forever. That is why The Pim recommends it. He wasn't implying that cron wasn't a long running-daemon. Solving these issues again is re-inventing the wheel, and, in this case, re-inventing the square wheel.

  • There is no distinction between 'official' and 'unofficial' ISO images. It's all the same ISO. And the daemon doesn't do anything unless you tell it to (but it is running).

    The easiest fix is to just run up2date, and update the 'up2date' package, which owns the daemon.

    -- Crutcher --
    #include <disclaimer.h>
  • by banky ( 9941 ) <greggNO@SPAMneurobashing.com> on Wednesday October 11, 2000 @09:25AM (#714458) Homepage Journal
    I think something's nutty, my comment disappeared.

    Anyway, my whole "-1, Flamebait" comment was:

    Are you installing RH7 on production machines the day it comes out? Are you INSANE? Look, it's a bug. They have a fix. So patch the TEST MACHINES you're running RH7 on, so you can work out the bugs, migration path, and errata, and get on with your life! You ARE running this on test machines, right? You are planning a migration to RH7, not just popping the CD into your mission-critical servers, right? You are following good sysadmin practices, right?

    Just because they rushed the release doesn't mean you have to take it. Take your time and be smart.
  • Why can't you sync the disks? All you need to do is to kill the redhat daemon and you get all of your file descriptors back, then just run like normal. The kernel will clean up after the application when it exits.
    --
    Mike Mangino
    Sr. Software Engineer, SubmitOrder.com
  • by Zoltar ( 24850 ) on Wednesday October 11, 2000 @08:59AM (#714461)
    Actually you bring up a valid question, with regards to Slashdot anyways. If Win2K had this bug it would certainly have been on Slashdot, and met with much approval. Many MS friendly posters will go on about how Slashdot is biased and unfair towards MS, well, posting this story pretty much lets RH have the MS treatment. Seems fair enough to me.

    Now with regards to the bug, I think the obvious fix is to simply kill -9 rhnsd. There ya go, bug fixed. Yes it's a serious bug, but it's hardly a service that any production server needs so it's a non-issue in my mind. If you are running a serious server you are probably not going to let the software update itself. You are going to get it up, apply any security patches that come out, and lock it in a closet somewhere. The "idea" that you must be running the most current version of software is a marketing ploy (which MS does very well) and is hogwash. If you have software that meets your needs and is stable and secure you certainly don't want to screw it up by randomly updating it.

    I think it was poor of RH not to actually test this properly, but I also understand that this is partly just the nature of the beast. They feel that they must move forward at a fast pace and this is the result.
  • by Telcontar ( 819 ) on Wednesday October 11, 2000 @08:59AM (#714462) Homepage
    It says
    /sbin/service rhnsd stop
    /sinb/chkconfig --level 345 rhnsd off
    .
    But of course it should be
    /sbin/service rhnsd stop
    /sbin/chkconfig --level 345 rhnsd off
    .
    This doesn't exactly help improving the impression of their .0 releases...
  • by Malor ( 3658 ) on Wednesday October 11, 2000 @09:00AM (#714466) Journal
    No, this is important to know.

    Redhat dominates the Linux market. This affects a LOT of /. readers. (obviously not all /. readers use linux, and not all linuxers use redhat, but the population is still going to be quite large.)

    As well, I think politically it's probably a good idea to be public about this kind of bug. Linux has a rep of being extremely reliable. I, for one, would like to keep it that way, and bugs that affect reliability thus NEED TO BE very embarrassing events. Trying to suppress this kind of news may make Linux APPEAR more reliable but actually BE less reliable -- a lose-lose situation for sure.

    After all, if Sendmail suddenly started crashing every two weeks, the community would be justifiably furious about it. I don't think it's unreasonable to hold Redhat to a similar standard. They have an enormous advantage over Microsoft by packaging all the Open Source stuff instead of writing it themselves. Seems to me that expecting really good QA on their internally-written software is quite reasonable.

    You can bet that if Microsoft had released Win2K with a bug that took it down after two weeks it would have made national news. And Slashdot. :-)
  • >Instead I get a bunch of CDs that are now
    >useless.

    By that definition of useless, EVERY data CD is useless. There is no such thing as a bug-free release of any piece of software.

    >Oh I guess I could install RH 7.0 and then
    >download a million patches.

    Oh you poor thing. You have to type 'up2date' at the console.

    >Service packs are a great idea because you can
    >consolidate all of the fixes into a comprehensive
    >unit and thus you can tell people, my software
    >will work on Redhat 7.0 service pack 3

    I have to agree with you on this one. The concept of a service pack or a patch bundle is useful at times.

    However, patches SHOULD be made available as soon as there is one, and should continue to be available individually.

    I don't know how many times during my stint as a support person I ran into a service pack or patch bundle that broke other things that were working fine.

    Matt
  • After the whole 2,500 bug fiasco, is it really prudent for Slashdot to be posting this story? While it seems true, it also seems very much FUD (and derogatory).

    Common politics would dictate waiting for the bug story to cool down before stoking the still-burning embers.

  • I'm running Slack on my desktop...

    ...and Windows 4.1.1 (aka Windows 98 + SP1) + various individual fixes (mostly security patches) on my laptop. Windows Update _does_ show a "service pack" that contains an updated jvm and security fixes.


    <O
    ( \
    XPlay Tetris On Drugs [8m.com]!
  • Actually, any *nix OS imposes a limit on the number of file descriptors a single process can open (try 'man 2 open').
    Therefore a normal application cannot use up all file descriptors. Probably however the update agent runs with super-user privileges (I don't know for sure: does it also automatically update packages?)

    I see this bug as a result of a worrying tendency of open-source software to copy M$oft software in giving too much control to the computer and too little control to the user (outlook viruses, anyone?)
    In these matters my motto is : the dumbest of users is still more intelligent than the smartest of computers.
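
The per-process limit mentioned above can be observed directly. The following is a rough sketch, assuming a Unix-like system where Python's `resource` module is available; the function name `opens_until_limit` and the limit of 64 are illustrative choices, not anything from Red Hat's software. It temporarily lowers the soft RLIMIT_NOFILE limit, opens descriptors until the kernel refuses, then restores the old limit.

```python
import errno
import os
import resource


def opens_until_limit(soft_limit):
    """Lower the soft RLIMIT_NOFILE limit, then count how many extra
    descriptors can be opened before open() fails. Returns (count, hit_emfile)."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)  # the soft limit may not exceed the hard one
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    opened = []
    hit_emfile = False
    try:
        while True:
            try:
                opened.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                # EMFILE means "too many open files" for this process
                hit_emfile = exc.errno == errno.EMFILE
                break
    finally:
        for fd in opened:
            os.close(fd)
        # restore the original soft limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(opened), hit_emfile


if __name__ == "__main__":
    count, hit = opens_until_limit(64)
    print("opened", count, "descriptors before the limit; EMFILE seen:", hit)
```

The count comes out below the limit because stdin, stdout, stderr and anything else already open use up part of the allowance. Whether this per-process ceiling is what rhnsd actually ran into is not something the thread settles.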

  • <flamesuit status="on">
    Slack doesn't seem to have this problem ;)

    "I would kill everyone in this room for a drop of sweet beer."
  • Is it me, or is Red Hat the only distribution that /. ever posts bug reports on?
  • This same problem in Debian wouldn't be posted here in 20 years.

    No. Because Debian unstable/frozen gets tested by such a lot of people that a Debian-Crashes-In-Three-Weeks problem would get fixed way before the actual release.

    Not saying Debian is perfect, just that that particular problem would be virtually impossible.

  • Well, I'm a Red Hat user of old, and quite comfortable with the general quality and support provided.
    However, I've abstained from buying RH 7, due to the massive problems they seem to have with this release. Far more than I remember in the 5.0 release and 6.0 release.
    I'm using Debian at work, and becoming more and more enamoured of its stability and ease of upgrade.
    I was under the impression that the RawHide system of pre-release was meant to cure this kind of screwup.. This also dents my faith in that preconception.
    The errors in the update agent are unforgivable though. With any release that's as shaky as an x.0 release from RH, they at least need the update agent to be stable.
    C'mon RH. Get your act together before you really lose your credibility.

    Malk.
  • by Menthos ( 25332 ) <menthos@NOsPam.gnu.org> on Wednesday October 11, 2000 @11:09AM (#714503) Homepage
    If you've been following all Red Hat stories lately and read most comments you'd notice that most of the people complaining about RH 7 are the people that don't actually run it.
    Most of the posters stating that they do actually use RH 7 seem quite happy about it, noticing that it is even more stable than RH 5.0 or 6.0 ever were. Most of the bad press on /. was indeed very bad journalism, even FUD in the case of the "2500 bugs" story, which wasn't even close to the truth (the real figure of unsolved bugs, feature requests and other issues in RH7 was 150, yes one hundred and fifty, not 2500). The idiot poster who submitted that story counted not the outstanding bugs in RH7 as he was claiming but all entries in Bugzilla for all previous RH releases, including feature requests, resolved bugs, duplicates, non-reproducible errors, bug reports missing critical information and otherwise closed "bugs"...

    So, chances are that you should trust /. a little less and learn from your own experience by trying it... In my experience, it is better than all previous RH releases; the way it should be.

  • by drfrank ( 16371 ) on Wednesday October 11, 2000 @07:08AM (#714505)
    Okay, we all hate Microsoft, but come on. Cheap digs like "you don't have to wait for a service pack" will just turn people off. (Remember the first Gore vs. Bush debate?)

    You can't do that standing on such shaky ground. One could argue that it _is_ a service pack, or point out that MS does usually release patches to serious problems within a week as well as rolling them up into a service pack.
  • I don't remember whether it was 43 days or not, but yes, there was a Windows 95 bug that was like this. (It was above 30 days as well.) I ran into it. (Yes, I ran Windows 95 for more than 30 days. No, the average user can't keep their system clean enough to do it for the most part. Yes, I did. Yes, I still think Windows 95 is a world better than 3.1.)

    As for a memory leak, it's one of the most common errors you can have. 3 weeks is still a pretty good time frame; the fix was out very quickly; it was made public, the how and why of it. These are things you won't see with closed source companies. Bash RedHat all you want, truth is their internal programs just simply don't get the exposure the rest of Linux per se does, so some bugs slip by.

    -- Talonius
  • I don't know how the update agent works, but does it do anything if you don't have an "official" cd from Red Hat? i.e. if you just got an ISO, is the update manager able to do anything, or does it need a password, etc?

    ---

  • Actually, it's a reality because on production machines, you don't leave ANYTHING to chance. EVER. PERIOD. END OF STORY. Much like one of the main points of OSS is that you don't trust closed source, when deploying, you DON'T TRUST SOMETHING THAT HASN'T PASSED YOUR OWN TEST ENVIRONMENT.
  • Forgive me if I'm being snippy, but why is this a major issue? Yes, we've talked about problems with RH 7.0. Yes, we've bitched about the new GCC shipping with it. But what is this, open season on RH? Since they are well known and popular, did they suddenly become evil that we have to slam on them all the time? It would appear to an outside observer who might read /. for the first time that RH is junk. And who knows how many people might have gotten that impression and decided not to switch to linux from NT.
  • It's just a good idea to flush out the system now and again...
  • i can barely imagine anything i want my systems doing less than automatically looking for new software and/or installing updates without my fully conscious awareness of same and active involvement. do people actually find value in this type of service?
  • That's funny! Debian 2.2 shipped with over 10,000 open, non-wishlist bugs.
    I think you misunderstand - that graph is all open bugs on *any* debian; I imagine most of those 10,000 were for the currently unstable Debian 2.3.
  • by andyh1978 ( 173377 ) on Wednesday October 11, 2000 @07:11AM (#714528) Homepage
    I never heard of that bug. Are you sure that wasn't FUD?
    No, it was a real bug.
    And it was 49.7 days (the time it takes for a millisecond timer to overflow a 32-bit unsigned integer).
    It was fixed in one of the service packs.

    See this MS KB entry for details [microsoft.com].
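
The 49.7-day figure follows from simple arithmetic, worked through below as a small sketch. The helper names (`days_until_wrap`, `tick_count`) are made up for illustration and do not correspond to any Windows API.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def days_until_wrap(bits=32):
    """Days before an unsigned millisecond counter of the given width wraps to zero."""
    return 2 ** bits / MS_PER_DAY


def tick_count(elapsed_ms, bits=32):
    """Value such a counter would report after elapsed_ms milliseconds of uptime."""
    return elapsed_ms % (2 ** bits)


if __name__ == "__main__":
    print("32-bit ms counter wraps after about %.1f days" % days_until_wrap())
    # One millisecond past the wrap point, the counter looks like a fresh boot.
    print("ticks just after wrap:", tick_count(2 ** 32 + 1))
```

Code that compares two readings of such a counter without allowing for the wrap will suddenly see time run backwards, which is the kind of mistake that produces a hang at a precise uptime.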
  • Re-read my message. I said it's my personal server. I'm not an admin, I'm a software developer.

    I'm not complaining that there are bugs in RH7 - I know it's new, and I'm the first person to tell clients not to put new software on production servers. I made a considered choice to do this on my own server, because the hardware needed upgrading anyway, and the RH 5.2 which has been running flawlessly on it for the past couple of years was missing some stuff that I needed.

    My problem is with the nature of this RH issue: it's a bug in a piece of software RH developed internally, and install by default without any indication or choice. I find that kind of "thinking for the customer" undesirable and unacceptable, and as I said, Microsoft-like. No doubt it's a reflection of Red Hat's post-IPO mass consumer focus; unfortunately that doesn't suit me very well.

    As for checking the services, I did the install over the weekend, looked at the long list of services (since I installed a bunch of database and other server stuff) and decided to check it out later. I would have found rhnsd soon enough. Security isn't much of an issue because the box sits behind a firewall at the colo site with only web, imap and ssh ports open; the web server is my own build of Apache 2.0 alpha, for development purposes only.

  • This is a question posed by a Linux-wannabe who really knows nothing:
    Does Linux have a max # of file handles, after which new handles cannot be created?
    Let me pose this another way -- Can I crash a Linux box by opening a whole lot of files? Or is this daemon run as root? Then the new question is why is a daemon that has the capability to automatically update critical software, running as root? Surely it could be spoofed to update a system with poor DLLs?

    To a Linux newby, this whole article sounds very scary.
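
On the system-wide part of that question: Linux exposes its global file-handle accounting under /proc. The sketch below assumes the three-field layout of /proc/sys/fs/file-nr (allocated handles, allocated but unused handles, system-wide maximum); the function names are hypothetical, and the reader returns None where the file does not exist.

```python
def parse_file_nr(text):
    """Parse the three whitespace-separated counters found in file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read the kernel's file-handle counters, or return None if unavailable."""
    try:
        with open(path) as handle:
            return parse_file_nr(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    stats = read_file_nr()
    if stats is None:
        print("no /proc/sys/fs/file-nr on this system")
    else:
        print("file handles allocated: %(allocated)d of max %(max)d" % stats)
```

When the allocated count reaches the maximum, further opens fail for every process on the machine, which is the "system unusable" state the Red Hat notice describes.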
  • This same problem in Debian wouldn't be posted here in 20 years. Unless you think Debian doesn't have any bugs...

    That is because apt-get's functionality has been thoroughly tested for quite some time.

    It is actually kinda nice to see other distributions catching up.

    Of course, auto-update will be pretty broken with the care that goes into packaging RH RPMS. Have you ever tried to upgrade a RH distribution manually ? It is a broken mess of irrelevant and missed dependencies. Debian does this seamlessly.

    What RH really needs is a thorough packaging policy, like this [debian.org] and this [debian.org]. Only with a thorough packaging policy can upgrades and auto-upgrades be useful.

    Mainly, I hate using rpm --nodeps --force. On my debian system I never need those --nodeps options. Wonder why ???
  • by BigBlockMopar ( 191202 ) on Wednesday October 11, 2000 @02:38PM (#714546) Homepage

    They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.

    Why was that? I personally like to leave my computer on; it's better for the electrical connections within the machine and parts due to thermal expansion/contraction.

    Better for the "electrical connections within the machine"... Uhhh, okay.

    Actually, it's just an expansion-contraction issue within the ICs, in particular. And the hard disk drive, landing the heads every time you shut down (but this is the same as if you leave the power management on). Cheap power supplies can sometimes make issues with voltage spikes as they turn on; if you buy a good one, the voltages all come up to their regulated levels and then the Power_Good line is pulled high and the motherboard is reset.

    So, if you have a good quality system, you probably won't have any problems with the wear of turning your machine on and off in reasonable usage until after the machine is obsolete.

    Compare this to the higher power bills, risks of fans dying and overheating that conservatively overclocked processor, as well as more potential uptime for a thunderstorm to kill it, and I feel it's probably wise to shut off the computer when you're not using it. Of course, that's discretion. Do you turn off the computer when you leave the office for lunch? Nah. For the weekend? For sure. Overnight? I do.

    I do speak with some authority here; while I'm not an electrical engineer, I have several years of experience design engineering critical radar systems for Litton [litton.com]. I also used to write electronics design and construction columns for Popular Electronics magazine.

    As for Windows 9x/ME, it's only under controlled laboratory conditions that you can make a Windows box run long enough to see that bug. I've managed to see the 49.7 day bug once; and with the M$ fix, I've seen a record uptime of 103 days with Windows 95B OSR2. Windows 3.1/DOS, I've managed to keep running for months at a time.

  • by somen ( 241384 ) on Wednesday October 11, 2000 @07:16AM (#714548)
    Although I'm not an advocate of any certain distro, I must say that I applaud RH for the effort they have put into open source software. However, this bug shows one problem with open source: quality control.

    In an ideal situation, every programmer will look at the source code, and contribute to the effort of the open project. Most people (like myself) are free-riders, who have no ability to program. So as idealistically sound open source may seem, there are certain issues to worry about.

    In RH's case, at least they pay their workers, which means that they are more willing to do the dirty work of bug fixing others' code (in theory). However, cases like this cast further doubt on the "Linux for the business" credibility since more non-techies seem to equate Linux with RedHat. It seems to be understood by almost everyone that any RH x.0 distro is pretty much in an experimental state, and must not be used on production servers. This, however, makes the operating system appear "buggy" and "not production-quality" to the uninformed, hence I wish they would take more pride in their distribution instead of "hey, we had that packaged into ours first!" I honestly wish comments on how RH's similarity with MS due to their tactics are only on the surface. Unlike MS (whose operating system is proprietary), RH simply has their own distribution of an open-sourced OS. If you so choose not to use their distro, you have enough other choices: e.g. Debian, Mandrake, Slackware, etc etc.

  • Mac OS X's kernel (Darwin) is not your typical monolithic BSD [freebsd.org] kernel. It's a Mach kernel [cmu.edu] with a layer of BSD-like services around that. Darwin is Nearly-Free Software under the Apple Public Source License.


    <O
    ( \
    XPlay Tetris On Drugs [8m.com]!
  • On XFree86 4.0.1, with a Hauppauge "WinTV Go" card.

    Watching the 2nd US Presidential debate start now, in fact.

    Email me for a copy of my conf.modules (which may not be helpful if you're using a non bttv card) or XF86Config files.
  • ..to make sure that people buy the "free" software? I mean, if you bought and registered the OS, they can tell you "Oops, it won't work after three weeks." Otherwise, you have to rely on the net or word of mouth to be that informed.

    Just a thought.

  • Don't make excuses. This is exactly the same type of crap that MicroSoft dishes up, and RedHat is guilty of delivering it.
  • #include <stdio.h>
    int main() {
    printf("Hello, World!\n");
    }

    You are right, it is possible to write a small program without any bugs and - wait, sorry, I forgot to make it return an exit code. Let me get back to you...

  • This is EXACTLY like waiting for a service pack. It amazes me to watch you attempt to downplay this bug.

    Personally, I frequently use Red Hat & W2K to do my job, and am quite pleased with both. As I've been watching, I've seen you go hog wild over the Windows 49.7-day bug, yet when RH has a 3-week one, it must not be a big deal... Hello, THIS IS A SERVER-CLASS OS. IT IS A BIG DEAL.

  • Hrmn... I abstained from RH6.0 (moved direct to 6.1), but I've been using 7.0 and 6.9.5 for quite a while and they both seem quite nice. I don't think you'll have much of a problem with it - much less buggy than 6.0 was :(
  • by Alan Shutko ( 5101 ) on Wednesday October 11, 2000 @11:40AM (#714574) Homepage
    The Win95 49.7-day bug was funny because the bug had been there a long time, and nobody had found it... implying that nobody had been able to keep a Win95 box up for 49.7 days.

    RHL 7 has been out for two weeks. It's not even in _stores_ around here yet, but the bug has been found. It's been fixed.

    That's why it's not a big deal.
  • by pq ( 42856 ) <rfc2324&yahoo,com> on Wednesday October 11, 2000 @08:10AM (#714576) Homepage
    Please, people, if you pull a story off the main page and then restore it, add an Update: line so that I don't get this feeling that I'm slowly losing my mind. I didn't dream it all, did I? This was on the main page, pulled off around comment #30, and restored around #50... what's going on? Please?

  • by Fervent ( 178271 ) on Wednesday October 11, 2000 @08:10AM (#714582)
    Did anyone notice that this article disappeared for about an hour today? Were there some complaints/questions about its authenticity?

    Wait, a revolutionary moment!!! Slashdot confirms an article before posting it!!!

  • by the_quark ( 101253 ) on Wednesday October 11, 2000 @08:10AM (#714584) Homepage
    As funny as this is, because of exactly what the problem is, it's not going to be an issue:

    The leak is in The Update Manager. If you're not running the update manager, you don't have a problem and the system won't go down. If you ARE running the Update Manager - well, it'll just automatically get the update from RedHat, won't it? Assuming that part works, anyway...

  • [using rpm --nodeps]
    Huh. I never need them on my Red Hat system either. Wonder why???

    Tell you what. Install redhat 4.2. Then upgrade one rpm command at a time to redhat 5.0, 5.1, 5.2, 6.0, 6.2, and then 7.0. And see how many times you need to use the --nodeps option.

    The incidence is dramatically lower for debian debs. It is not the deb format. Rpm has all the same capabilities. It is the care that goes into packaging, highlighted by the packaging guides. Try to find something more comprehensive at the web site of a linux distribution.

    There ought to be limits to freedom. - GWB
  • Sounds like Red Hat is getting ready to takeover the desktop market. It now has the same functionality as Windows Me! :-)

  • This particular bug was a rollover in an uptime counter. When it rolled, it caused an unhandled exception in the kernel and BOOM.
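
    The 49.7-day figure is consistent with a 32-bit counter of milliseconds wrapping around. The counter width and unit are an assumption here, not something stated in this thread. A quick check of the arithmetic:

```python
# Assumption: uptime is kept in an unsigned 32-bit counter of milliseconds.
COUNTER_BITS = 32
MS_PER_DAY = 24 * 60 * 60 * 1000


def days_until_rollover(bits=COUNTER_BITS, ticks_per_day=MS_PER_DAY):
    """Days before a counter of the given width wraps back to zero."""
    return 2 ** bits / ticks_per_day


if __name__ == "__main__":
    print(f"{days_until_rollover():.2f} days")
```

    Doubling the counter width to 64 bits pushes the wrap far beyond any realistic uptime, which is one common way this class of bug gets avoided.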
  • by bkosse ( 1219 ) on Wednesday October 11, 2000 @11:54AM (#714595) Homepage

    It's not a pretty sight. It's not too far off from running out of memory. And, the 4096 number is a system wide number:

    file-nr and file-max

    The kernel allocates file handles dynamically, but as yet doesn't free them again.

    The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate. When you get a lot of error messages about running out of file handles, you might want to raise this limit. The default value is 4096. To change it, just write the new number into the file:

    Now, it's not that the offending process dies when that number runs out; rather, the *NEXT* process to request a file dies. This happens on officially penguin-peed kernels as well. You need to set resource limits to keep an individual process from getting too trigger happy with files.

    And by the way, take stock 2.2 and make a program which either A) fork bombs or B) chews memory. Watch the system go down in flames. In the case of (B) you (once? Is it fixed?) had the chance of watching the kernel give init the boot, which is very ugly.
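
    For anyone who wants to watch the system-wide counters mentioned above, here is a minimal sketch that reads /proc/sys/fs/file-nr. It assumes a Linux box with /proc mounted; the function names are made up for illustration. The file holds three numbers, and the meaning of the middle one has differed between kernel versions, so the sketch only relies on the first (allocated handles) and the last (the file-max ceiling).

```python
from pathlib import Path

FILE_NR = Path("/proc/sys/fs/file-nr")


def parse_file_nr(text):
    """Split the contents of file-nr into its three integer fields."""
    first, second, maximum = (int(field) for field in text.split())
    return first, second, maximum


def allocated_fraction(text):
    """Fraction of the system-wide ceiling that is currently allocated."""
    allocated, _, maximum = parse_file_nr(text)
    return allocated / maximum


if __name__ == "__main__":
    if FILE_NR.exists():
        raw = FILE_NR.read_text()
        print(parse_file_nr(raw), f"{allocated_fraction(raw):.1%} of file-max")
    else:
        print("no /proc/sys/fs/file-nr on this system")
```

    Polling that fraction from a monitoring job would flag a slow leak well before the ceiling is reached.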

    --
    Ben Kosse

  • Did you even read the MSDN article to which you linked? If this bug was fixed in Windows 95, why would they offer a downloadable patch for Windows 98??

    Computer Hangs After 49.7 Days

    --------------------------------------------------------------------------------
    The information in this article applies to:

    Microsoft Windows 95
    Microsoft Windows 95 OEM Service Release versions 2, 2.1, 2.5
    Microsoft Windows 98



  • by The Pim ( 140414 ) on Wednesday October 11, 2000 @08:15AM (#714607)
    That's why you use cron instead of writing a long-running daemon.
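
    To spell out the reasoning: a job started by cron is a fresh process each time, and the kernel closes every descriptor a process holds when it exits, so a leak cannot pile up across runs. A minimal sketch, assuming a hypothetical one-shot poller (POLL_ONCE below is invented for illustration and is not rhnsd's code) that leaks one descriptor per run and prints the descriptor number it was handed:

```python
import subprocess
import sys

# Hypothetical one-shot poller: opens a file, never closes it, prints the fd.
POLL_ONCE = "import os; print(os.open(os.devnull, os.O_RDONLY))"


def run_poll_once():
    """Run the poller as its own short-lived process, the way cron would."""
    result = subprocess.run(
        [sys.executable, "-c", POLL_ONCE],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Each run starts from a clean descriptor table, so the number repeats.
    print([run_poll_once() for _ in range(5)])
```

    A long-running daemon making the same mistake inside a loop would see that number climb on every poll until it hit the limit.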
  • What, exactly is the compiler not compatible with? I give it C++ source code and it compiles it for me.

    It generates perfectly ISO compatible code. It's not RedHat's fault the ISO spec is vague and underdefined. Expecting different versions of a C++ compiler (or different C++ compilers for that matter) to emit compatible code is a blatant misfeature.
  • I disabled the rhnsd about 15 minutes after the install. I suspect a lot of others did as well due to privacy questions, etc... Didn't you guys turn it off as well?

    Why anyone would want their system to "auto-update" is beyond me. I think you're just asking for trouble if you do that.

    Did M$ buy some stock in RedHat? Seems like all these bugs and errata stem from a basic case of the dumbass, joined together with some deadlines from the marketing droids... geezz!
  • by dattaway ( 3088 ) on Wednesday October 11, 2000 @08:32AM (#714634) Homepage Journal
    tarballs rule! They aren't a package, they are a state of mind.
  • it's also directly copied off their errata page (which, seeing as I installed RedHat 7.0, foolish me, I really should be heading towards more frequently.)

    If you don't want to go to the errata page for update news, let the update news come to you...

    mail -s subscribe redhat-watch-list-request@redhat.com < /dev/null

  • Umm, I don't think that was a bug in Windows NT 4.0 there, buddy. I've run my servers and workstations for well over six months without reboots; the glitch was in Win95.
    ---
  • It does require registration, though there is an 'anonymous' registration option that sends only your hardware architecture (so that the right rpms get sent) and an email address. It is one of the several free levels of service.

    -- Crutcher --
    #include <disclaimer.h>
  • Windows NT 4.0 release: 1996
    49.7 day bug discovered: 1999
    Fix released: never

    Well, it was Win 95 and 98, not NT. And it was fixed. click [microsoft.com]
  • You know that if this was about win2k instead of redhat there would be 500 posts saying "linux r00lz MS suckz0rs!!". The amount of bias that goes on here is incredible. Somehow taco missed the story about hotmail switching over to win2k. That's a pretty major story, but since it's pro-MS it was quietly ignored.
  • by ErfC ( 127418 ) on Wednesday October 11, 2000 @08:26AM (#714652) Homepage
    I don't see this as FUD, or derogatory, and I don't see how politics should be involved. As is pointed out, the fix is easy (either update the package, or turn off the daemon, or both), so we don't have to wait for a service pack or anything.

    And I'm very glad to know about the bug and the fix; it's something of a showstopper, and I didn't know the update manager was active by default, so this is valuable information -- not RedHat bashing.

    -Erf C.

  • I'm not arguing that the testing was complete, just that this particular aspect of it would have been hard to check for. They probably did make test updates to run this through, but who would think to set this in motion for X number of weeks, just to check for this type of issue? I mean, what if you set the test period for four weeks, and the problem hit on the 33rd day? As long as I'm using my head, why is this running as a daemon at all? Wouldn't this type of thing be better as a cron job? Isn't it sound security to limit the number of daemons running at any given time as much as possible?
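
    For what it's worth, a descriptor leak doesn't need a multi-week soak test to show up: counting open descriptors before and after a batch of iterations exposes it in seconds. A rough sketch, assuming the polling step can be called as a plain function (every name below is illustrative, nothing here comes from Red Hat's code):

```python
import os


def count_open_fds(max_fd=1024):
    """Count descriptors in [0, max_fd) that are open in this process."""
    count = 0
    for fd in range(max_fd):
        try:
            os.fstat(fd)
        except OSError:
            continue  # not an open descriptor
        count += 1
    return count


def leaked_descriptors(task, iterations=100):
    """Run task repeatedly and report how many descriptors stayed open."""
    before = count_open_fds()
    for _ in range(iterations):
        task()
    return count_open_fds() - before


def clean_poll():
    with open(os.devnull) as handle:  # closed when the block exits
        handle.read()


held = []  # keeps the leaked descriptors so they can be closed afterwards


def leaky_poll():
    held.append(os.open(os.devnull, os.O_RDONLY))  # never closed


if __name__ == "__main__":
    print("clean:", leaked_descriptors(clean_poll))
    print("leaky:", leaked_descriptors(leaky_poll))
    for fd in held:
        os.close(fd)
```

    A check like this only catches leaks on the code paths the test actually exercises, so it complements a long soak test rather than replacing it.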
  • by macpeep ( 36699 ) on Wednesday October 11, 2000 @08:27AM (#714662)
    The 49.7 day bug was not in NT - it was in Windows 95. We have several NT boxes at work that have not been rebooted for months and months. I still like Linux servers better but for a workstation, I still prefer NT and there sure as hell is no 49.7 day bug in NT.
  • You folks are talking about upgrading, that's all fine and dandy, but how about a new server from dell with 7 preinstalled.
    I'm not talking about upgrading. Try to stay with me here. There is a concept, that concept being called 'in production.' Anything said to be 'in production' should be thoroughly tested. That includes hardware, software, configurations, and usage patterns. In other words, you buy a shiny new Dell with RH7 on it, IT NEEDS TO BE TESTED FOR AN APPROPRIATE TIME before it goes into production, to test both hardware and software.
  • by WillSeattle ( 239206 ) on Wednesday October 11, 2000 @08:39AM (#714671) Homepage
    Woke up this morning
    Crawled out of bed
    Couldn't wait to get that Red Hat distro you said

    Told you to worry
    Told you to wait
    But no you want to mirror it from outside the state

    Refrain
    I got the blues
    Got them old dot zero blues
    Cause I done installed that distro
    And it blew up on my shoes

    Wish I had DSL
    Wish I had fat pipes
    But on a 56K modem
    The download's such a fright

    It's all installed now
    Servers up and cool
    But I come back three weeks later
    And look just like a fool

    Refrain

    Got burned by Compaq
    Got burned by Dell
    Got burned by Microsoft
    Now I'm in Red Hat dot zero hell

    Refrain

    Now don't you worry
    This one's ok
    It won't drop under loads now
    Cause if it does we'll make you pay!

    Refrain

  • Way to twist his words. There's this crazy thing called CONTEXT that we should consult before bashing someone/thing.

    Obviously, when he said "We did a lot of QA," he was talking about the snapshot of GCC, and not the OS as a whole.

    Sure, they should have caught this bug, or better, it should have been considered at design time, but (and I'm not trying to make excuses for Red Hat here), to catch this bug, they prolly would've had to have had a 7.0 system up and running for 3 weeks straight. Maybe their test cycle is shorter than that. If their test cycle was say.. 6 weeks, then who knows what kind of bugs might pop up at the 6 1/2 week mark? You can only allow so much testing for a product before releasing it, or you'd never release anything.

    As I said, yes, they should have caught this, but as we all know, no software works perfectly, and sh*t happens. At least there's a fix for it.
  • Right, but this application doesn't "go haywire," per se, as in "crash and burn" and scribble all over other peoples' core--it uses up a resource gradually--there is a difference.

    This isn't much different from an application that runs away and fills up the disk or allocates all available memory. Should Linux allow an application to deplete a resource without giving the admin a chance to kill the offender first? Probably not, and maybe this is one of those issues that will have to be addressed in the 2.5.0 tree. At any rate, Linux is still far more stable and dependable than that other OS.
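
    On the "chance to kill the offender" point: a per-process descriptor limit already makes the leaker itself fail before it can drain a larger shared pool. A minimal sketch, assuming a Unix system where Python's resource module is available. The limit of 64 is an arbitrary value for the demo, and the per-process limit here only stands in for the system-wide table discussed elsewhere in this thread.

```python
import errno
import os
import resource


def opens_until_exhausted(soft_limit=64):
    """Leak descriptors under a temporarily lowered limit; return how many opens succeeded."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    leaked = []
    try:
        while True:
            try:
                leaked.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                # EMFILE: this process hit its own limit; others are unaffected.
                return len(leaked)
    finally:
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print("opens before EMFILE:", opens_until_exhausted())
```

    The count comes out a little under the limit because standard input, output and error (and anything else already open) occupy descriptor slots too.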

  • I'd bet that's because a Windows3.1/DOS machine is the virtual equivalent of a 'rock'.

    Normal modern systems have weird daemons in the background which eventually contribute to their demise.

    If that was the sense in which you're calling Windows 3.1 a rock, I fully agree.

    As long as an application doesn't crash it out, I've never found DOS or Windows 3.x to be unstable. Primitive, yes. Full of frustrating quirks, absolutely (this *is* an M$ product, after all). But not spontaneous crashers.

    Something tells me that it's a REALLY REALLY BAD IDEA to allow your current system configuration to go out over the network towards a centralized server every 30 minutes.

    Hey, look! This guy's running an old version of BIND, it's black hat time, et al.

    <grin> Does the daemon that does this report *itself* to the server this same way? If it does that, at least the new RH Insecurity Daemon also gives itself a chance to be the door to an intruder...

    I know a lot of people who use their personal computers as servers in one way or another, and turning the thing on and off just isn't workable because you have to _plan_ your uptime to when you think you might need to get something remotely. This never ever works.

    Absolutely. In fact, several of my computers run 24/7. But only those that need to; the rest of them are turned on when I get home, and turned off when I go to bed. By the same token, several of my computers at the office are up all the time, and several more go down when I leave for the night.

    After all, the computers are there to be used, not to be protected from any bad thing that can possibly happen to them.

    This isn't really an issue for Microsoft operating systems at this point, because remote access to most of them is quite horrid.

    I disagree. It was really thoughtful of M$ to automatically bind NetBIOS file and print sharing to your internet-connected network adapters by default. Evidently, someone was planning ahead for remote accessibility.

    I've kept various computers on 24 hours a day, 7 days a week since I was 14 and ran a Bulletin Board System, and personally I've never had problems besides broken burnt-smelling fans every couple years.

    <grin> In my experience, it's usually not the fan that smells burnt when it fails... a brushless DC fan doesn't heat up when it gets stalled by dust or a plastic "ball bearing" melts and seizes the rotor. What heats up and starts to smell is the component(s) that the fan was supposed to be cooling.

    The added functionality I get from it is _way_ more than the sacrifice, and since those who run Linux are (for the most part) serious computer users, it's not realistic for us to do otherwise.

    My Linux servers run 24/7/364.25. But my Windows boxes don't; neither do my general-purpose Linux machines. Discretion.

  • Sorry for the confusion. I don't think I made clear what the nature of my problem with this bug is. I said this is in another message [slashdot.org]:

    My problem is with the nature of this RH issue: it's a bug in a piece of software that RH developed internally and installs by default without any indication or choice. I find that kind of "thinking for the customer" undesirable and unacceptable, and as I said, Microsoft-like. No doubt it's a reflection of Red Hat's post-IPO mass consumer focus; unfortunately that doesn't suit me very well.

    As for RTFM, where exactly is this documented? The paper manual has shrunk significantly since RH 5.2, and I have yet to find the documentation, paper or otherwise, about the fact that this update daemon gets installed by default.

    Bottom line: I'm a developer, and I don't need someone else deciding on my behalf to install daemons on my system that I don't care about. That in itself is 50% of my issue with this. The fact that this daemon had a fatal bug is the other 50%. Red Hat screwed this up both ways.

  • Yeah, but we like using RPMs. The dependency database is a wonderful thing. It makes upgrades so much easier.
  • by 11223 ( 201561 ) on Wednesday October 11, 2000 @08:45AM (#714697)
    Naah, they just ran out of file descriptors and had to remove the story from the front page list :-P
