Microsoft Code at Fault for Half of all Windows Crashes
Flamester writes "In a ZDNet Australia story, Microsoft is claiming that half of all MS Windows crashes are the fault of third party code, not their own. That is, according to Dr. Watson.
The article also goes into the 'rigor in which MS tests their products before release'."
1st post karma-whoring (Score:4, Informative)
Scott Charney, chief security strategist at Microsoft, told developers at the TechEd 2003 conference in Brisbane, that information collected by Dr Watson, the company's reporting tool, revealed that "half of all crashes in Windows are caused not by Microsoft code, but third-party code".
Charney's comments come as the company highlights the rigour with which it tests its own products before release. Microsoft emphasised that products such as Yukon and Exchange Server were undergoing thorough testing -- both internally and via independent third parties -- prior to their release to the market.
The company is employing root cause analysis and event sequence analysis procedures to scrub out the creation of sloppy code. The result is that individual developers have a high degree of accountability for the code they produce, while the systems and processes associated with code development are rigorously monitored.
Root cause analysis enables the company to check closely the work of individual developers. "If a developer has written vulnerable code, then we look at what else that developer has written and check it," Charney said.
Event sequence analysis takes this further, analysing the reasons why the vulnerable code was written. Charney said the aim was not necessarily to sack whoever is writing vulnerable code, but to find out the reasons why and how Microsoft can improve its staff with training or more efficient processes.
As Charney made his remarks, Charles Sturt University announced they would be offering a Master of Information Systems Security degree including MCSE:Security industry certification.
Charney also reinforced Microsoft's message to developers and network administrators that they needed to build secure applications and networks "from the ground up".
The chief security strategist's remarks have come at an unfortunate time, as mainstream and niche media outlets produce heavy coverage of the impact of the MSBlast worm, which has infiltrated corporate and enterprise networks worldwide.
John Dvorak has some interesting crash stats... (Score:5, Informative)
sPh
Re:Uhm, right... (Score:5, Informative)
Re:Uhm, right... (Score:5, Informative)
I suspect that they are referring to drivers and other kernel-space code. The standard Microsoft weenie excuse for instability in the past has been "it's the drivers!"; blaming the video drivers is a favourite.
Remember that Microsoft don't write most Windows drivers, they don't have to because their market share is so great, any hardware manufacturer who doesn't supply Windows drivers is not competitive.
I believe this is the reason why Microsoft introduced their "Microsoft signed drivers" that are supposed to guarantee Microsoft-level stability (!).
However, I have to laugh at Microsoft when they claim 50% of crashes aren't their fault. It's like an advert for a diet pill saying "Doesn't cause death in over 90% of people!".
Re:Uhm, right... (Score:3, Informative)
Let's be honest here, if it's bad drivers that are the main problem, they also affect Linux just as badly. I've seen sound drivers lock up my system many times under Linux. The difference between Linux and Windows is that more companies produce more drivers for Win32, and so the chances of a user encountering a problem are increased.
Open Source, Closed Source, Not the Problem (Score:3, Informative)
Re:Headline should be: Microsoft Admits to Testing (Score:3, Informative)
Hey, they're TESTING! Wow, they really are taking this trustworthy computing thing seriously.
Probably just a flippant remark, but they actually do test all of their applications and OSes, and they have (you know, all those internal and public beta TESTS and such).
But maybe this time they'll fix the bugs, instead of just making note of them.
Re:John Dvorak has some interesting crash stats... (Score:2, Informative)
What percent of machines crash once a day? Gates did not say. It could be the case that 20% crash once a day and 5% crash twice a day. It could be the case that 90% crash once a day and 5% crash twice a day. The number of machines that crash twice a day gives no information on how many machines crash once a day.
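To make that concrete, here is a minimal sketch with two made-up fleets of 100 machines. The fleet sizes and crash counts are assumptions for illustration only, not anything Gates or Dvorak reported. Both fleets have exactly 5% of machines crashing twice a day, yet the once-a-day share differs wildly:

```python
# Two hypothetical fleets of 100 machines each. The numbers are invented,
# chosen only so that exactly 5% of machines crash twice a day in both.
def crash_shares(crashes_per_machine):
    """Return (share crashing exactly once a day, share crashing exactly twice a day)."""
    total = len(crashes_per_machine)
    once = sum(1 for c in crashes_per_machine if c == 1) / total
    twice = sum(1 for c in crashes_per_machine if c == 2) / total
    return once, twice

fleet_a = [2] * 5 + [1] * 20 + [0] * 75   # 20% crash once a day
fleet_b = [2] * 5 + [1] * 90 + [0] * 5    # 90% crash once a day

print("fleet A (once, twice):", crash_shares(fleet_a))
print("fleet B (once, twice):", crash_shares(fleet_b))
```

The twice-a-day figure is identical in both cases, so it cannot pin down the once-a-day figure.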
Re:Dr. Watson catches OS crashes, not app crashes (Score:2, Informative)
Say what? Dr. Watson most assuredly catches application crashes. Just because XP doesn't say "Dr. Watson Error" anymore doesn't mean it isn't still Dr. Watson that is logging your error.
Re:Ring 0, Ring 3? (Score:5, Informative)
Regardless, if a driver is running in the same memory space as the subsystem, a driver crash is going to take it out. It doesn't matter what ring the code is in. Again, back in NT 3.51 days graphics drivers were kept in separate memory spaces, in ring3, but that was dropped due to piss poor performance.
The GDI subsystem (several layers away from any graphics drivers) currently sprawls across Ring0 and Ring3.
Indeed BS (Score:4, Informative)
When Windows gets read-only mempages (IIRC win2k3 has them) for kernel processes, this will be ended, until then: the 3rd party drivers are mostly at fault.
Re:Uhm, right... (Score:5, Informative)
Guess you've been caught talking out of your ass again (but that's what ACs do)
Graphics in Ring0 (Score:2, Informative)
How about moving the GDI to ring 0 for performance reasons, allowing a print driver to crash the OS?
Re:Uhm, right... (Score:2, Informative)
It's still a bug even if it doesn't bring the system to its knees for days.
Actually, it's a stress test. This is generally an automated test where we would run scripts to open and close various applications and whatnot for days. One script I ran when I was contracting at MS was something that opened up every single image in a certain directory (100+ jpgs) and at the same time, the machine would be also opening up several dozen Excel spreadsheets, doing calculations on them, and exporting them to Word files.
The system would be pegged at 100% CPU usage and the memory usage would max out as well, hence it was unusable from an ordinary standpoint. The scripts generally can be set to autoterminate after a certain number of hours. Over the weekends I'd set them to terminate after 72 hours and would arrive back on Mondays to check out what ran and what didn't. For the systems that crashed, I'd have to send out reports to the various developers regarding how it crashed, what module actually crashed, and when it crashed.
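For anyone who hasn't seen one, the shape of such a script is roughly the loop below. This is only a sketch under my own assumptions: `run_stress`, the task names, and the failure record format are all hypothetical, not the actual MS tooling, and it only catches exceptions inside one process, whereas the real runs had to be diagnosed after the whole machine went down.

```python
import time

def run_stress(tasks, max_seconds, clock=time.monotonic):
    """Cycle through (name, callable) tasks until max_seconds have elapsed.

    Returns the number of completed passes and a list of
    (task name, exception type, seconds since start) for each failure.
    The deadline is checked once per pass, so a pass in progress is finished.
    """
    start = clock()
    deadline = start + max_seconds
    failures = []
    passes = 0
    while clock() < deadline:
        for name, task in tasks:
            try:
                task()
            except Exception as exc:
                # Record what failed and when, for the report to developers.
                failures.append((name, type(exc).__name__, clock() - start))
        passes += 1
    return passes, failures

# Hypothetical stand-ins for "open every image" and "recalculate spreadsheets".
def open_images():
    sum(range(1000))

def recalc_sheets():
    raise ValueError("simulated failure")

# A weekend run would use max_seconds=72 * 3600; this demo stops almost at once.
passes, failures = run_stress(
    [("images", open_images), ("sheets", recalc_sheets)], max_seconds=0.05
)
print(f"{passes} passes, {len(failures)} failures recorded")
```

The `clock` parameter exists only so the loop can be exercised with a fake clock instead of waiting for real time to pass.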
Re:Uhm, right... (Score:5, Informative)
I've done an embedded system with QNX, and it is quite the nice RTOS.
Under QNX, the devices hang out in the device manager, which is not in the kernel space, and the drivers are handled by the process manager, also not in the kernel. Since the kernel exists just to pass messages, essentially, it is uncrashable.
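A rough analogy, not QNX code: the sketch below runs two made-up "drivers" as separate OS processes from a small supervisor. It uses only Python's standard `subprocess` module and none of QNX's actual message-passing API, and the driver bodies are invented for illustration. The point is just that a fault in a driver living in its own address space kills that process and nothing else.

```python
import subprocess
import sys

# Hypothetical "driver" bodies, each run in its own process (its own address space).
GOOD_DRIVER = "print('frame drawn')"
BAD_DRIVER = "raise RuntimeError('driver bug')"

def run_driver(source):
    """Run driver code in a child process and return (exit_code, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the child's traceback off our console
        text=True,
    )
    return result.returncode, result.stdout.strip()

for name, source in [("good", GOOD_DRIVER), ("bad", BAD_DRIVER)]:
    code, _ = run_driver(source)
    status = "ok" if code == 0 else "crashed, can be restarted"
    print(f"{name} driver: {status}")

# The supervisor survives the bad driver and keeps going.
print("supervisor still running")
```

Put the same buggy code inside the supervisor's own process and the exception takes everything down with it, which is the in-kernel driver situation.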
Re:Uhm, right... (Score:5, Informative)
Re:Uhm, right... (Score:4, Informative)
And Watson can and does report back to "the mothership" for driver crashes, when the user allows it.
Re:Uhm, right... (Score:3, Informative)
re: "Consider this: Microsoft has been ordered not to use the term MCSE in both the United States and Canada because Microsoft does not have the legal right to "certify" people as engineers."
cite?
Canadian Council of Professional Engineers (CCPE) opposes the use of the word "Engineer" in the MCSE designation [peo.on.ca]
Microsoft Debating World-Wide MCSE Name Change [certcities.com]
Re:Uhm, right... (Score:3, Informative)
Honestly, I think they may be including more than just OS crashes in these statistics. I'd say that in the past month, my computer (running WinXP) has crashed a handful of times. Of those crashes, one was severe (I think explorer restarted and apps closed? whatever happened I didn't need to restart).
The other 5 (estimated) or so "crashes" were IE going down. Of the 5 times IE went down, a couple were caused by espn.com and a couple were caused by a nasty ad on nytimes.com.
But here's my point: when I had my "severe crash", I reported it via Watson, and it didn't know wtf went wrong. When espn.com crashed the first time, I reported it via Watson and it told me Flash died. For the other 4 times Flash killed IE, I force-killed the program and DIDN'T report the problem because I knew what it was.
So my statistics for the month are: a handful of app crashes (1 reported) and 1 os crash (1 reported). So I'm right on par with their data, that 50% of my REPORTED crashes were OS crashes (Microsoft's fault) and the other crash was IE going down (not Microsoft's fault).
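Spelled out as arithmetic, using only the counts from this post (1 OS crash, roughly 5 IE crashes, of which only 1 was reported); the helper name `os_share` is just something made up for the sketch:

```python
from fractions import Fraction

def os_share(os_crashes, app_crashes):
    """Fraction of all counted crashes that were OS crashes."""
    return Fraction(os_crashes, os_crashes + app_crashes)

# What actually happened: 1 OS crash and about 5 IE crashes.
actual = os_share(os_crashes=1, app_crashes=5)

# What Watson saw: the OS crash and only the first IE crash were reported.
reported = os_share(os_crashes=1, app_crashes=1)

print("actual OS share:  ", str(actual))
print("reported OS share:", str(reported))
```

Same machine, same month, but the reported split is 1/2 while the real split is 1/6, purely because most of the app crashes never got sent in.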
In the end, based on my personal experience, I'm guessing that they include app crashes in their data, or at least IE crashes (since it's "tied" to the OS). It might not be a driver issue, and it might not be Microsoft's inherently flawed paradigm for writing code at all.
Re:Uhm, right... (Score:5, Informative)
The "Texas Engineering Practice Act" has a whole page of exceptions, but they call them "exemptions".
Let's see if we can find the relevant parts:
Well, that would seem to apply quite nicely not only to train engineers, but also software and systems engineers.
Sorry, Mephisto, that's no excuse (Score:5, Informative)
Re:Uhm, right... (Score:4, Informative)
I've been reading the replies to this thread, and I'm a little bit confused. The licensing of engineers has been a hotly-debated practice for...well, for as long as engineers have been licensed.
Whether in favour of or opposed to licensing, I don't see how it could qualify as a Ponzi scheme [rr.com]. It may or may not be a worthwhile practice, but it's quite a stretch to describe it as a pyramid scheme.
Re:If it's ATI, it *is* the video drivers! (Score:4, Informative)
Of course, the all-in-wonder pro I have is old (1998?) - so I can see why they want to kill it off, but dragging your customers kicking and screaming to a new product isn't very good for customer relations - and ATI knows this now. Nvidia and other companies made them wake up.
Unfortunately they do have only 1 real competitor for the retail box market, so they aren't that concerned, but competition does help. Not that they will ever fix the drivers for the AIW Pro and their older cards, the PR damage has already been done, and the cards replaced.
Their support is, of course, useless, just because they have to deal with so many buggy - and often weekly - releases. There just isn't time for them to find the problems. No point to call / ask for support because it will not be helpful. Of course venting is fun, but hey. Besides, half the games out there don't work properly and cause issues by themselves.
Every once in a while, they get it mostly right, but it is a crapshoot. I've had drivers for my 7500 that would refuse to let me log on to 2k, but also the current version which works in both xp and 2k3 without any problems - i.e. I've had 0 bsod under 2k3 w/ my box with the 7500 in it since rc2 came out. A couple with "recording", or trying to with the AIW Pro - although that was expected. (the release for the 7500 is 6.14.1.6307 2/28/2003 if it helps anybody).
As far as I can tell, 2k3 IS stable. I've abused my system - knocking out ide cables while the system is running, "hot pulled" pci cards, etc. Basically anything that would not cause the computer to reboot due to a short would keep the system up. My processor fan came out for a couple minutes, I saw it running at 95C and dove for the power switch, but 2k3 stayed up. Granted, it isn't that hot, but still.
If I still lived in Ontario, I'd probably drive by at 120kph and throw a used tire rotor at their front door, it might cure an ulcer or two.
Re:But how do you explain/ (Score:3, Informative)
Windows Crash Vs. Linux Crash
Re:A model of closed source (Score:3, Informative)
Which one? I think there were several, though I don't recall any of them affecting me--they all seemed to be caused by obscure stuff or in experimental drivers. The one specific incident I remember was a problem with ext3 not writing all the data on umount. If you synced before unmounting, you didn't lose data. I know Slackware puts a sync in the shutdown script, so I bet most Slackware users running ext3 didn't see the problem except when manually umounting filesystems. Ext3 was rather new then (still is), and I elected not to use it. In fact, I still go with ext2.
The only problem I've ever had with ext2 was when I pulled out a floppy while it was writing. Hosed the disk pretty bad. I used minixfs for floppies from then on. I suppose it happened because ext2 is optimized for speed, not data recovery. If you want that, then go with FreeBSD's soft update and disable disk write caching.
Maybe I haven't experienced problems with Linux because I just haven't encountered the brunt of Linux bugs, or maybe it's because I stay away from most experimental code and new features. Though I don't think Linux has nearly as many problems as MS flunkies try to make it out. My primary reason for migrating from MS to Linux was all the stupid problems with MS software--especially their OS, and the fact Linux had almost no problems. No matter what I did, Windows would crash at least a couple times a day. Linux almost never crashes, and when it does, I have been able to trace it down to either a hardware problem or a massive misconfiguration on my part.
Re:Uhm, right... (Score:4, Informative)
Why do you think I have that sig? It's because everybody screws up occasionally. But since you don't want to play nice...(and you misspelled "wrong")
Your indentation is extremely misleading. Subsubsection (3) only applies if the requirements of subsection (a) are met.
Since the requirements of 20(a) must be met first, let's take a look at it by itself:
Wow, your options are:
The only way to ensure option 1 is to make sure nobody in the company calls you an engineer, so they won't slip up when talking to people outside the company. This is no different than not calling yourself an engineer at all.
Option 2 is worse than calling yourself something other than a software engineer, and a lot less reliable.
Now, you might say that software engineering doesn't fall under the "practice of engineering" bit.
*ahem*
Re:Interesting article (Score:2, Informative)
http://msdn.microsoft.com/chats/windows/windows
CmdrTaco = Sensationalism (Score:3, Informative)
"Microsoft Code at Fault for Half of all Windows Crashes"
I look at the paragraph under it:
"Microsoft is claiming that half of all MS Windows crashes are the fault of third party code, not their own."
Anybody older than the age of, say, 10 should see that these are two very different statements. To assume that Microsoft is automatically to blame for the other half of OS problems completely ignores what everybody here should know is the #1 source of computer problems: User error.
If you want to lament the lack of quality controls involved in Microsoft's "Made for Windows" branding, fine. If you want to conjecture just what that other half really is, also fine. But you can't print painfully obvious logical fallacies like this and hope to be taken seriously as a source of news.