Computer Voodoo?
jbeaupre asks: "A corollary to 'Any sufficiently advanced technology is indistinguishable from magic' is that sometimes users have to resort to what I call 'computer voodoo.' You don't know why it works, you barely care how it works, but you find yourself doing the strangest things because it just seems to work. I'm talking about things like: smacking a PC every 5 seconds for an hour to keep it from stalling on a hard drive reformat (with nary a problem after the reformat); or figuring out the only way to get a PC partially fried by lightning to recognize an ethernet card, after booting into Windows, is to start the computer by yanking the card out and shoving it back in (thereby starting the boot processes). What wacky stuff have you done that makes no obvious sense, but just works?"
Re:Current computer (Score:4, Informative)
Some BIOSes require certain devices to be present to pass POST. I discovered this the hard way while setting up a headless server. I spent 20 hours installing Gentoo, got the services all nicely configured, put the machine in the corner... and it never came online. So I pulled it out, brought it back to my desk, and it booted fine.
I didn't find out that it was the missing-keyboard/monitor errors that were preventing the system from booting until I carried a monitor to the other side of the room, plugged it in, and saw the keyboard error. Then I poked around in the BIOS and found the options for requiring a keyboard and monitor.
Re:floppy drive (Score:1, Informative)
Seen this one. Fixed it by clearing out the recent documents in the start menu.
Re:Always remember... (Score:4, Informative)
The POSIX semantics are: sync() doesn't have to actually write everything; it only has to schedule the commit. However, a second sync() won't return until the writes from the previous sync() have finished.
On Linux, a single sync is enough, though.
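That double-sync ritual can be sketched directly. A minimal example; the per-file `sync FILE` form at the end assumes reasonably recent GNU coreutils:

```shell
# POSIX only guarantees that sync() *schedules* the writes; a second call
# cannot return until the writes scheduled by the first have completed.
# Hence the old ritual before pulling the plug:
sync
sync
# For a single file, fsync(2) -- or `sync FILE` with modern GNU coreutils,
# which calls fsync on that file -- gives a stronger per-file guarantee.
```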
Re:Not sure how it works... (Score:5, Informative)
IDE drives keep a list of spare sectors to be used if one of the "primary" ones gets damaged. However, if a sector that already contained data gets damaged, the drive won't reallocate it, because it would have no way of recovering the information. So it keeps "hoping" that some day the data will be readable again, and when that happens, it'll reallocate the sector. In practice, that day never comes.
When you overwrite a defective sector, the drive says "aha! since the user overwrote the information, it means it's not important anymore; so I'll go ahead, mark the sector as bad and replace it with a spare". That's why overwriting gives the drive a chance to remap all bad sectors to clean ones.
This is a trick I learned by reading the documentation on smartd; if SMART reports defective or unreadable sectors, there's a way to figure out which files reside in those sectors and overwrite them with zeroes; the file will of course be lost, but by overwriting you let the drive reassign the sector and everything is peachy again.
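A sketch of that trick. The LBA (1234) is hypothetical; on a real drive you'd take it from smartctl's error log or self-test output, and aim dd at the raw device (e.g. /dev/sdX), which destroys that sector's contents -- double-check the numbers. Here a scratch file stands in for the disk so the commands are safe to run:

```shell
# On a real drive the overwrite would be (DESTRUCTIVE -- LBA from smartctl):
#   dd if=/dev/zero of=/dev/sdX bs=512 count=1 seek=<LBA> conv=notrunc
# Demonstration against a scratch file instead of a real disk:
dd if=/dev/zero of=scratch.img bs=512 count=2048 2>/dev/null   # 1 MB "disk"
printf 'stale data in a bad sector' | \
    dd of=scratch.img bs=512 seek=1234 conv=notrunc 2>/dev/null
# Overwrite "sector" 1234 with zeroes, leaving the rest of the image alone:
dd if=/dev/zero of=scratch.img bs=512 count=1 seek=1234 conv=notrunc 2>/dev/null
```

The `conv=notrunc` is the important part: without it, dd would truncate the target at the seek point instead of patching one sector in place.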
By the way, if you reformat the drive with mke2fs's destructive verification option (-c -c), the same reassignment is likely to happen when the test overwrites each block to verify readability. The standard single -c test is read-only, which is why a plain format can still fail on a drive that the overwriting procedure would have fixed.
So you see, not voodoo.
Mod parent up!!! (Score:5, Informative)
Again, a great little one-liner to remember for the tool bag...
Re:Walk into the room (Score:4, Informative)
Computers mysteriously start working again when you enter the room? Feh. This hardly qualifies as Voodoo. I mean, it's got a perfectly rational explanation [catb.org]:
Re:Memtest86 (Score:3, Informative)
More seriously, it probably depends on where in the address range the bad memory is: Linux happens to hit something critical and Windows doesn't -- or vice versa on some other machine. In my experience Linux itself is more tolerant of bad memory than some of the apps -- gcc in particular. I started overclocking one box and gcc began throwing all kinds of compile errors on code that had compiled cleanly before. It turns out that's a known symptom of slightly flaky memory.
There's code in Linux that lets it work with known-bad RAM: you just tell it the address range to avoid, and the VM subsystem marks that range as not for use, much like a disk drive mapping out a bad sector.
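A sketch of that kernel option (the addresses here are hypothetical; the mainline mechanism is the `memmap=` boot parameter, while the older out-of-tree BadRAM patch used its own `badram=` syntax):

```
# Kernel command line (appended to the kernel line in GRUB): reserve 4 MB
# of physical memory starting at 0x10000000 so the VM never allocates it
memmap=4M$0x10000000
```

Note that in a GRUB 2 config file the `$` usually has to be escaped (`memmap=4M\$0x10000000`) so the shell-style variable expansion doesn't eat it.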
Most of it is Microsoft... (Score:3, Informative)
Back in the DOS days, people were convinced things worked better if they left the power off for long periods of time, before restarting.
Windows got more complex, and had too many of those things to name. Hitting the tower is a popular one. Moving the mouse around while waiting, to prevent lock-ups, is another very popular one. There are certainly millions of them. Linux, too, has developed a few, because some drivers are iffy, but they make up the tiniest fraction of what you see with Windows zombies (a.k.a. users).
When I'm helping someone with a Windows system (I keep that as rare an event as possible), I still see similar nonsense. Windows XP's setup will allow me to partition hard drives #1 and #2, but WON'T let me format them there, and I have to put them in another system to do that part. Not to mention all the drivers that will just corrupt themselves after working fine for 3 months, if you just LOOK at the system funny. It's no wonder voodoo is so popular with Windows systems (and pre-OSX Macs, to be fair).
With that said, I have seen some frustrating hardware problems. After 6 months of working without any problem, my always-on Linux system started crashing every day; after 3 days of that it wouldn't start up at all... Typical crappy power supply (bulging capacitor).
I had a Charter cable modem which would work whenever the tech guys were here (I called them out a dozen times over 2 months), but would fail miserably just moments after they'd step out the door. It took me a while before I realized that the thing would work for about 5 minutes after it was power-cycled, and only then would it crash. They would never take my word for it, and I had to cancel my service to get rid of that piece of shit.
I've seen a few network cables which test out just fine, and work most of the time, but after the machine has been online for a while, the connection will fail and need to be reset... This is partially Windows voodoo, because the stack is unstable and can't handle many errors. But mainly, it's because of cables with marginal connections, which work about 95% of the time -- enough to pass tests, but enough to cause all sorts of problems in real-world use.
Then there are the occasional network cables with crosstalk, which can be hard to diagnose if you don't have an advanced/expensive meter, and give many of the same symptoms as above.
There was one case where a guy would play music CDs for an hour before they started skipping. He changed CD-ROM after CD-ROM before asking for my help. It was pretty obvious when I saw the sheer amount of lint in his system fans. It would run fine while the system was cool, but with the fans not spinning, the temperature would shortly climb to insane levels, and the CD-ROM was just the first part to show symptoms.
Another Windows one is IE's download dialog... It takes so long to appear that by the time it starts, a few KB have already been downloaded, so it claims a 500KB/sec download rate on a dial-up modem and only gradually drops toward the 4K/sec it's really getting. People think that's accurate, and actually come up with the great idea of stopping and restarting downloads several times a minute, presumably because the server or their ISP will only let them download "fast" when the download first starts.
God I hate Microsoft...
Re:Mod parent up!!! (Score:5, Informative)
This is one of the 'gotchas' with multimedia content. A hard drive may have fast access times and a fast bus, but if there are persistent CRC errors (and there are quite often CRC errors on a non-failing drive!), then the drive may have to take 15 or so separate reads of the track to reconstruct the data. It may also temporarily move the surrounding tracks to the secret area, then zero out the surrounding tracks in order to reduce track-to-track crosstalk.
All of this takes time, and quite often any real time media bandwidth budgets get blown when this happens.
The neat thing is, when this does happen, it is never reported as an error. The program does finally get the data; it just takes longer than expected. One way to find out whether the drive has remapped tracks on you is to run a program that measures track-to-track access time sequentially, and look for the track boundaries that take far longer than a move between adjacent tracks should.
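A crude version of that measurement can be done with dd alone. This is a minimal sketch that times a fixed-size read at each offset of a target; a scratch file stands in for the disk here so it's safe to run, and to scan a real drive you'd point it at /dev/sdX and add `iflag=direct` to bypass the page cache (nanosecond timestamps assume GNU date):

```shell
# Time a 1 MB read at each consecutive offset; a remapped region shows up
# as an offset whose read takes far longer than its neighbours.
dd if=/dev/zero of=target.img bs=1M count=8 2>/dev/null   # stand-in "disk"
: > timings.txt
i=0
while [ $i -lt 8 ]; do
    start=$(date +%s%N)                                   # ns, GNU date
    dd if=target.img bs=1M skip=$i count=1 2>/dev/null >/dev/null
    end=$(date +%s%N)
    echo "chunk $i: $(( (end - start) / 1000000 )) ms" | tee -a timings.txt
    i=$((i + 1))
done
rm -f target.img
```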
Jeff
Re:Not sure how it works... (Score:3, Informative)
Both. :-)
As far as I'm aware, all (E)IDE, SCSI and (S)ATA drives have spare blocks that are used for remapping blocks that fail whilst the drive is in service (incidentally, I suspect that SCSI drives have significantly more, which would go some way to explaining their lower capacity/platter ratio and higher MTBF ratings, beyond the pure mechanical stuff and individual rather than batch testing).
In addition to that, most filesystems include a way of marking blocks as bad so that new information will not be written there. I'm pretty sure DOS/Windows' scandisk does this if it can't read a block, and mke2fs and friends will make use of the badblocks list generated by any initial badblocks run (though I don't think there's any [easy] way of marking blocks as bad once the filesystem has been created with typical UNIX filesystems).
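The ext2/3 workflow described above, sketched as commands (the device name is illustrative; run these only against an unmounted partition). For what it's worth, e2fsck's -c option can also add blocks to the list after the filesystem has been created:

```
badblocks -sv -o bad.txt /dev/sdX1   # read-only scan, record bad blocks
mke2fs -l bad.txt /dev/sdX1          # build the filesystem around that list
e2fsck -c /dev/sdX1                  # later: rescan an existing fs in place
```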