Comment Checksumming + sufficient redundancy (Score 1) 321

We wrote our own parallel filesystem to handle just that. It stores a checksum of the file in the metadata. We can (optionally) verify the checksum when a file is read, or run a weekly "scrubber" to detect errors.
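For anyone curious what the checksum-in-metadata idea looks like in miniature, here is a rough sketch. It is not our filesystem's code: the SHA-256 choice, the plain dict standing in for the metadata store, and the function names (record, verified_read, scrub) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record(metadata, path):
    """Store the file's checksum in the (hypothetical) metadata dict."""
    metadata[str(path)] = file_checksum(path)


def verified_read(metadata, path, verify=True):
    """Read a file, optionally checking it against the stored checksum."""
    data = Path(path).read_bytes()
    if verify and hashlib.sha256(data).hexdigest() != metadata[str(path)]:
        raise IOError(f"checksum mismatch: {path}")
    return data


def scrub(metadata):
    """Re-hash every known file and return the paths that no longer match."""
    return [p for p, expected in metadata.items() if file_checksum(p) != expected]
```

The verify flag mirrors the "optional" check on read, and scrub is the sort of thing you would run from a weekly cron job. Repairing a bad file from redundant copies is a separate step that is not shown here.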

We also have Reed-Solomon 6+3 redundancy, so fixing bitrot is usually pretty easy.

Comment Re:Been there. Done that. (Score 3, Insightful) 841

I made a $3 mistake on my income tax return (Scottrade updated my tax info *after* I'd sent mine in, but they didn't notify me).

The IRS apparently took that as an excuse to torment me for most of a year. I got audited for the above $3 claim, as well as for "falsely claiming that I was due a tax deduction for student loans" (I took some night classes at the local community college). Apparently that $3 claim was justification for a fishing expedition.

First time, I take an entire day off to redo my taxes, discover that I have made a $3 error, cut them a $3 check, and send them the 1098-T from the college to prove that the other claim is false.

A couple of months later, they send me the exact same form. I take another day off to recompute my taxes (I was correct), and again send them the same 1098-T info that they requested.

Third time, I am told that I will be taken to court because I haven't provided the proof required. I take yet *another* day off to go to the local IRS office in Nashville and sit down with a lady to explain that I've already sent the 1098-T form in.

She logs into her computer, turns it toward me, and starts hitting page-down. "We don't have any record that you sent it in." I see it flash by and tap on the screen. "Yes you did, it was just on your screen a second ago." She pages up and stares at it in silence for 2-3 minutes. "Well I just don't understand that."

Great. So now that the IRS knows I've sent it in, we can put this whole misunderstanding behind us, right? "I'm sorry, but there's nothing I can do to fix this". My choices were to pay it off, send an appeal to the IRS and hope that they suddenly grow a brain after the **4th** time, or go to tax court, lose yet another day's salary, and hope the judge was smarter than the IRS. So I paid.

The IRS's excruciating, devastating, mind-numbing incompetence cost me roughly $1000 in lost salary for a $3 difference. And the whole collective IRS can go pleasure itself with a saguaro cactus.

Comment Scaling problems... (Score 1) 270

I manage a couple of petabytes worth of disks (consumer, not enterprise) for the HPC center at Vanderbilt University, and they get absolutely hammered by CMS-HI users 24/7/365. At scale, you will daily see problems that you would never even think of.

The firmware on consumer hard drives is often crap. Very few of them support TLER. We have ~400 drives (Seagates) that needed a firmware fix to prevent sudden death, but the fix wouldn't work in bulk over the SAS controller, so we had to yank/flash/replace/repeat. Drives will also occasionally lock up hard and require a power-cycle.

Don't believe for a second that Linux doesn't need a defrag utility. We were mystified by a sudden influx of permanent drive *slot* failures. After *much* investigation, it turned out that our users were filling the drives 100% full, erasing 5%, refilling, erasing 5%, etc., until the average file (~100 MB) had thousands of extents. The vibration from the head frantically scanning the disk to read the file was enough to cause the drive's SATA connector to destroy the connector on the backplane (Supermicro chassis, would *NOT* buy again, Chenbro is the way...) We wrote a simple defrag script that copied the worst files to a different location and then moved them back.
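A rough sketch of the copy-elsewhere-and-move-back approach, for the curious. This is not our actual script. It assumes the filefrag tool from e2fsprogs is installed and that its summary line looks like "path: N extents found"; the function names, the threshold, and the scratch directory are made up for illustration.

```python
import re
import shutil
import subprocess
from pathlib import Path


def parse_extent_count(filefrag_output):
    """Pull N out of a filefrag summary line such as 'f.bin: 12 extents found'."""
    match = re.search(r":\s*(\d+) extents? found", filefrag_output)
    if match is None:
        raise ValueError(f"unrecognized filefrag output: {filefrag_output!r}")
    return int(match.group(1))


def extent_count(path):
    """Ask filefrag how many extents a file occupies (needs e2fsprogs)."""
    result = subprocess.run(
        ["filefrag", str(path)], capture_output=True, text=True, check=True
    )
    return parse_extent_count(result.stdout)


def worst_files(counts, threshold):
    """Given {path: extent_count}, return paths above threshold, worst first."""
    bad = [p for p, n in counts.items() if n > threshold]
    return sorted(bad, key=lambda p: counts[p], reverse=True)


def rewrite_file(path, scratch_dir):
    """Copy a file to scratch_dir, then move the copy back over the original."""
    path = Path(path)
    staged = Path(scratch_dir) / path.name
    shutil.copy2(path, staged)           # the fresh copy gets newly allocated blocks
    shutil.move(str(staged), str(path))  # replace the fragmented original
    return path
```

Two caveats: this only helps if the scratch location (or the disk itself, once some space is free) has enough contiguous room, and it is not safe to run on a file that something else is writing to at the same time.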

RAID5 isn't nearly sufficient at this point because you will eventually have two or more simultaneous failures just due to the number of disks. We wrote our own filesystem to offer Reed-Solomon-6+3 redundancy.
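To put rough numbers on that argument, here is a back-of-the-envelope sketch. It assumes disk failures are independent and that each disk fails with the same probability p_fail during one rebuild window, which real drives from the same batch will not strictly obey. The function names and any values you plug in are illustrative, not measurements from our cluster.

```python
from math import comb


def storage_overhead(data_blocks, parity_blocks):
    """Raw bytes stored per byte of user data, e.g. 6+3 gives 1.5."""
    return (data_blocks + parity_blocks) / data_blocks


def prob_more_than(n_disks, tolerated, p_fail):
    """P(more than `tolerated` of n_disks fail), assuming independent failures."""
    survive = sum(
        comb(n_disks, k) * p_fail**k * (1 - p_fail) ** (n_disks - k)
        for k in range(tolerated + 1)
    )
    return 1.0 - survive


def prob_any_group_loses(groups, p_group_loss):
    """P(at least one of `groups` independent groups loses data)."""
    return 1.0 - (1.0 - p_group_loss) ** groups
```

A 9-disk RAID5 group loses data when more than 1 disk is down, so compare prob_more_than(9, 1, p) against prob_more_than(9, 3, p) for a 6+3 stripe, which survives any 3 losses. Then feed either result into prob_any_group_loses with the number of groups in the building to see how quickly "rare" becomes "eventually".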

I'd love to know if you guys have any similar "WTH" horror stories.

Comment That's interesting! (Score 1) 190

A couple of years back at one of the Supercomputing conferences (I think in Phoenix), Fermilab had a cloud chamber in their booth, and you simply *would* *not* believe the amount of ambient radiation passing you at all times. I can easily believe that altitude would have an effect.

Another interesting idea would be to do the same experiment by latitude. Does the Arctic Region Supercomputing Center have a higher rate than the Maui Supercomputing Center? What happens during an aurora?

Comment Re:Physics versus MBA (Score 2) 343

I have an MBA from a top 25 school, but I also have 4 years towards a Ph.D. in theoretical physics and 12 years of experience in academic high-performance computing, so I hope I have street cred when I say this...

Saying you can get a "12 Hour MBA" is like saying you can get a Ph.D. in astrophysics by reading Carl Sagan's "Cosmos". It's Dunning-Kruger made manifest.

I found my MBA to be just as challenging as my physics degree. Strategy, game theory, operations, and economics aren't exactly powder-puff courses. And there's a reason they hand out Nobels in economics.

Don't confuse the body of knowledge with the person seeking it. There's a difference between someone who /has/ an MBA, and someone who /is/ an MBA. The latter are annoying, but they would be annoying no matter what degree they had.

Comment He's an idiot (Score 5, Insightful) 568

Bandwidth is a time-sensitive commodity. The link is going to be sending either a 0 or a 1 100% of the time, so capacity that goes unused right now is gone for good. Instead of caps, they should think about allowing customers to volunteer to be throttled for a reduced fee.

It's similar to an airplane ticket, in that it's worth full price, right up until the point the gate is about to close, at which point they will take any price over the marginal cost of fuel. I know many people that would be happy to let "full price" guy go first if it saved them a few bucks.

Comment Tornado *resistant*... (Score 1) 189

The walls may help shield from debris in the event of an EF-1 to EF-3 (which granted is the vast majority of tornadoes). But there isn't much on this earth (above ground, anyway) that's going to survive a direct hit from an EF-5 tornado.

My dad saw the track left by one that hit in Alabama years ago. The thing sucked up everything, including grass, in a 1/2 mile wide path. The only thing left behind was orange clay. There wasn't a single intact structure left, not even foundations.

Closest thing humanity has to an EF-5-proof structure is probably the pyramids in Giza, and I'm not sure about that either.

Comment And the huddled masses sayeth to Lord Gaben... (Score 2) 510

"Then shalt thou count to three, no more, no less. Three shall be the number thou shalt count, and the number of the counting shall be three. Four shalt thou not count, neither count thou two, excepting that thou then proceed to three."

I'm assuming Wednesday is the Steambox announcement. You guys *really* need something with a "3" in it for a launch. I don't think "Half-Life: Source" is gonna cut it.

Comment Re:SSH? (Score 1) 607

AES was standardized in 2001, so it just barely makes it under the wire. 3DES and Diffie-Hellman are also good targets. Or it may be referring to a popular foreign/military cipher, like GOST, IDEA, etc.

Comment NSA did it... (Score 1) 607

Over the past few years I have read about mind-boggling exploits in protocols like WEP, WPS, and now IPMI. I have always thought it was either "idiot programmer who doesn't understand security 101" or "NSA". I think it's fairly obvious that a number of these things probably are their doing. Wonder if they are legally liable for the cost imposed on others to fix/repair/restore?
