
Comment Re:Why? (Score 1) 297

RAID is toast. I don't care WHAT RAID you are running; none of them can withstand a loss of 50% of the drives.

Really? I used to do that as a routine acceptance test for clusters. The only times it failed for real was when we'd screwed up something.

For that to work, you have to rigorously separate RAID mirrors into their own trays, so that a whole-tray failure (or a cable failure, as you said) takes down only one side of each mirror. For something like RAID 10, 50, or 60, you just make sure all of one side is on one array and all of the other side is on another (or, if you have more than two arrays, separate them into pairs, with one array of each pair used for each side).

Physical separation helps as well, so that you don't accidentally unplug A while starting service on B. That exact scenario is one of the canonical HA oopses.
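A minimal sketch of the pairing rule above (drive and tray names are hypothetical, purely for illustration): give each RAID 1 mirror exactly one drive from each tray, so losing an entire tray takes down only one half of every mirror.

```python
# Hypothetical sketch: lay out RAID 10 mirror pairs across two trays so
# that a whole-tray failure costs only one side of each mirror.

def layout_raid10(tray_a, tray_b):
    """Pair drive i in tray A with drive i in tray B as a mirror."""
    if len(tray_a) != len(tray_b):
        raise ValueError("trays must hold the same number of drives")
    return [(a, b) for a, b in zip(tray_a, tray_b)]

pairs = layout_raid10(["a0", "a1", "a2"], ["b0", "b1", "b2"])
# Every mirror spans both trays, so tray A can die entirely and each
# mirror still has its tray-B half:
assert all(p[0].startswith("a") and p[1].startswith("b") for p in pairs)
```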

Comment Re:Why? (Score 1) 297


I can't stress that enough. software and semi-software raid is a joke.

Not until the hardware fails and you need the data that was on there but not on the backup (or you realize the backup failed a long time ago...).

For performance, yes, hardware is fastest. For reliability, though, software RAID is better (hardware RAID can have interesting firmware-version issues).

Old SAN/cluster folks believe in belt and suspenders; i.e., often, use both.

Use software RAID 1 across a couple of LUNs (or separate controllers / drive-array stacks, for non-SAN environments). Build the LUNs with internal RAID (5 or 6, hot spares, and figure out your rebuild times, etc.).

Also, a hugely common failure is that the operators aren't properly monitoring the underlying hardware RAID drive status. You need to know immediately when a drive fails, even if there's RAID 6 and a couple of hot spares in the array. When I worked for a VAR on clusters, I can't count the number of times I arrived and found that they'd had 2, 3, or 4 failures nobody noticed, and were one more failure away from catastrophic data loss...
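The "one more failure away" arithmetic can be sketched like this. It's a simplified best-case model (it assumes each failed drive was rebuilt onto a spare before the next failure; real rebuild timing only makes things worse), with illustrative numbers:

```python
# Best-case margin for a RAID 6 set with hot spares: the array can
# absorb (parity drives + hot spares) sequential failures in total,
# IF each rebuild completes before the next failure hits.

def failures_remaining(parity_drives, hot_spares, failed_unnoticed):
    """How many more failures the array can absorb before data loss."""
    return max(parity_drives + hot_spares - failed_unnoticed, 0)

# Fresh RAID 6 (2 parity) with 2 hot spares:
assert failures_remaining(2, 2, 0) == 4
# Three unnoticed failures later: one more failure away from loss,
# which is exactly the situation described above.
assert failures_remaining(2, 2, 3) == 1
```

The point of the sketch: the margin shrinks silently with every unnoticed failure, which is why you alert on the first one, not the last.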

Comment Re:Why? (Score 2) 297

There is a very slight bathtub-type curve. All numbers rounded, it's about 3% AFR in the first quarter (i.e., about 0.75% of drives fail in the first quarter) and about 2% AFR for drives in the 3-12 month range (i.e., about another 1.5% fail). If I read the statistics presentation there right, about 33% of first-year failures happen in the first quarter, which is a detectable but minor initial elevation. That's dwarfed by the 1-2 year AFR (about 8%) and the 2-3 year AFR (about 9%), which drops slightly after that.

They presented the AFRs rather than the cumulative losses in an initial cohort per quarter/year, which would have been slightly clearer, but whichever way they did the analysis, the numbers come out about like that.
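A back-of-envelope conversion from AFRs to cumulative first-year losses, using only the rounded figures quoted above (no new data):

```python
# Convert per-period annualized failure rates (AFRs) into the fraction
# of an initial cohort that has failed by the end of all periods.

def cumulative_failure(period_rates):
    """period_rates: (annualized_failure_rate, duration_in_years) pairs."""
    surviving = 1.0
    for afr, years in period_rates:
        surviving *= (1.0 - afr) ** years
    return 1.0 - surviving

# ~3% AFR for the first quarter, ~2% AFR for months 3-12:
first_year = cumulative_failure([(0.03, 0.25), (0.02, 0.75)])
# Comes out near 0.75% + 1.5% ~= 2.25%, consistent with the rounded
# per-period losses quoted above.
assert abs(first_year - 0.0225) < 0.001
```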

Comment Re:Why? (Score 1) 297

I have worked for an OEM that installed about 30,000 drives a year, and for end users with 10,000-drive environments; in the last year I built out new environments with 1,000 HDDs and 600 SSDs. I know all about static, having had the manufacturer-level training on how not to zap drives.

It's not just static. Some drives come with SMART errors (or bad blocks that matter), despite $MFGR assurances. Some of the failures develop in the factory and get shipped anyway as unlikely to get worse; some develop while being packaged, shipped, or unpackaged. Run SMART data collection across hundred-drive collections (or thousands, or more) and you get a lot of useful and scary info.

Also, there are well-documented runs of drives - specific models, time ranges, factories involved, etc. - which all just blew up. It happens to chips sometimes too: I've been seriously bitten by bad CPUs from Sun and Intel, by support chips from several vendors, and by RAM going bad.

One prototype CPU literally melted the system down: all the nearby plastic inside the casing melted and puddled on the bottom of the case, and the CPU's plastic label was carbonized.

Comment Re:*SMOOTCH!* Buh-bye Enterprise! (Score 2) 165

Doubling lifespan that way requires that you only use half the disk capacity.

I have burned out a Major Name Brand SLC SSD with a high-traffic OLTP DB in eight months. I have heard the same from Large Internet Companies that tested these for internal use. There are ongoing independent reliability studies at FAST, HotDep, and other conferences which are uniformly highly skeptical of vendors' claims about SSD lifetime.

If you have not actually tested the drive out to six years of service, run an accelerated pilot test unit ahead of your main production usage, to give you the canary warning.
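Sizing that canary is simple arithmetic. A sketch with made-up numbers (the endurance rating, write rate, and acceleration factor here are all assumptions for illustration): run one pilot drive at a multiple of the production write rate, so it exhausts its endurance budget well before the fleet does.

```python
# Hypothetical canary sizing: months until the accelerated pilot drive
# burns through its rated endurance budget.

def canary_months(rated_tbw, prod_tb_per_month, acceleration):
    """rated_tbw: endurance rating in TB written;
    prod_tb_per_month: production write rate;
    acceleration: canary write-rate multiple of production."""
    return rated_tbw / (prod_tb_per_month * acceleration)

# e.g. 300 TBW rating, 10 TB/month production writes, 6x acceleration:
assert canary_months(300, 10, 6) == 5.0   # canary exhausted at ~5 months
assert canary_months(300, 10, 1) == 30.0  # fleet exhausted at ~30 months
# The canary dies ~25 months ahead of production drives with the same
# rating: that's the early warning.
```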

Submission + - Netflix outage

An anonymous reader writes: As of a bit after 7 pm EDT, the Netflix site started to experience problems, going from completely unreachable to intermittent responses and back down to unreachable. Given the outage pattern, it is likely that a failure of a limited number of servers caused a cascading outage when the remaining servers could not handle the combined load. No information is available at this point about the expected duration of the outage.

Comment Re:The tried & trusted will still rule the ser (Score 1) 237

I've tried to do large database server farm tests on modern enterprise SSDs with TRIM, the best wear leveling, SLC, etc. They go "poof" at moderate (a few months, for my loads) lifetimes.

IOPS x lifetime / price is a metric I find useful. Unfortunately, it makes SSDs look even worse than they do on price alone. 8-(
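A worked example of that metric, with purely illustrative numbers (the IOPS, lifetimes, and prices below are assumptions, not measurements): a short lifetime erodes most of the SSD's raw IOPS advantage.

```python
# The IOPS x lifetime / price metric from the comment, on made-up numbers.

def iops_lifetime_per_dollar(iops, lifetime_months, price_usd):
    return iops * lifetime_months / price_usd

ssd = iops_lifetime_per_dollar(50_000, 7, 700)   # fast, dies in months
hdd = iops_lifetime_per_dollar(200, 60, 100)     # slow, lasts years

assert ssd == 500.0
assert hdd == 120.0
# Raw IOPS advantage is 250x; under this metric it shrinks to ~4x,
# which is why the metric makes SSDs look worse than price alone does.
assert (50_000 / 200) == 250.0
```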

Comment Re:The tried & trusted will still rule the ser (Score 1) 237

Not really improved. I burned out a REALLY GOOD (best available) SLC SSD in 7 months with a mirrored production workload at a previous job site, not that long ago.

Poof. All gone.

At the FAST conference, there was yet another presentation on SSD lifetime burnout mechanisms; the news on lifetime is not actually improving in the slightest so far. SLC is not good enough, and MLC is toast in write-intensive apps.

Phase-change memory or one of the others, with millions of write cycles per bit, may pull this out, but Flash is not proving good enough for enterprises.

The Internet

Submission + - Last free IPv4 block allocation in progress

georgewilliamherbert writes: IANA has announced that the last two unrestricted IPv4 /8 network blocks were allocated today to APNIC. By preexisting agreement, to avoid timing concerns that would put any regional IP number registry at a relative disadvantage, the remaining five /8 blocks are now to be allocated immediately to the five RIRs, which will presumably happen very soon.

Though one can semantically argue whether the final-five allocation or the last two free blocks represents the actual end of IANA's IPv4 allocation, today was a major milestone in the end of new IPv4 allocation and the coming IPv6 future.
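For scale, the arithmetic behind the milestone:

```python
# Address-space arithmetic for the /8 allocations described above.

# A /8 fixes the first 8 bits, leaving 24 host/subnet bits:
addresses_per_slash8 = 2 ** (32 - 8)
assert addresses_per_slash8 == 16_777_216   # ~16.8M addresses per /8

# The final five /8s handed to the five RIRs:
final_five = 5 * addresses_per_slash8
assert final_five == 83_886_080             # ~84M addresses, the end of
                                            # IANA's unallocated pool
```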

Comment Re:Recovery Fairy Tales again (Score 1) 274

The Great Zero Challenge rules specifically exclude disassembly of the drive; all the bit-recovery mechanisms discussed in the literature require you to disassemble the drive and use custom heads to scan the surface magnetism map.

I.e., the contest totally misses the point of what data recovery pros (i.e., the NSA and so forth) said they'd do if they had to scan disks to recover overwritten data.

It's hard to think of a less useful contest.

Comment Re:Looks like a clone of the Northrop YF-23 (Score 1) 613

Oh? A plane with a single fuselage, forward fuselage engine intakes, canards, and a delta wing resembles an aircraft with separate engine pods on a flat center section, underwing engine intakes, and a V-tail?

There's nothing configurationally similar between those aircraft. Nothing.

There's a passing similarity with the FB-22 bomber proposal, but that didn't have canards, just a delta wing, and was never more than a paper proposal (no detailed design or prototype).

Comment Re:Quality control? (Score 1) 332

The technology they used to get to space was 90+% Russian

Common fallacy. They bought a Soyuz and a lot of engineering time, and the vehicles are similar in configuration and concept, but the Chinese vehicles are essentially a whole new design that used nearly no Soyuz components other than the docking mechanism and imported space suits (I think that was it).

Looking similar doesn't mean the design was stolen. Chinese engineers did most of the hard work on all of the hardware, with those two noted exceptions.

The launch vehicle was all theirs.

Comment Re:What's the adage? (Score 1) 332

There are plenty of tax havens to go off to and live in, if you feel that way.

Problem is, none of them are a large, expanding, dynamic economy.

They exist for a reason, but modern economics does as well: it works, and over time it wins out at producing the most benefit for the most people (including the rich, who at times object to how it works, but who are far, far, FAR richer in the West than elsewhere...).

The current system is not entirely fair or reasonable by any one group's definitions of those terms, and certainly sucks in many ways. Welcome to the Real World. It sucks, but obviously less so than any other ideas we've tried so far. See similar observations about western democracy as a government model.

When you have a model that you can adequately explain and defend as holistically better, you'll get converts. I have yet to see a critic who can explain an alternate model in detail, because most of the critics don't understand economies well enough to design and engineer one. So give it your best shot. Perhaps you have the cojones that all the professional issue radicals and fringe economics professionals lack, plus new ideas, the brains to link them into a system, and the communication skills to explain it. Go for it!

But not on /.
