Comment Re:Reliable servers don't just crash (Score 1) 928

Actually that statement is 100% correct since the definition of a reliable server is one that does not crash.

It's trivially easy to make server software that doesn't crash: just route every exception into an infinite loop. Not crashing is a common prerequisite for a reliable server, but far from a sufficient one. In fact, for some software, like filesystems and databases, crashing is almost irrelevant to reliability: crashing to prevent data corruption is exactly what a reliable system does in some contexts.
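A contrived sketch of that point (hypothetical handler, not anyone's real server code): a process that catches and discards every exception will never crash, yet it never actually does its job, which is the opposite of reliable.

```python
import logging
import time

def handle_request(request):
    """Hypothetical handler; pretend it can fail on bad input or bugs."""
    raise RuntimeError("simulated failure for " + request)

def run_server(requests):
    for request in requests:
        try:
            handle_request(request)
        except Exception:
            # "Reliable" by the never-crashes definition: every error is
            # swallowed and the loop keeps spinning, but no request is ever
            # actually served and nothing signals that work is failing.
            logging.exception("ignoring error and carrying on")
            time.sleep(0.1)

run_server(["req-1", "req-2", "req-3"])
```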

"Reliability" is when the software does what it is designed to do. That can include protective crash dumping. "Availability" is when the software is always running. Well designed and implemented software is both reliable and available, which is another way of saying its always running, and always running correctly.

Comment Re:Gay? (Score 1) 764

I don't see why it should be a reason to be "proud". Gay is the way he is rather than something he has chosen but it does not confer some form of superiority on him.

I have no idea why people keep saying this, as if the only valid reasons to express pride are to claim superiority or to request acknowledgement of some accomplishment. The way I use the word, and the way the dictionary defines it, allows pride to express positive feelings of association, and to express self-esteem, particularly in contrast with the expectation of the opposite. For example, during and immediately after World War II, many Japanese Americans expressed pride in being Japanese Americans. Some served in the military during the war and were proud of their accomplishments, but others did not and were simply expressing pride in belonging to a demographic that was often denigrated yet did noteworthy things. I don't recall hearing people ask why a Japanese American would express pride in being Japanese when that was not their choice: the reason for making the statement was obvious at the time, as are the reasons for declaring pride in being gay today. Except to people being deliberately obtuse.

Comment Re:Blades (Score 1) 56

The SAN is usually less of a single point of failure because they usually have a lot of redundancy built in: redundant storage processors, multiple backplanes, etc. You're right that off-site replication is still important, but usually more for whole-site loss than storage loss.

People assume the biggest source of SAN failures is hardware failure, and believe hardware redundancy makes SANs less likely to fail. In my experience, that's false. The biggest source of SAN failures is externally induced problems, usually human ones. Plug in the wrong FC card with the wrong firmware, knock out the switching layer. Upgrade a controller incorrectly, bring down the SAN. Perform maintenance incorrectly, wipe the array. SANs go down all the time, and often for very hard-to-predict reasons. I saw a SAN that no one had made any hardware or software changes to in months suddenly crap out when the network connecting it to its replication partner began flapping. The flapping noticeably affected no one except the SAN, which decided a world without reliable replication was not worth living in and committed suicide by blowing away half its LUNs.

Keep in mind the last time I saw a hard drive die and take out a RAID array was so long ago I can't remember it. However, the last time I saw a RAID *controller* take out a RAID array - and blow away the data on the array - was only a couple of years ago. It's important to understand where the failure points in a system are, particularly when it comes to storage. These days they are often not where most people are trained to look, and unless you are experienced with larger-scale storage, you're not trained to look where the problems actually tend to be.

Storage fails. Any vendor that tells you they sell one of something that doesn't fail is lying through their teeth. Anything you have one of, you will one day have to deal with having zero of. It doesn't matter how "reliable" the parts in that one thing are. You should plan accordingly.

Comment Re:You could lock down Windows (Score 1) 334

For the purposes of this discussion, I'm assuming they are on Windows 7. If they aren't on Windows 7, they need to get there, at least. If they are still on XP, that just sucks, because a lot of the stuff below isn't there.

Something to look at which works for both Windows XP and Windows 7 is software restriction policies, a form of whitelisting built into Windows. With Windows 7 Enterprise or Ultimate editions, you can also use AppLocker, which is a more sophisticated version of software restriction policies. I'm not an expert on SRP or AppLocker, but I believe both can be used to lock down a desktop and prevent users from running, or somehow causing to run, any executables except the ones you whitelist. That won't prevent all possible malware from infecting the system, but between that and Malwarebytes I think you would get significant protection for this specific use case, and you wouldn't have to retrain the users to switch from Windows to a Linux desktop.

Comment Re:It's a production system (Score 1) 85

The internet is in production. No one wants to touch anything that's already in production unless they literally can't make it any worse. Otherwise we would have IPv6 as well.

Lots of people want to touch production systems. In the case of the internet and BGP, however, evolution has weeded out the people who like to touch production systems, and the only people with administrative rights are still getting over having to support 32-bit AS numbers and wondering where their pet dinosaur went.

Comment Re: Magic (Score 1) 370

I was just reading up on Ceph a bit. One thing that does have me concerned is that it does not appear to do any kind of content checksumming. Of course, if you store the underlying data on btrfs or zfs you'll benefit from that checksumming at the level of a single storage node. However, if for whatever reason one node in a cluster decides to store something different than all the other nodes in a cluster before handing it over to the filesystem, then you're going to have inconsistency.

The problem you're describing is one that neither ZFS nor BTRFS can handle either. Both checksum data on disk, but both are vulnerable to errors that occur anywhere else in the write path, from the network clients through the OS. That's why ECC or otherwise fault-tolerant memory is explicitly recommended for enterprise ZFS servers; in most cases a bit flip in memory is impossible for ZFS to detect, let alone correct.
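To make the write-path point concrete, here's a minimal sketch (hypothetical file layout and helper names) of end-to-end verification done above the filesystem: the checksum is computed before the data enters the OS write path, so corruption introduced below that point is caught on read, which on-disk checksums alone can't guarantee.

```python
import hashlib
import os

def write_with_checksum(path: str, data: bytes) -> None:
    """Store data plus a detached SHA-256 computed before the data enters
    the OS write path, so later corruption anywhere below this point is
    detectable on read."""
    digest = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    with open(path + ".sha256", "w") as f:
        f.write(digest)

def read_with_checksum(path: str) -> bytes:
    """Read the data back and fail loudly if it no longer matches."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".sha256") as f:
        expected = f.read().strip()
    if hashlib.sha256(data).hexdigest() != expected:
        raise IOError(f"checksum mismatch for {path}")
    return data
```

Of course, if the corruption happens before the checksum is computed, say in bad RAM, nothing downstream can catch it, which is exactly why ECC is recommended.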

Comment Re: Magic (Score 1) 370

Sure, but even in a mirrored btrfs configuration you don't have to add drives in pairs. Btrfs doesn't do mirroring at the drive level - it does it at the chunk level. So, chunk A might be mirrored across drives 1 and 2, and chunk B might be mirrored across drives 2 and 3. For the most part you can add a single n GB drive at a time and expand your usable storage capacity by n/2 GB. You don't have to rebalance anything when you add a new drive - it will just be used for new chunks in that case. However, in most cases you'll want to force a rebalance.

That's a good thing and a bad thing. It's what allows BTRFS to gain n+1 redundancy per data chunk on odd numbers of drives, or in arrays where the number of drives changes, because the mirroring isn't geometry-specific. But with disk-mirrored vdevs you know you can lose either half of any mirror with no data loss. In other words, with four drives organized as two sets of mirrors, I can lose any two drives as long as they are not both members of the same mirror. With chunk-based mirroring, once you lose a single drive you can't be certain the next drive failure anywhere won't fail the array, not without knowing exactly how the chunks are mirrored. That's not what people generally expect when they hear "mirroring", and that mismatch can cause problems in maintaining arrays. Honestly, if ZFS could do that kind of mirroring with metaslabs, I would personally turn it off.
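Here's a toy simulation of that difference (purely hypothetical layouts, not real btrfs or ZFS metadata): with fixed mirror pairs only the two "same mirror" failure combinations are fatal, while with chunk-level mirroring essentially any second failure can hit some chunk whose two copies landed on the two dead drives.

```python
from itertools import combinations
import random

DRIVES = [0, 1, 2, 3]

# Fixed mirror pairs (ZFS-style mirrored vdevs): every chunk's two copies
# always land on the same two drives.
paired_chunks = [(0, 1) if i % 2 == 0 else (2, 3) for i in range(1000)]

# Chunk-level mirroring (btrfs raid1-style): each chunk's two copies land
# on some pair of drives chosen independently.
random.seed(42)
chunked_chunks = [tuple(random.sample(DRIVES, 2)) for _ in range(1000)]

def fatal_two_drive_failures(chunks):
    """Return the two-drive failure combinations that lose data, i.e. both
    copies of at least one chunk sit on the two failed drives."""
    return [pair for pair in combinations(DRIVES, 2)
            if any(set(c) <= set(pair) for c in chunks)]

print("mirrored vdevs :", fatal_two_drive_failures(paired_chunks))   # 2 of 6 pairs fatal
print("chunk mirroring:", fatal_two_drive_failures(chunked_chunks))  # typically all 6 fatal
```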

If I were going to look at more advanced device redundancy and storage management, I would probably jump past btrfs and go to Ceph. There I can not only add storage devices however I want, I can also scale up the number of storage servers any way I want. It's still a developing system, but then again so is btrfs.

Comment Re: Magic (Score 1) 370

Agree - I phrased my original question poorly. My point was that raidz was not as flexible as the roadmap raid5 support for btrfs (which behaves like raidz, not like raid5 in zfs). I'm interested in being able to add/remove individual drives to a parity array.

Although people ask for it often, I'm unaware of that feature being on anyone's implementation roadmap. Part of the problem, I think, is that there's a difference between supporting a feature and that feature being practical to use. It's unclear to me how useful btrfs rebalance is in RAID5/6 arrays with large drives; it could impact performance enough to make it annoying to use in the general case. That's one of the reasons even hardware RAID5/6 is not used much in servers with high performance requirements: the rebuild cost makes recovering from a drive failure prohibitive.

Even for a home array, I'm more inclined to use mirroring than RAIDX, and ZFS does allow you to add mirror pairs to pools composed of mirrored vdevs. In other words, you can make a pool with four drives set up as two pairs of mirrored vdevs, and then later add drives in pairs of mirrors and dynamically resize the entire array across the new drives. That's not what you're looking for, but it's what ZFS users typically do instead, which is why the pressure to support dynamically adding disks to RAIDZ vdevs is not as high as you might expect.
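For illustration, a minimal sketch of that workflow as a script (the pool name and device names are hypothetical; the zpool create/add syntax is the standard form, but treat this as an outline rather than a copy-paste recipe):

```python
import subprocess

def zpool(*args):
    """Thin wrapper that runs a zpool command and raises on failure."""
    subprocess.run(["zpool", *args], check=True)

# Start with four drives arranged as two mirrored vdevs.
zpool("create", "tank", "mirror", "ada0", "ada1", "mirror", "ada2", "ada3")

# Later, grow the pool by adding a third mirrored pair. Capacity increases
# immediately and new writes are striped across all three vdevs; existing
# data stays where it was written.
zpool("add", "tank", "mirror", "ada4", "ada5")
```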

Comment Re:Who would have thought (Score 1) 194

Not quite. Chernobyl was caused by operators not understanding the reactor's behavior in low-power situations, which happened to be during a test to see if the cooling system would work in the window after the core shut down but before the diesel generators were back up. They brought the power levels so low that the reactor was nearly shut down, then to raise power back up they brought the control rods all the way out, creating hot spots. Then when the power came on too strong, the response was to lower the control rods, which in a very hot reactor actually does the opposite: it causes a huge reaction.

That's true in broad strokes, but you're overlooking key details. First, the automatic cooling systems were shut off. Second, when the power dropped to very low levels, the operators, instead of allowing the reactor to bring power back up normally, disabled the control rod systems designed to ensure the reactor couldn't "bounce critical" when they withdrew control rods to raise power. And finally, when the test did not go as planned, and after several previous failures, the operators went off-script and began overriding other safety features, such as the coolant override systems, in an attempt to drive the reactor to the desired test parameters.

You're not correct about the behavior of the control rods. The problem was a design flaw in the control rods that caused them to displace coolant and neutron moderator as they were inserted, which means there is a short lag during which, ironically, the rods increase the reaction by reducing the moderating effect of the coolant before they begin moderating the reaction themselves. This happens every time, not just in "hot" reactors, but in a normally functioning reactor it is not a problem because the momentary increase in the reaction is not significant. All of the operators' previous actions, however, had put the reactor into an extremely unstable state. In particular, had they not disabled the automatic control rod safety systems, the reactor would have automatically moderated itself out of the unstable situation. But because they had essentially taken control away from the control rod systems for most of the rods, the reactor could not effectively stop the problem the operators were introducing.

System logs show the operators did not simply try to lower the control rods back into the reactor to slow it down; they attempted an emergency scram, which drops all the control rods into the reactor at once. They would only have done that in response to an emergency condition. That means that when they did what you are pointing to as the cause of the accident, they were already in the middle of an accident. What they did not know was that they had already put the reactor into a condition beyond the ability of a scram to fix.

Comment Re: Magic (Score 1) 370

Do you have a link? The last time I looked into this, you could not add a disk to a raid-z. You could add disks to a zpool, or add another raid-z to a zpool. However, a raid-z was basically immutable. This is in contrast to mdadm, where you can add/remove individual disks from a raid5.

Google seems to suggest that this has not changed; however, I'd certainly be interested in whether that is the case. The last time I chatted with somebody who was using ZFS in a big way, they indicated that this was a limitation. He was using it for very large storage systems, and I could see how many of the ZFS features made it much more appropriate in those kinds of situations, especially things like the write intent log on separate media, and having many independent storage units which are individually redundant but otherwise behave like one big array of disks (which helps distribute IO and reduces some of the penalties of RAID), etc. I'm more familiar with btrfs, and it seems to be evolving more towards being an ext4 replacement, where smaller arrays are the norm, etc. That isn't to say that many of the features of either aren't potentially useful for both.

MightyYar's process isn't adding a disk to a RAID-Z; it's addressing your original question of how to replace 1TB drives with 3TB drives. His process uses an external USB drive to kickstart things: add the USB drive, tell ZFS to logically replace one of the older drives with the newer (bigger) USB drive, let the array rebuild onto the USB drive, and once the old drive has been replaced in the array, remove it, physically replace it with a new 3TB drive, then ask the array to rebuild again. You don't even need the USB drive; you could simply swap the disks in place, but unless you are at RAIDZ2 or higher you would be running without drive redundancy for long stretches of the process. MightyYar's process avoids ever running with less than n+1 redundancy in the array. Once all the original drives have been physically replaced with 3TB drives, you can ask ZFS to expand the array to use all the space.
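A rough sketch of that sequence as a script (pool and device names are hypothetical, and it assumes one spare USB disk big enough to stand in for a pool member):

```python
import subprocess

def zpool(*args):
    """Run a zpool command, raising if it fails."""
    subprocess.run(["zpool", *args], check=True)

POOL = "tank"                          # hypothetical pool
OLD, USB, NEW = "ada1", "da0", "ada5"  # hypothetical device names

# 1. Logically swap the old drive for the USB drive and let the pool
#    resilver onto it (watch `zpool status` until it finishes).
zpool("replace", POOL, OLD, USB)

# 2. Physically pull OLD, install the new 3TB drive, then swap the USB
#    drive out for it and resilver again.
zpool("replace", POOL, USB, NEW)

# 3. Repeat for each remaining small drive. Once every member is larger,
#    let the pool grow into the extra space.
zpool("set", "autoexpand=on", POOL)
zpool("online", "-e", POOL, NEW)
```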

Comment Re:Who would have thought (Score 4, Insightful) 194

Of course it is not 100% ready for the real world. That does not mean it should not be deployed, though.

We need the power they said, it will be fine they said, don't worry they said.

The citizens of Chernobyl

Interesting analogy, since the Chernobyl accident was not caused by the power plant's automated systems, but by human beings who overrode the safety systems designed to prevent just such an accident. Interestingly, the Three Mile Island accident occurred for essentially the same reason: humans prevented the automatic systems from functioning correctly and averting an accident.

Comment Re:Is Coding Computer Science? Of Course! (Score 2) 546

I'm assuming the vast majority of programming jobs require the ability to code, and no further domain specific knowledge. This is just based on my reading of many, many programming job listings over the years.

I'm sure there are jobs that require CS knowledge, just as I'm sure there are (programming-related) jobs that require Biology knowledge or Architecture knowledge or whatever. But all of those are niches: a very small subset of all programming jobs require those specific areas of knowledge. ALL programming jobs require coding though, and even among the ones that require domain-specific knowledge, I'd imagine the bulk involve a lot more coding than anything else.

You don't need "domain specific knowledge" to code, but I think most such programmers are subpar. Code is like writing; you only need to know English (or your native language) to write, but if that's all you know then you're not going to be a particularly useful writer. Code implements algorithms, algorithms solve problems, and knowledge of the problem space is always not just valuable but the difference between uninteresting scribbles and a best selling novel.

A lot of the time, code supports other code; it's code designed explicitly to address computer system issues. Knowledge of how computer systems work is essential to being able to write, debug, or sanity-check reasonable code. Sometimes code directly tackles a non-computer problem, like analyzing data in another field. It's not *mandatory* to understand that field, but it's extremely limiting for a developer to write code that analyzes data about a subject they know nothing about. They will always need someone else to translate every little thing for them, and they will never be able to tell whether their code is actually doing something useful. If it goes awry, someone else will have to tell them.

You have to be a truly stellar programmer to be worth it if you don't understand what you're coding about.

Comment Re:Every week there's a new explanation of the hia (Score 1) 465

The problem with climatologists is that they are climatologists; they are not sociologists, politicians, economists, or ethicists. Anybody who advocates following the advice of climatologists on climate change is either a charlatan or a liar.

The problem with sociologists, politicians, economists, or ethicists is that they know nothing about climate change. Therefore, anyone following *their* advice about climate change is an idiot. I guess we just throw darts at a board, because everyone qualified to know the subject matter of anything doesn't know how to use it, and everyone who knows how to use the subject matter knowledge doesn't possess any of it. Given a choice, I will go back to the world of lying charlatans, and you can go back to the world of living in a cave waiting for lightning to strike a tree and make the hot glowy.

Comment Re:Thought that was obvious... ? (Score 1) 141

Of course deductions carry scientific weight, but they don't serve as meaningful evidence; instead they serve as the basis of a hypothesis.

Genuine logical deductions carry as much weight as experiments, or more. Logical deduction is part of the process of scientific analysis. Logical deduction is in fact the glue that connects otherwise disconnected scientific theories. Without logical deduction, scientific theories would be disconnected semantic dust.

Without the rules of math and logic, you can't do scientific analysis. Experiments are the data; logic and math are the engine. It's logic that tells us that if the Earth is spherical, it's not cubical. No one does experiments to prove the Earth is not every other possible shape. It's logic that tells us that if the Earth has one shape, it cannot have another. No experiments are necessary. No one has tried to experimentally confirm that the Earth is not a dodecahedron, or a torus, because those are logical impossibilities.

You are probably confusing genuine deductive logic with what people sometimes call deductions but are actually inductions or "common sense." Those fail often, but they are not true logical deductions; "Holmesian deduction" is generally not real logical deduction. But when you say science uses experiments to support a conclusion, on what basis do you declare that those experiments support anything? Why does seeing X support Y? Without logical deduction, you can't get from here to there. Experiments don't tell you that X supports Y; experiments generate the X. Logical deduction connects X to Y. It is in fact two logical deductions that underpin two of the foundations of science. If an assertion always leads to X, and an experiment demonstrates that X is false, then the original assertion cannot be entirely true. That's the principle of falsification. Conversely, if an assertion predicts a set of circumstances S, and S is distinct from the predictions of all other similar assertions, then if experiments confirm all the elements of S, the probability that the original assertion is true increases with the size of S. That's the principle of confirmation. Try to do science without variations of those deductions.
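In symbols (with H the assertion and X its predicted observation), the falsification deduction is just modus tollens, and one common way to formalize the confirmation side, which is my framing rather than anything unique, is Bayes' theorem:

```latex
% Falsification (modus tollens): if H entails X and X is observed to be
% false, H cannot be entirely true.
(H \Rightarrow X) \land \lnot X \;\therefore\; \lnot H

% Confirmation: observing a prediction X that is more likely under H than
% otherwise raises the probability of H.
P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)} > P(H)
  \quad \text{when } P(X \mid H) > P(X)
```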
