Comment Re:Take medicine away from the wizards (Score 1) 255

Before continuing this discussion, allow me one question, which I believe should adequately establish whether we have any reasonable chance of reaching agreement on principles, even should we come to agree on facts.

Do you consider it appropriate for a risk pool to contain individuals with varying levels of inherent risk? Consider, for instance, the common case of a risk pool consisting of the employees participating in a large enterprise's health plan. Is it appropriate for individuals who are at inherently high personal risk (on account of genetic predisposition, disability, medical history, or other known or predictable factors) to be subsidized by those who are not, or should the pool be stratified into bands by inherent risk level, with the pool thus serving only its traditional role of spreading the costs of unpredictable (and, hopefully, non-clustering) events?

Comment Re:On Debian that's allready done. (Score 1) 223

As someone building large-scale automation that uses SSH as its backing transport: anything that requires escalation to something that isn't SSH might as well be hands-and-eyes -- it's something my tools can't touch, and if the tools can't touch it, it might as well not exist.

See again, "too large a scale for issue remediation to depend on human involvement".

Comment Re:On Debian that's allready done. (Score 1) 223

My own remain quite simple and effective within SysV, supporting start, stop, restart, reload, and status just fine.

Doing so how? Are they robust against other processes being assigned the PID of something that exited? Are they following LSB exit status conventions? Are they cgroup-aware? And even if you're getting all the details right in your own init scripts, have you gone to the effort of auditing all the vendor-provided ones?

Details matter, and requiring everyone to reinvent that wheel when a canonical implementation could easily be provided means that in a great many cases the details will be wrong. For someone as concerned with correctness as you are, I'm surprised that isn't more troubling.

Comment Re:On Debian that's allready done. (Score 1) 223

If there were no scripts anyway, I don't see how any other init system would have changed anything there. But surely there were scripts for system daemons such as sshd?!?

I said, no acceptable scripts.

OS-provided scripts for sshd are almost universally fire-and-forget, thus unacceptable.

Yes, sshd shouldn't ever exit. Yes, it's a serious bloody problem if it does. But if it exits, and stays dead, and no remediation happens? That's a bigger problem, because that means you need hands-and-eyes to fix it.

Comment Re:Take medicine away from the wizards (Score 3, Interesting) 255

Every hospital in the United States is required to provide life saving treatment, regardless of whether you have insurance or not. That hasn't changed and it's not the issue here.

What does the requirement to provide life-saving treatment have to do with anything? It helps people who are so broke that they have no assets, but it doesn't help anyone else.

You have a heart attack, you get treated at a hospital which is required to do so; you're insured but not adequately, and you get a bill for $50,000 more than your insurance covers. Welcome to medical bankruptcy.

Now, how exactly are you supposed to shop around, rather than just taking the first-available treatment? Sure, they're required to provide that treatment whether or not you can pay -- but if you can pay, they're going to do everything in their power to be sure that you will.

In my wife's case, it wasn't a heart attack, but brain surgery -- and while she was in the hospital, her employer went out of business. Her insurance policy disappeared with them, and she was personally on the hook for follow-up care, wiping out years of savings.

Comment Re:On Debian that's allready done. (Score 1) 223

The fundamental design of SysV is as good as any; it just needs a new major version number update.

I positively cannot agree.

Look at some of your system's SysV init scripts, and then have a look at some of the run scripts at http://smarden.org/runit/runsc....

Is the configuration complexity you get from every process having its own bespoke init script -- rather than something that simple, coupled with the daemon responding to standard signals -- really buying you anything?
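
For comparison, a complete runit-style run script for sshd is on the order of three lines (a sketch -- the exact binary path and flags vary by distribution):

    #!/bin/sh
    # Run the daemon in the foreground, under the supervisor,
    # instead of letting it double-fork into the background.
    exec 2>&1
    exec /usr/sbin/sshd -D -e

Everything else -- restart policy, status, signalling -- comes from the supervisor and from the daemon responding to standard signals.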

Comment Re:On Debian that's allready done. (Score 1) 223

As for the scripts, when is the last time you had to rewrite every rc script on a system?

About... two years ago? And that was the third time of many.

Though that was because, in all those cases, there *were* no acceptable scripts to begin with. Detaching processes from their parents, thus losing access to their status without either race-prone gymnastics or polling, is not acceptable by any means whatsoever. (Yes, process supervision doesn't cover service-level status -- but that's what service-level monitoring and runtime-aware management tools are for.)

When is the last time a single policy was the most appropriate for every single daemon on the system?

Haven't had that happen yet, but I've had a single policy be most appropriate for 80% of services. And that policy was "restart".

That's likely to continue to be the case in the future. Look at Go -- the idiomatic response to an unrecoverable error is to panic and exit. That's not a bad thing -- it's a good thing, because you get back into a known state, for much the same reason that it's appropriate to reboot when you have corrupt kernel state.

Comment Re:On Debian that's allready done. (Score 1) 223

I haven't done HPC work, but my impression is that they tend to be pretty homogeneous.

I'm accustomed to working with much more heterogeneous systems -- an N-node intake system, an NxM-node index cluster, an N-node storage system, a few different datastores behind various frontends... some of these are developed purely in-house (by a separate dev team, meaning that influencing their code quality means asking nicely), some are commercial software with vendors who release on their own cycle and may or may not build with maintainability in mind, some of them are upstream OSS with a passel of patches.

If you're in a world where you can control the quality of the software, you're in a much happier place than I am.

Back around to systemd -- the restarting functionality isn't really a big deal; you can get that in any modern init system (which, of course, SysV ain't). The really interesting bits in systemd are the same ones that make it dependent on functionality unique to Linux -- tight integration with LXC and the like -- and there's room for legitimate debate about whether keeping all that in-process is the right approach. You may recall that I didn't start this out saying that systemd was great -- I started this thread saying that describing SysV init as "not broken" was wrong on its face, and I stand by that claim.

Comment Re:On Debian that's allready done. (Score 1) 223

If you're crashing on memory corruption, you're also serving garbage due to memory errors. Perhaps you should consider going to ECC if it's happening that often. If a DOS attack takes the daemon out, it's got bugs. It's understood that a DOS attack may cause it to not get to requests in a timely manner but it shouldn't actually crash. Bizarre race conditions? That's another word for bug.

Over here in the real world, saying "that's a bug in the code, so it's not my fault that it brought the cluster down" doesn't fly -- if you're ops, your job is to keep the cluster up in the face of badly-written software on individual nodes. Advocate for better design and development practices, absolutely, but that can't mean that we take our services down while we spend a decade rewriting every third-party component.

What happens when the same memory corruption and race conditions send the daemon chasing its tail but not actually terminating on an error? There will be no SIGCHLD or any other signal.

So if we don't solve everything, we can't solve anything?

Ugly hacks for detecting and remediating that kind of bug exist. The slightly-less-awful ones tend to be runtime-aware (if you're running a model where each request has sole use of the thread handling it, for instance, a long-running request can be terminated with considerably less splash damage), which makes them inappropriate for a one-size-fits-all situation.

If you really just need to restart on process exit, why not a while loop in a shell script? If you want to be notified, add a line to the script to fire off an email to the admin group.

Great. So now we have to write bespoke policy (via individually maintained scripts) for every service in the system, and modify each and every one of those scripts when we want to make a policy change.

Oh, wait, that's the status quo. And it's bloody awful.
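
To make that concrete, the "while loop in a shell script" approach looks something like this (a sketch -- the daemon, its flags, and the ops alias are hypothetical), and one of these has to be written and maintained for every service:

    #!/bin/sh
    # Hypothetical per-service restart wrapper. Each daemon gets its own
    # copy of this, each encoding its own restart and notification policy.
    while :; do
        /usr/sbin/mydaemon -f        # run in the foreground so we see it exit
        status=$?
        echo "mydaemon exited with status $status" \
            | mail -s "mydaemon died on $(hostname)" ops@example.com
        sleep 5                      # back off so a crash loop doesn't spin
    done

Multiply that by every daemon on the system, and by every policy change, and you have exactly the maintenance problem described above.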

As I said elsewhere, guano occurs, so sometimes using a restarter as a stopgap makes sense. But that really should be considered an exceptional case, not normal policy, and it should certainly be considered a dirty hack. I don't see it being common enough in good practice to build into pid 1.

It's been part of pid 1 for decades; see /etc/inittab.

Moreover, if it's *not* part of pid 1, it's easy to get into a state where your system isn't amenable to any kind of remediation: You have pid 1 but nothing else running? Sorry, only option is a power cycle.
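
For reference, the inittab form mentioned above is a single respawn line (the id field and daemon are illustrative):

    # /etc/inittab -- the "respawn" action tells init to restart
    # the process whenever it terminates.
    sv1:2345:respawn:/usr/sbin/mydaemon -f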

Comment Re:Just like the singularity it seems that improve (Score 1) 191

Well - they seem to tell us about fantastic new improved super batteries.

Eh, that's what you get from reading press releases. :)

New chemistries frequently have some particular thing they do really well, and a set of drawbacks. The problem comes when you read the articles about "new battery has X% more energy density" or "new battery has X% higher charge/discharge rate" and expect to get both of those things in the same battery (let alone a battery that isn't making tradeoffs that rule it out for consumer use).

And batteries for consumer electronics are getting better over time; they're just not keeping up with best-chemistry-for-X in every factor X. Which isn't a reasonable thing to expect.

Comment Re:On Debian that's allready done. (Score 1) 223

Look, we both agree that Murphy rules. And you're right to say 'because random stuff happens, I need an overseeing process to automatically fix it'. But auto-restarting pwned services is not that fix anymore, and it really hasn't been since 1999.

Sure, but process supervision is still part of the solution, and SysV still doesn't get you there (remember, the argument that set off this whole thread was "don't fix what ain't broke" in reference to SysV init).

If you want to set something up to nuke-and-pave any system where an Internet-facing service SIGSEGVs (hellloooo, denial-of-service attacks!), well, you still need something to actually be wait()ing on the process and deciding what to do about it. In prior iterations of my world, that something has been a daemontools derivative kicking off a "finish" script to decide how to handle an erroneous exit -- in future iterations, it's likely to be systemd.
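
A minimal version of that finish hook, under runit's conventions (runsv passes the exit code and wait status to ./finish -- see runsv(8) for the exact semantics; notify-monitoring is a hypothetical hook into the alerting system):

    #!/bin/sh
    # ./finish -- runsv runs this after ./run exits; decide what to do.
    # $1 is the exit code and $2 the wait status, per runsv(8).
    notify-monitoring "service exited: code=$1 status=$2"
    # Fall through and let the supervisor restart ./run.
    exit 0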

Comment Re:On Debian that's allready done. (Score 1) 223

I agree with almost everything you're saying here.

However, none of that is an excuse for building process supervision infrastructure as a house of cards.

Even building higher-level systems in functional languages with provably correct code, I've seen underlying layers blow up (hello, Erlang... though back in the day, I had much more than my share of JVM failures too... and CRuby, and others). Doing things right at the higher levels doesn't negate the need for doing things right at the lower levels -- defense in depth is still a thing.

Comment Re:On Debian that's allready done. (Score 1) 223

Wow, talk about lowered expectations. I run several machines and restarts are for when you reconfigure.

"Several machines" meaning what? 5? 10? 30?

Try running 10,000 systems at load; bonus points if your production system is at a substantially larger scale than your upstream vendors' test labs. You can't afford to look at things manually when something goes belly-up -- not immediately -- so you do an automated remediation, log everything you can, and have a human look at the outliers.

If you look at the big boys -- Facebook, Google, Etsy -- you don't see folks who build with the assumption that services are going to stay up.

Comment Re:On Debian that's allready done. (Score 1) 223

If you have daemons that keep falling over and needing restart, you're already at the hack stage.

Then "the hack stage" is the state of the world when you're operating at any significant scale.

You have thousands of machines in the field? Guess what -- some of them are going to hit bizarre race conditions. Some of them are going to be targets of successful DOS attacks that crash your daemons. Some of them will have iffy memory in a way that's only visible when it gets poked in just the wrong way. One way or another, in the real world, services die.

Now, the right way to deal with this is to have a parent process that gets immediately notified when this happens, sends an appropriate ping to the monitoring/alerting system, and then does its best to get things back into a working state. Maybe you schedule the node for nuke-and-pave at a time when the cluster isn't under load. Maybe you snapshot the system's state (if it's virtualized) for a human-driven root-cause investigation later. What's definitely, unequivocally the Wrong Thing is to have the service just be dead.

Yes, there's some overlap with remote monitoring and remediation systems, but that kind of infrastructure has its own, somewhat different failure points -- and the right way to do things is belt-and-suspenders. If you want to start remediation as soon as possible, you use the UNIX process tree the way it was designed, and you have a parent listening for SIGCHLD. Period.

Comment Re:FAR better than fossil fuels, and even better t (Score 4, Insightful) 191

Just like the singularity, it seems that improved battery tech is always about 5-10 years down the road.

The awesome thing is that it really is always 5-10 years down the road -- and things are rolling off of that 5-10 year timeline into production all the time.

If you don't think batteries have been getting better, you aren't paying attention.
