
Comment: Thanks for the feedback (OP response) (Score 2) 265

by grahamsaa (#47435115) Attached to: Ask Slashdot: Unattended Maintenance Windows?
Thanks for all of the feedback -- it's useful.

A couple clarifications: we do have redundant systems, on multiple physical machines with redundant power and network connections. If a VM (or even an entire hypervisor) dies, we're generally OK. Unfortunately, some things are very hard to make HA. If a primary database server needs to be rebooted, generally downtime is required. We do have a pretty good monitoring setup, and we also have support staff that work all shifts, so there's always someone around who could be tasked with 'call me if this breaks'. We also have a senior engineer on call at all times. Lately it's been pretty quiet because stuff mostly just works.

Basically, up to this point we haven't automated anything that will / could be done during a maintenance window that causes downtime on a public facing service, and I can understand the reasoning behind that, but we also have lab and QA environments that are getting closer to what we have in production. They're not quite there yet, but when we get there, automating something like this could be an interesting way to go. We're already starting to use Ansible, but that's not completely baked in yet and will probably take several months.

My interest in doing this is partly that sleep is nice, but really, if I'm doing maintenance at 5:30 AM for a window that has to be announced weeks ahead of time, I'm a single point of failure, and I don't really like that. Plus, considering the number of systems we have, the benefits of automating this particular scenario are significant. Proper testing is required, but proper testing (which can also be automated) can be used to ensure that our lab environments do actually match production (unit tests can be baked in). Initially it will take more time, but in the long run anything that can eliminate human error is good, particularly at odd hours.
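One way the "lab matches production" check could be automated is a package drift report. This is only a sketch under assumptions: it presumes each host can dump its installed packages as "name version" lines (for example with rpm -qa --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n'), and the function names parse_package_list and diff_packages are made up for illustration.

```python
from __future__ import annotations


def parse_package_list(text: str) -> dict[str, str]:
    """Parse lines of 'name version' into a {name: version} mapping.

    Assumes one package per line, e.g. output of
    rpm -qa --queryformat '%{NAME} %{VERSION}-%{RELEASE}\\n'.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, version = line.split(None, 1)
        packages[name] = version
    return packages


def diff_packages(
    lab: dict[str, str], prod: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return packages that differ, mapped to (lab_version, prod_version).

    None means the package is missing from that environment.
    """
    drift = {}
    for name in sorted(set(lab) | set(prod)):
        lab_v, prod_v = lab.get(name), prod.get(name)
        if lab_v != prod_v:
            drift[name] = (lab_v, prod_v)
    return drift
```

An empty result would mean the two package sets agree; it says nothing about config files or kernel parameters, which would need their own checks.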

Somewhat related, about a year ago, my cat redeployed a service. I was up for an early morning window and pre-staged a few commands chained with &&, went downstairs to make coffee and came back to find that the work had been done. Too early. My cat was hanging out on the desk. The first key he hit was "enter" followed by a bunch of garbage, so my commands were faithfully executed. It didn't cause any serious trouble, but it could have under different circumstances. Anyway, thanks for the useful feedback :)
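For what it's worth, a guard against this kind of accident could look something like the sketch below: the pre-staged work only runs if an exact phrase is typed, so a bare "enter" (or a pawful of garbage) does nothing. The names confirmed and run_if_confirmed and the example phrase are hypothetical, not anything from an existing tool.

```python
def confirmed(typed: str, expected: str) -> bool:
    """True only if the operator typed the exact, non-empty expected phrase."""
    return expected != "" and typed.strip() == expected


def run_if_confirmed(action, expected: str, read_line=input) -> bool:
    """Prompt for the phrase and call action() only on an exact match.

    read_line is injectable so the guard can be tested without a terminal.
    Returns True if the action ran, False otherwise.
    """
    typed = read_line(f"Type '{expected}' to proceed: ")
    if not confirmed(typed, expected):
        return False
    action()
    return True
```

Usage would be along the lines of run_if_confirmed(redeploy, "redeploy billing"), where redeploy is whatever function wraps the staged commands.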

+ - Unattended maintenance windows

Submitted by grahamsaa
grahamsaa (1287732) writes "Like many others in IT, I sometimes have to do server maintenance at unfortunate times. 6AM is the norm for us, but in some cases we're expected to do it as early as 2AM, which isn't exactly optimal. I understand that critical services can't be taken down during business hours, and most of our products are used 24 hours a day, but for some things it seems like it would be possible to automate maintenance (and downtime).

I have a maintenance window at about 5AM tomorrow. It's fairly simple — upgrade CentOS, remove a package, install a package, reboot. Downtime shouldn't be more than 5 minutes. While I don't think it would be wise to automate this window, I think with sufficient testing we might be able to automate future maintenance windows so I or someone else can sleep in. Aside from the benefit of getting a bit more sleep, automating this kind of thing means that it can be written, reviewed and tested well in advance. Of course, if something goes horribly wrong having a live body keeping watch is probably helpful. That said, we do have people on call 24/7 and they could probably respond capably in an emergency. Have any of you tried to do something like this? What's your experience been like?"
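A rough sketch of what an unattended version of a window like this might look like: run the steps in order, stop at the first failure, alert someone, and refuse to run at all outside the announced window. Everything here is an assumption for illustration: the package names, the 05:00 to 05:30 window, and the run_window helper are invented, and alert defaults to print where a real setup would page the on-call engineer.

```python
import subprocess
from datetime import datetime, time

# Hypothetical steps: upgrade, remove a package, install a package, reboot.
STEPS = [
    ["yum", "-y", "update"],
    ["yum", "-y", "remove", "old-package"],
    ["yum", "-y", "install", "new-package"],
    ["shutdown", "-r", "now"],
]


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if now falls inside [start, end). Does not handle windows spanning midnight."""
    return start <= now.time() < end


def run_window(steps, now, start, end, run=subprocess.run, alert=print):
    """Run steps in order inside the window; stop and alert on the first failure.

    Returns the list of steps that completed successfully.
    """
    if not in_window(now, start, end):
        alert("outside maintenance window; nothing was run")
        return []
    done = []
    for cmd in steps:
        result = run(cmd)
        if result.returncode != 0:
            alert(f"step failed: {' '.join(cmd)} (exit {result.returncode})")
            break
        done.append(cmd)
    return done
```

It would be called as run_window(STEPS, datetime.now(), time(5, 0), time(5, 30)). The run and alert parameters are injectable so the whole sequence can be rehearsed in a lab with fakes before it ever touches a real host.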

Comment: Re:ZFS, Apple! (Score 1) 396

by grahamsaa (#47236271) Attached to: One Developer's Experience With Real Life Bitrot Under HFS+
Of course it doesn't, and I never said that. But your chances of data corruption if you use ZFS without ECC are somewhat greater, and potentially much more catastrophic. A web search for 'ZFS without ECC' will point you to a number of horror stories. Basically, ZFS always trusts what's in memory, so if what's in memory differs from what's on disk, the contents on disk get overwritten. If this discrepancy is due to bit rot, that's great -- you've just saved your data. But if it's due to a memory error, your system proactively corrupts your data. Considering that most non-ECC DIMMs have a couple errors a year, you will very likely lose data if you run ZFS on a system without ECC.

Of course, ECC doesn't fix everything, but it should halt your system if your RAM has an uncorrectable error, which is better than corrupting your files on disk.

Comment: Re:ZFS, Apple! (Score 2) 396

by grahamsaa (#47236167) Attached to: One Developer's Experience With Real Life Bitrot Under HFS+
I'm not sure this is true. Other vendors like iXsystems already sell products that ship with ZFS. As I understand it, the open source version of ZFS is licensed under the CDDL. While Oracle distributes its own version of ZFS that may (or may not) include proprietary features, the open sourced version is freely distributable. The only reason it's packaged separately from the Linux kernel (as an out-of-tree module or a userland FUSE port) is that the CDDL isn't compatible with the kernel's GPL license. Apple's kernel is definitely not GPL, so this isn't a problem for them.

One problem might be that using ZFS without ECC memory can result in data loss, and ECC memory is more expensive (and not compatible with most consumer-oriented processors that Intel makes). This would increase the cost of Apple hardware and could (possibly) be a hurdle, as Intel doesn't want to support ECC memory on their consumer-oriented processors (as this could hurt sales of more expensive server-oriented processors). But Apple is a large enough vendor that they could probably negotiate something workable with Intel.

That said, I don't know many Apple users who know what ZFS is, and it doesn't seem like there are many people clamoring for it. It would be a great addition to OS X though.

Comment: Re:Mountain out of a molehill (Score 1) 239

by grahamsaa (#46710867) Attached to: Heartbleed OpenSSL Vulnerability: A Technical Remediation
What if you work for an organization that has hundreds or thousands of users who connect to an SSL VPN? Re-issuing a single certificate isn't so bad, but re-issuing many certs (and working with end users to roll them out) sounds like a nightmare. Many businesses are also responsible for more than one website, and / or are heavily regulated. Just getting lots of users to change their passwords is bad enough, but having to tell them that their credit card number or medical information may have been compromised, possibly providing credit monitoring services for a while, etc., is ABSOLUTELY a lot of work for a department or an organization.

Comment: Recouping the money is probably impossible (Score 2) 126

by grahamsaa (#46432633) Attached to: NASA Admits It Gave Jet Fuel Discounts To Google Execs' Company
But I'm much more interested in hearing about the rationale for offering this deal. Did NASA get anything in return? Did H2-11 request a subsidy? Was this a simple accounting error, or was it due to corruption? The "what" here is far less interesting to me than the "why".

+ - Why is Slashdot ignoring the advice of so many developer articles. 2

Submitted by Anonymous Coward
An anonymous reader writes "Over the years, Slashdot has recycled plenty of articles about lousy UX, lousy design, lousy graceful degradation, lousy development practices, lousy community management, even lousy JavaScript implementations creating security problems. Did Slashdot read any of those articles?"

+ - Fuck beta 1

Submitted by Anonymous Coward
An anonymous reader writes "The beta is bad. It's so bad. The comments are reduced in screen width about 50%. Subject lines are deemphasized, scores are minimized, etc.

The discussions are the reason to come to Slashdot, and the beta trivializes them entirely. It looks like the comment section on a generic news site.

The comments now look like an afterthought, whereas they used to be the primary focus of the site."
