
Comment Re:Perception of Necessity (Score 1) 265

Automating shit that can be automated, so you can actually do things that benefit the business instead of simply maintaining the status quo, is not a bad thing. Doing automatable drudge work by hand when it could be automated is just stupid. Muppets who can click Next through a Windows installer or run apt-get, etc. are a dime a dozen. IT staff who can get rid of that shit so they can actually help people get their own jobs done better are way more valuable.

The job of IT is to enable the business to continue to function and improve. Never forget that. People don't spend up big on computer stuff just because; they do it in order to save money by improving processes. Improving processes is where you should be focused; anything to do with general maintenance of the status quo is dead time.

Comment Re:Slashdot is a Bad Place to Ask This (Score 1) 265

Alternatively, perhaps somewhere up the chain they have no idea what can be done (this IT shit isn't their area of expertise) and are not being told by their IT department how to actually fix the problem properly. Rather, they just apply band-aid after band-aid to whatever breaks.

It's my experience that if you outline the risks, the costs, and the possible mitigation strategies to eliminate the risk, most sensible businesses are all ears. Even if they don't agree on the spot, they're at least aware of what is possible, and when the inevitable happens they'll be more keen to fix the problem properly next time.

Downtime cost adds up pretty fucking quickly. For example, at my company we have 650 PC users. Pay rates probably range from 25 bucks an hour to 100 bucks an hour or more. Let's say the average is somewhere around 45 per hour.

1 hour of downtime x 650 users x 45 bucks per hour = $29,250 in lost productivity. Plus the embarrassment of not being able to deal with clients, etc. Plus potentially other flow-on effects (e.g., in our case: maintenance scheduling for our mining equipment (trucks, drills, etc.) doesn't run, the plant doesn't get serviced properly, and a $500k engine dies).

If you fuck something up and are down for a day? Well... you can do the math.
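Or let a script do the sums. A throwaway sketch with the numbers from above (swap in your own user count and rates); it's just users x hours x average hourly rate, and it ignores the flow-on effects entirely:

    # Rough downtime-cost estimate: users * hours * average hourly rate.
    # Figures are the ones quoted above; substitute your own.

    def downtime_cost(users: int, hours: float, avg_rate: float) -> float:
        """Lost-productivity estimate in dollars; flow-on effects not included."""
        return users * hours * avg_rate

    if __name__ == "__main__":
        users = 650
        for hours in (1, 4, 8):          # one hour, half a day, a full day
            for rate in (25, 45, 100):   # low / average / high hourly rate
                print(f"{hours:>2}h outage @ ${rate}/h x {users} users = "
                      f"${downtime_cost(users, hours, rate):,.0f}")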

Comment Re:Automate Out (Score 2) 265

This is why you move the fuck on and adapt. If your job relies on stuff that can be done by a shell script, you need to up-skill and find another job, because if you don't, someone like me will.

And we'll be getting paid more for being able to work at scale (it's the same shit for 10 machines or 10,000), doing less work, and being much happier doing it.
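For what it's worth, most of that "scale" is just a loop plus some parallelism. A rough sketch of the idea, assuming key-based ssh to every box and a hosts.txt with one hostname per line; the command is a placeholder for whatever the drudge work actually is:

    #!/usr/bin/env python3
    # Run the same command on N hosts in parallel over ssh.
    # Assumes key-based ssh auth; hosts.txt holds one hostname per line.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    COMMAND = "uptime"   # placeholder for the actual drudge work

    def run(host: str) -> tuple[str, int, str]:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, COMMAND],
            capture_output=True, text=True, timeout=60,
        )
        return host, proc.returncode, proc.stdout.strip()

    if __name__ == "__main__":
        with open("hosts.txt") as f:
            hosts = [line.strip() for line in f if line.strip()]
        with ThreadPoolExecutor(max_workers=50) as pool:
            for host, rc, out in pool.map(run, hosts):
                print(f"{host}: rc={rc} {out}")

Whether it's 10 hosts or 10,000, the script doesn't care; only max_workers and your patience do.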

Comment Re:This is why you need.. (Score 1) 265

Yeah, don't get me wrong (I've been posting about setting up a test lab using vSphere, vFilers and VLANs): you can't replace the need to have someone on call or watching in case it all fucks up. But you can generally reduce the outage window and risk significantly by actually testing (both the roll-out and the roll-back) first. And if you've got it to the point where you can reliably test, you can work on your automation scripts, test the shit out of them, and, having tested them against a copy of live with a copy of live data, be reasonably confident that they will work.

If they don't? Snapshot the breakage, roll back to pre-fuckup, and examine it at your leisure. Then reschedule once you know WTF happened.
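The shape of those scripts ends up being the same regardless of platform: snapshot, change, verify, roll back on failure. A minimal dry-run sketch of that pattern; the four helpers are hypothetical stand-ins for whatever your hypervisor/SAN and package tooling actually expose:

    # Snapshot -> change -> verify -> roll back on failure.
    # The helpers below are dry-run placeholders; wire them up to your
    # platform's real snapshot/package/health-check calls.
    import logging, sys

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("maintenance")

    def create_snapshot(name: str) -> str:
        log.info("[dry run] would take snapshot %r here", name)
        return "snap-0001"

    def apply_change() -> None:
        log.info("[dry run] the actual maintenance work would go here")

    def health_check() -> bool:
        log.info("[dry run] service/port/application checks would go here")
        return True

    def rollback(snapshot_id: str) -> None:
        log.error("[dry run] would revert to %s here", snapshot_id)

    def run_window() -> int:
        snap = create_snapshot("pre-maintenance")
        try:
            apply_change()
            if health_check():
                log.info("change applied and verified; keeping it")
                return 0
            log.error("health check failed; rolling back")
        except Exception:
            log.exception("change blew up; rolling back")
        rollback(snap)
        return 1   # non-zero exit so monitoring/on-call knows the window failed

    if __name__ == "__main__":
        sys.exit(run_window())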

Comment Re:Offshore (Score 1) 265

Yup. The company I currently work for has only a 4-hour window per day where we don't have active users on the clock. And if we win a job in, say, South America (we're a mining company), that goes out the window entirely. vMotion, virtual networking, and virtual filers/writable snapshots are all beautiful things.

Comment Re:windows (Score 2) 265

OS choice is irrelevant. I've seen plenty of critical Linux fuck-ups in my day, and OS choice doesn't account for human error. And, being human, you WILL make human errors. You need a test environment and a backout plan. If you don't at least have a backout plan and an estimate of how much a fuckup will cost BEFORE proceeding (balanced against the cost/risk of leaving it the fuck alone), you should not be carrying out the work.

Sure, that sounds like management speak, but seriously... cover your fucking ass. Because one day it will fuck up (whatever the OS; this isn't just a Linux or Windows problem), and whilst the fuck-up may not necessarily be your fault, the extended downtime because you haven't tested and have no backout plan will be.
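If you want to put rough numbers on "cost of the fuckup vs cost/risk of leaving it the fuck alone", it's just a comparison of expected costs. A trivial sketch; every probability and dollar figure below is an illustrative placeholder, not real data:

    # Crude go/no-go comparison: expected cost of doing the change now
    # versus expected cost of leaving it alone. All numbers are made up.
    p_change_fails       = 0.05      # chance the change goes wrong
    cost_if_change_fails = 30_000    # downtime + cleanup if it does
    p_incident_if_idle   = 0.20      # chance the unfixed problem bites you anyway
    cost_of_incident     = 120_000   # cost when it does

    expected_cost_change = p_change_fails * cost_if_change_fails
    expected_cost_idle   = p_incident_if_idle * cost_of_incident

    print(f"expected cost of doing the change: ${expected_cost_change:,.0f}")
    print(f"expected cost of doing nothing:    ${expected_cost_idle:,.0f}")
    print("proceed" if expected_cost_change < expected_cost_idle else "hold off")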

Comment Re: Murphy says no. (Score 4, Insightful) 265

This is why you build a test environment. VLANs, virtualization, SAN snapshots: there's no real excuse. Articulate the risks that the lack of a test environment entails to the business, and ask them if they want you doing shit without being able to test whether it breaks things. Do some actual calculations on the cost of system failure, and explain to them ways in which it can be mitigated. Putting your head in the sand and just breaking shit in live... well, that's one way to do it, but I fucking guarantee you it WILL bite you in the ass, hard, one day, whether it's automated or not. If you have a test environment, you can automate the shit out of your process, TEST it, and TEST a backout plan before going live.

Comment Re:Murphy says no. (Score 1) 265

Yup. Although, that said, if you have a proper test environment (say, a snap-clone of your live environment and an isolated test VLAN), you can do significant testing on copies of live systems and be pretty confident it will work. You can also figure out your backout plan, which may be as simple as rolling back to a snapshot (or possibly not).

Way too many shops have no test environment, but these days, with the mass deployment of FAS/SAN storage and virtualization, you owe it to your team to get that shit set up.

IT

Ask Slashdot: Unattended Maintenance Windows? 265

grahamsaa writes: Like many others in IT, I sometimes have to do server maintenance at unfortunate times. 6AM is the norm for us, but in some cases we're expected to do it as early as 2AM, which isn't exactly optimal. I understand that critical services can't be taken down during business hours, and most of our products are used 24 hours a day, but for some things it seems like it would be possible to automate maintenance (and downtime).

I have a maintenance window at about 5AM tomorrow. It's fairly simple — upgrade CentOS, remove a package, install a package, reboot. Downtime shouldn't be more than 5 minutes. While I don't think it would be wise to automate this window, I think with sufficient testing we might be able to automate future maintenance windows so I or someone else can sleep in. Aside from the benefit of getting a bit more sleep, automating this kind of thing means that it can be written, reviewed and tested well in advance. Of course, if something goes horribly wrong having a live body keeping watch is probably helpful. That said, we do have people on call 24/7 and they could probably respond capably in an emergency. Have any of you tried to do something like this? What's your experience been like?
