haxin - Slashdot User

Comment A Necessary Evil (Score 1) 294

by haxin on Monday April 21, 2014 @04:14PM (#46808883) Attached to: Ask Slashdot: System Administrator Vs Change Advisory Board

Take everything I say with a grain of salt: I'm not in management and don't have 20 years of system engineer or system administrator experience. We recently implemented a change advisory board and while it's not perfect, it seems to meet our needs without requiring too much. I haven't read every comment here, and many of them are cynical, but no matter how cynical you become, it's never enough to keep up. There are also loads of helpful comments. It's been a couple of hours well spent so far.

There was a time when we shot from the hip. A change would be made that would ultimately affect dozens/hundreds of users, resulting in loads of calls to the help desk. At some point management would be alerted to the 'trend' in all the calls, which would result in an investigation that often led to "Oh yeah, this 'tiny' change was made an hour ago." Now that the [potential] source was identified, the work was double checked by the responsible parties, often with a few managers standing nearby, until the problem was found & corrected or the change reverted. There was a lot of foot shooting going on.

We're not idiots, but we're not perfect either, which means that sometimes mistakes happen. And occasionally, even after having done all the research and the risk & impact assessments, unexpected complications would arise. I'll admit, there was something nice about operating autonomously, without being micromanaged, scrutinized and often provided anything but constructive criticism, and it was great not having to deal with the bureaucratic red tape one often has to go through to get a simple change done. But as someone else pointed out, the catalyst that brought about this change was the perception of an unstable system due to 'lower than acceptable' success rates when changes were made.
When we adopted some form of change control, which later morphed into a change advisory board, trips to the ER for bullet wounds in the foot dropped dramatically. And when something did go wrong, we weren't fearful of having made an 'unauthorized' change.

I don't think I'm one to resist change. More often than not, I'm the one trying to drive a change and am rarely affected by someone else's change. And when I am, it usually doesn't require a massive cultural, routine, behavioral etc. change on my part. So when it came time to implement some form of change control, I could understand how it was beneficial and why it was necessary. I'll admit, it wasn't easy and required some getting used to, but I have an appreciation for what it does for us.

But IMHO, it sounds like, for many, the real crux of the issue is *how* a CAB is implemented. I realize every organization is different, but it goes a little something like this on this side of the fence:

- Create your change request, which amounts to filling out an online form including things like who is doing the work, why we are doing it, how this affects our users, what the procedure is to make the changes, what the testing process is, what the back-out plan is etc. You're encouraged to include as much detail here as possible. Strongly. Encouraged.
- Then you have to 'socialize' the changes with the [affected] departments/department heads. This is kind of a gray "wild card" area as it could be a number of individuals, and you could potentially find yourself repeating the same thing multiple times a day over several days. As such, I suggest holding a regular meeting a day or two before the CAB and inviting 'the powers that be' to go over your proposed changes. The 'socialization' step is arguably the most important one because if questions come up in the CAB, or if just one person isn't comfortable, it almost guarantees the change will be denied until you work it out.
Because of that, I personally think this is absurd and loathe the process, but I obey.
- Finally, on CAB day it should be a slam dunk because the 'important folks' are already aware and any concerns should have already been addressed. So you present, briefly explain what's happening, and you're done.

We don't have many *n?x systems here, so let's talk about patching Windows. We have well over a thousand workstations and a few hundred servers, both physical & virtual, across various flavors (every version of Server from 2003 to 2012 R2; Windows XP & 7). If I had to submit a lengthy change request for each patch that was going out, which also included a 1-page report of the patch details (file dates & versions, what is/isn't affected, severity etc.), I'd have to find a new job. As much as I'd like to employ some passive-aggressive, 'stick it to the man' tactics when presenting in our CAB meetings, in an attempt to make it difficult for them and convey the idea that this is just stupid and a complete waste of [my] time, they'd probably catch on and show me the door after a few meetings.

Fortunately, in our environment it's a single change request for any number of patches, be it 1 or 1000. And our 'security team' are the ones responsible for determining which patches we do push. They do the legwork (find what needs patching: Windows & third-party apps), fill out the documentation (which amounts to a table in a Word or Excel document containing KB, MSRC, Affected Products, Severity etc.; they literally copy & paste from the Microsoft site and include links to CVE pages for third-party apps like Reader, Flash etc.) and create the change request. I handle the rest, which is deployment of said patches via WSUS or some software delivery method. Usually, when it's time to present before the CAB, there isn't too much fuss because of the well-documented patch process we follow.
Workstation patch process:

- Week 0, Patch Tuesday: Initial hard & fast testing of patches to make sure they don't immediately break something. I prefer to wait a week just in case MS sends regression patches that same week. (I think something like this happened in October or November of 2013.)
- Week 0, a day or two after Patch Tuesday: A thorough regression testing process where we test just about every application and perform a variety of tasks within each application to make sure everything still works and we haven't lost any functionality. This is time consuming but is performed by what is essentially a dedicated resource in the evenings or on the weekend. If an issue comes up, they report it and we discuss.
- Week 1: Patches are pushed to IT and a group of pilot users because, presumably, it's business as usual.
- Week 2: Patches are deployed to a handful of offices, usually smaller offices, to avoid mass hysteria in the event we missed something in testing.
- Week 3: Patches are deployed to the rest of the organization.

All in all, it takes about 4 weeks to get it all done, but it helps to give certain individuals the warm & fuzzy they need and makes IT look like stars to the user community because very little, if anything, breaks.

Servers are handled slightly differently. We start by patching non-essential, non-user-impacting servers during the week (business hours M-F), and gradually move up to the mission-critical servers later in the month. After each patch window, we test the affected services to verify the system is functioning normally. Most of this is automated, but some is done manually. We also have monitoring solutions similar to Icinga to verify that services are up, which helps reduce testing time.

There are of course exceptions, like Heartbleed and other critical zero-days. Those are obviously not the norm, but we treat them as incidents, where we apply the fix, usually after sending an emergency email notification.
We document what was done after the fact and discuss it in the next CAB just so everyone's aware. It's all about Consequence x Threat x Vulnerability = Risk, or whatever math you use.

Our current CAB process isn't perfect and I do have some gripes about it:

(a) Moving forward with the proposed changes *requires* a unanimous decision. A single [misinformed] person, or someone who simply doesn't like or feel comfortable with the change, can potentially create a 3+ day delay or halt the entire process indefinitely. We recently had a small outbreak of some ransomware because shares were not properly secured. The answer was to fix the NTFS permissions, then grant write access to the specific location(s) in question as needed, by request. Well, one person made such a big deal over this that the proposed changes to fix the shares were denied. Not too long after that we discovered the same shares, and then some, had been hit again. Unfortunately we caught it too late, which meant the backups had bad data, resulting in data loss. Instead of saying "see, I told you so" we took the high road and proposed the same changes again without throwing anyone under the bus. We laid out how we could fix it for the right people, but again the same person made a huge stink of it, making the fix impossible. One person shouldn't be able to stop the whole process, especially when we've done it their way and it left us vulnerable.

(b) I'm asked to put an appreciable amount of substantive information in the form, but no one is really paying any attention to it. It's not a good use of time to copy/paste existing documented processes if no one is reading them, or paste code if no one is verifying/testing it. We should be able to summarize the changes and reference our established processes and procedures. But if there aren't enough details, you risk having the change temporarily suspended or just getting a bunch of nasty looks and comments.

Perhaps you could consider suggesting a similar implementation.
I recommend drafting something in Visio that maps out the patching process, complete with swim lanes for responsible parties (e.g. what the engineers do, what the security guys do, what the CAB does etc.) and an Excel document/Access database with relevant patch info. If all goes well after a handful of weeks/months, suggest making patches a 'pre-approved' change that doesn't require extensive scrutiny. You should still document the patches, but it's understood that a specific agreed-upon process will be followed, requiring less information in the change request and hopefully significantly less paperwork and time.

Because this sounds like it's a new thing for your organization, it's going to be a living process that will evolve over time as they work out the bugs. If the requirement or expectation is for you to hand them a ream of paper every time, comply, but maybe work to improve the process. Consider showing them that the 2-hour CAB meetings could be condensed to 30 minutes or less with your improved process, without sacrificing anything. This is all about enlarging the pie versus increasing your slice or decreasing theirs. Keep the appropriate attitude ("we're on the same team and we want the same thing: patched systems in a timely manner with minimal downtime.") and continue doing a stellar job!
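
If it helps with the documentation side, the workstation schedule described above (Week 0 through Week 3, counted from Patch Tuesday) can be sketched in a few lines of Python. This is only a rough illustration: the stage labels and the function names `patch_tuesday` and `rollout_schedule` are made up for the example and don't come from WSUS or any other tool, and it assumes Patch Tuesday is the second Tuesday of the month.

```python
from datetime import date, timedelta

# Stage labels and week offsets mirror the workstation process above.
# The labels are illustrative only, not names from any real tool.
ROLLOUT_STAGES = [
    ("quick smoke test of patches", 0),
    ("thorough regression testing", 0),
    ("IT and pilot users", 1),
    ("handful of smaller offices", 2),
    ("rest of the organization", 3),
]


def patch_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month (assumed Patch Tuesday)."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1
    days_until_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_until_tuesday + 7)


def rollout_schedule(year: int, month: int) -> list[tuple[str, date]]:
    """Pair each stage with the start of its week, counted from Patch Tuesday."""
    start = patch_tuesday(year, month)
    return [(name, start + timedelta(weeks=offset)) for name, offset in ROLLOUT_STAGES]


if __name__ == "__main__":
    for name, when in rollout_schedule(2014, 4):
        print(f"{when.isoformat()}  {name}")
```

Something like this could feed the dates into the Excel document or the change request, so the CAB sees the same agreed-upon timeline every month.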
