Anybody in operations who wants to gain more control over and understanding of their environment has heard of the IT Infrastructure Library (ITIL). This set of concepts and techniques, introduced in the 1980s by the UK government’s Central Computer and Telecommunications Agency (since folded into the Office of Government Commerce), borrows heavily from the ideas outlined in IBM’s “Yellow Books” by Edward A. Van Schaik (later merged into the single volume “Management System for the Information Business”). While such works can be sorted out and implemented by a skilled IT department, what about the smaller companies that want to start off on the right foot rather than wait until they are growing by leaps and bounds to implement sensible IT practices? Enter “The Visible Ops Handbook” written by the IT Process Institute.
The guys who wrote “Visible Ops” have been surveying hundreds of IT organizations since 2000. They found the following common traits of the most successful organizations:
Think of “Visible Ops” as an ‘ITIL for Dummies’ handbook which outlines some very concrete and straightforward (dare I say ‘agile’) steps for gaining some much-needed transparency into your daily web operations.
“Stabilize the Patient” and “Modify First Response”
The first step along the road to ITIL is implementing change management. Reduce downtime and unplanned work by clearly communicating scheduled maintenance windows and strictly prohibiting any changes to the systems outside of them. With this in place, you can change how your organization typically responds to problems. Because changes only happen inside known windows, you now know when a problem could have been introduced into the system. Simply going back and reviewing all the changes made in the last maintenance window should kick-start your search for the exact cause.
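To make that first response concrete, here is a minimal sketch in Python. It assumes you keep some kind of timestamped change log; the `Change` and `MaintenanceWindow` types, the helper names and the sample entries are all made up for illustration and don't come from the handbook.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    """One entry in a hypothetical change log."""
    system: str
    description: str
    applied_at: datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    start: datetime
    end: datetime

    def contains(self, moment: datetime) -> bool:
        # Half-open interval: a change made exactly at `end` is too late.
        return self.start <= moment < self.end


def suspects(changes, window, incident_at):
    """Changes made inside the window and before the incident, newest first."""
    hits = [c for c in changes
            if window.contains(c.applied_at) and c.applied_at <= incident_at]
    return sorted(hits, key=lambda c: c.applied_at, reverse=True)


def unauthorized(changes, windows):
    """Changes that fall outside every scheduled window."""
    return [c for c in changes
            if not any(w.contains(c.applied_at) for w in windows)]


# Made-up sample data: one Saturday-night window and three logged changes.
window = MaintenanceWindow(datetime(2024, 3, 2, 22, 0), datetime(2024, 3, 3, 2, 0))
log = [
    Change("web01", "upgrade web server package", datetime(2024, 3, 2, 22, 30)),
    Change("db01", "add index on orders table", datetime(2024, 3, 2, 23, 15)),
    Change("web02", "hand-edited config file", datetime(2024, 3, 4, 14, 0)),
]

for change in suspects(log, window, incident_at=datetime(2024, 3, 3, 9, 0)):
    print(change.applied_at, change.system, change.description)
```

With this sample data the suspects are the `db01` and `web01` changes, while `unauthorized(log, [window])` flags the `web02` edit that happened outside the window. Sorting newest first is just one plausible triage order, not a rule from the book.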
“Catch & Release” and “Find Fragile Artifacts”
Next, how often have you found yourself installing an OS onto a new server from a CD? Luke Kanies makes a harsh but thought-provoking comment in an O’Reilly Sys Admin post, “Why Isn’t System Administration Evolving”: “If your sysadmin is using CDs to build servers, s/he should be fired. No ifs, ands or buts. That hasn’t been the state of the art since before CDs were invented.” Virtualization and network installs have been around for decades now (and if you don’t believe it, check out the date on Gerald J. Popek and Robert P. Goldberg’s “Formal Requirements for Virtualizable Third Generation Architectures”). It’s been proven again and again that it’s always faster (a.k.a. cheaper) to rebuild a faulty system than to attempt a repair. With an up-to-date and well-maintained configuration management database (CMDB), both initial installs and subsequent rebuilds can be carried out quickly and repeatably.
Create a Repeatable Build Library
After getting change under control, it’s time to get serious about Continuous Integration and regular releases. Using the artifacts you’ve discovered above, pull the whole system together in an automated build that is triggered by every development change (typically by monitoring version control repositories). Successful builds inspire confidence in the changes being made and enable more frequent (and higher quality) releases.
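The "monitor version control, then build" loop can be sketched in a few lines of Python. This is only an illustration under some assumptions: the repository is a local git checkout that something else keeps up to date, and `BuildTrigger`, `current_revision` and `fake_build` are hypothetical names of my own. A real CI server does far more than this.

```python
import subprocess


def current_revision(repo_dir: str) -> str:
    """Ask git which commit is checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


class BuildTrigger:
    """Runs a build whenever the watched revision changes."""

    def __init__(self, get_revision, run_build):
        self._get_revision = get_revision  # callable returning a revision id
        self._run_build = run_build        # callable(revision) -> bool
        self.last_built = None
        self.last_ok = None

    def poll(self) -> bool:
        """Build once if the revision changed; return True if a build ran."""
        revision = self._get_revision()
        if revision == self.last_built:
            return False
        self.last_ok = self._run_build(revision)
        # Remember the revision even when the build fails, so a broken
        # commit is not rebuilt on every poll until a new commit arrives.
        self.last_built = revision
        return True


# Demonstration with stand-ins instead of a real repository and build.
revisions = iter(["a1f3", "a1f3", "b7c9"])
built = []


def fake_build(revision: str) -> bool:
    built.append(revision)
    return True


trigger = BuildTrigger(lambda: next(revisions), fake_build)
results = [trigger.poll() for _ in range(3)]
print(results, built)
```

Against a real checkout you would pass something like `lambda: current_revision("/path/to/checkout")` (the path is a placeholder) together with a function that runs your actual build, and call `poll()` on a timer.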
Enable Continual Improvement
In the first three phases, we’ve already gotten the basics down and can now control change, systems and releases. Congratulations, you’ve learned how to crawl! In order to start walking, go back and take a second look at each of these areas, identifying the rough and sticky parts that still tend to trip you up. Establish some clear-cut business goals in each of these areas (time between releases, critical failures and rollbacks, system availability, etc.) and then set up some metrics you can use to monitor them (hey, now you’re walking!). Once this is done, go back to phase one again and begin monthly reports and forecasts. Bring the rest of the company on board and show them the real, visible progress you’re making. If you’re able to do this, things are really in stride and you deserve to feel proud about the professional environment of which you’re now a part.
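Two of the metrics mentioned above, system availability and time between releases, are easy to compute once outages and releases are recorded with timestamps. Here is a rough Python sketch; the function names and the sample numbers are invented for illustration, and it assumes recorded outages don't overlap each other.

```python
from datetime import datetime, timedelta


def availability(period_start, period_end, outages):
    """Fraction of the period the system was up, given (start, end) outages."""
    downtime = timedelta()
    for began, ended in outages:
        # Clip each outage to the reporting period before counting it.
        overlap = min(ended, period_end) - max(began, period_start)
        if overlap > timedelta():
            downtime += overlap
    return 1 - downtime / (period_end - period_start)


def mean_time_between(events):
    """Average gap between consecutive events such as releases or failures."""
    ordered = sorted(events)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return None  # fewer than two events, so there is no gap to average
    return sum(gaps, timedelta()) / len(gaps)


# Made-up sample data for one 30-day reporting period.
start, end = datetime(2024, 4, 1), datetime(2024, 5, 1)
outages = [
    (datetime(2024, 4, 9, 1, 0), datetime(2024, 4, 9, 4, 0)),        # 3 hours
    (datetime(2024, 4, 20, 13, 0), datetime(2024, 4, 20, 13, 36)),   # 36 minutes
]
releases = [datetime(2024, 4, 1), datetime(2024, 4, 8), datetime(2024, 4, 22)]

uptime = availability(start, end, outages)
cadence = mean_time_between(releases)
print(f"availability: {uptime:.3%}, mean time between releases: {cadence}")
```

For this sample month, 3.6 hours of downtime out of 720 works out to 99.5% availability, and the releases are on average 10.5 days apart. Numbers like these are the raw material for the monthly reports.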
Each of these processes takes more time than you’d expect to implement successfully. For a mid-sized organization (200+ people), you can expect about a year to get through them all. There is, of course, always resistance to fundamental changes, and IT departments are notorious for their undying mantra “if it ain’t broke, don’t fix it!” The power of “Visible Ops” lies in proving to them that the current situation is broken and that their lives (and those of their customers) can be significantly changed for the better by starting down this path to redemption: true systems management and agile delivery of business value.
Stay tuned for the next four posts, where I’ll cover each of the above steps in greater depth and share my own real-world experiences of walking down this path!