Metrics
You’ve been given the basic tools for this already. Change management is tremendously advantageous in the release process; without it, your success rate for releases will be drastically lower and you’ll undoubtedly have more unplanned work (“firefighting”) in production. Configuration management dictates the quality of your builds and software library, which directly affects the resolution process (remember, rebuilding always beats troubleshooting!). The final phase discussed in “Visible Ops” is continuous improvement, and it’s basically about measuring and acting upon critical metrics gathered from your newly implemented change and configuration management processes. You cannot manage what you cannot measure, and I’m not talking about grepping through a log file here! You need to plan methodically which metrics will give you the information you need to improve your processes and ultimately deliver more business value per server than your competitors!
So what exactly should you measure? Again, the folks at the IT Process Institute give you a great head start. How effectively do you:
- Release: generate and provision infrastructure?
- Controls: make good change decisions that keep production infrastructure available, predictable and secure?
- Resolution: diagnose and resolve issues when things go wrong?
Carefully analyzing and presenting these metrics should easily give you the mandate to:
Release Metrics
- Time to provision known good builds: how long does it take to build and provision infrastructure from bare-metal?
- Number of turns to a known good build: how many times must the build be modified before it’s acceptable for deployment?
- Shelf life of builds: how long will the build be in production until it’s replaced?
- Percent of systems that match known good builds: how many production systems can claim this?
- Percent of builds that have security sign-off: how serious is your organization about security?
- Number of fast-tracked builds: how many builds were rushed into production via the emergency change process?
- Ratio of release engineers to sysadmins: are you too busy doing stuff instead of thinking about it first?
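To make one of these concrete, here is a minimal sketch of the “percent of systems that match known good builds” metric. The hostnames, build labels and the shape of the inventory are all made up for illustration; in practice you would pull them from whatever inventory or configuration management tool you already run.

```python
# Hypothetical inventory: hostname -> build label currently deployed.
deployed = {
    "web01": "build-2024.01",
    "web02": "build-2024.01",
    "db01": "build-2023.11",
    "app01": "custom-hotfix",
}

# Hypothetical set of known good builds held in the software library.
known_good = {"build-2024.01", "build-2023.11"}


def percent_matching(deployed, known_good):
    """Percentage of systems whose deployed build is a known good build."""
    if not deployed:
        return 0.0
    matching = sum(1 for build in deployed.values() if build in known_good)
    return 100.0 * matching / len(deployed)


def drifted_hosts(deployed, known_good):
    """Hosts running something that is not in the known good set."""
    return sorted(host for host, build in deployed.items() if build not in known_good)


print(percent_matching(deployed, known_good))
print(drifted_hosts(deployed, known_good))
```

The list of drifted hosts is arguably the more useful output: the percentage tells you how bad things are, while the list tells you where to start rebuilding.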
Control Metrics
- # of actual changes per week: how many were authorized?
- # of successful changes: what’s the ratio of “emergencies” to “special” to “business as usual”?
- # of service-affecting outages
- # of hours spent on change management
- # of changes submitted vs. changes actually reviewed
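As a rough sketch, several of the control metrics fall out of a simple change log. The record layout below (`kind`, `authorized`, `successful`) is an assumption for illustration, not a format prescribed by “Visible Ops”; adapt it to whatever your change management tool exports.

```python
from collections import Counter

# Hypothetical change log for one week.
changes = [
    {"kind": "business as usual", "authorized": True, "successful": True},
    {"kind": "business as usual", "authorized": True, "successful": True},
    {"kind": "special", "authorized": True, "successful": False},
    {"kind": "emergency", "authorized": False, "successful": True},
    {"kind": "emergency", "authorized": True, "successful": False},
]


def summarize(changes):
    """Count total, authorized and successful changes, broken down by kind."""
    total = len(changes)
    successful = sum(1 for c in changes if c["successful"])
    return {
        "total": total,
        "authorized": sum(1 for c in changes if c["authorized"]),
        "unauthorized": sum(1 for c in changes if not c["authorized"]),
        "success_rate": successful / total if total else 0.0,
        "by_kind": Counter(c["kind"] for c in changes),
    }


summary = summarize(changes)
print(summary)
```

Tracking the `by_kind` counts week over week shows whether “emergency” changes are shrinking relative to “business as usual”, which is the trend you want to present.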
Resolution Metrics
- MTTR (Mean Time To Repair): average time to restore service after any interruptions
- MTBF (Mean Time Between Failures): the average time between service incidents
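Both resolution metrics can be computed from nothing more than a list of outage start and end times. The sketch below uses invented incident timestamps and assumes one common reading of MTBF: the average gap between one service restoration and the next failure. Other definitions exist, so pick one and stick to it.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (service went down, service restored).
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0)),
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 14, 30)),
    (datetime(2024, 1, 20, 2, 0), datetime(2024, 1, 20, 4, 0)),
]


def mttr(incidents):
    """Mean time to repair: average duration of an outage."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((up - down for down, up in incidents), timedelta())
    return total / len(incidents)


def mtbf(incidents):
    """Mean time between failures: average gap from one restoration
    to the start of the next incident (assumed definition)."""
    if len(incidents) < 2:
        raise ValueError("need at least two incidents")
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


print("MTTR:", mttr(incidents))
print("MTBF:", mtbf(incidents))
```

With the three sample incidents the outages last 60, 30 and 120 minutes, so the MTTR is 70 minutes; the two gaps between incidents average out to 8 days, 7 hours and 45 minutes.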
These numbers alone, however, won’t convince your boss to turn the company upside down. You’ll have to convince him through meaningful presentations demonstrating a clear business value in your revolutionary ideas. If this aspect of “visible operations” gives you any pause, then have a look at Matthias’s gentle guide.
I’ve certainly given you plenty of food for thought this past month and really can’t give enough praise to this 100-page volume of operational wisdom. If you’re looking for a way to get your company out of IT chaos, “The Visible Ops Handbook” will start you down the right path!
This is an exciting article. Currently I am working to establish teams that design and adopt continuous improvement programs that focus on energy efficiency. One of the things that separates successful teams from the also-rans is the ability to identify and adopt metrics in the early stages of development. Without performance metrics the team loses focus and effectiveness.
In the manufacturing world the “Release” metrics would be known as takt time – the time it takes to produce a product to meet demand. “Resolution” would be the response-time metric in manufacturing – focusing on response time allows the organization to measure its commitment to making corrections. Where is its true passion?
Remember, we measure what we treasure! If we make these measurements visible, they will get improved.
Hear, hear! The adoption of lean manufacturing ideas in software development processes is one of the biggest drivers of improvement – you can only achieve continuous flow after intense analysis of your production floor metrics.
Thanks for jumping in and sharing, Ed!