John Allspaw and Paul Hammond gave a great presentation at Velocity 2009 about the tools and culture at Flickr that enable them to do 10+ deploys per day.
My favorite quote is:
Ops’ job is NOT to keep the site stable and fast [but]
Ops’ job is to enable the business (this is the dev’s job too)
The business requires change
They go on to present the dilemma: discourage change in the interest of stability, or allow change to happen as often as it needs to. This is where they introduce their tools and culture for lowering the risk of change.
In this post I want to share with you how we use some of the tools John and Paul mention.
- Automated infrastructure: John and Paul say: “If there is only one thing you do…” and I couldn’t agree more. Using tools like Puppet, Chef or one of the many others they name in their presentation to automate the life cycle of your servers is a must. I used Capistrano as a basis and built my own tool, Carpet, on top of it. I cannot imagine installing or configuring my servers by hand anymore. Automation is so much more robust and carries so much less risk; I simply love it.
- Shared version control: We are using separate repositories for our code and our infrastructure recipes, but both are on the same GitHub account and all team members have full access to both. As we’re only three tech guys, every one of us does dev and ops and knows both worlds.
- One-step build and deploy: Rake and Capistrano give us the power to build and deploy our whole app with a single command. This is so much better than deploying with rsync or manually copying things around. It also enables us to do continuous integration with Hudson, which is very nice, too.
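To illustrate what “a single command” boils down to, here is a minimal sketch in Python. It is not our actual Rake and Capistrano setup; the function name `run_pipeline` and the step names are made up for the example. The idea is simply to run every step in a fixed order and stop at the first one that fails, so a broken build never reaches the deploy step.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run (name, command) steps in order and stop at the first failure.

    Returns a tuple (completed_step_names, failed_step_name_or_None).
    """
    completed = []
    for name, command in steps:
        # Each command is an argument list, e.g. ["rake", "test"].
        result = subprocess.run(command)
        if result.returncode != 0:
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Placeholder commands: swap in your real build, test and deploy calls.
    steps = [
        ("build", [sys.executable, "-c", "print('building')"]),
        ("test", [sys.executable, "-c", "print('testing')"]),
        ("deploy", [sys.executable, "-c", "print('deploying')"]),
    ]
    done, failed = run_pipeline(steps)
    if failed is not None:
        sys.exit(f"step '{failed}' failed after {done}")
```

Because the whole sequence sits behind one entry point, a CI server can call exactly the same command a developer runs by hand.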
As we’re such a small team, we’re not using any IRC or IM robots. I’m planning to introduce feature flags so we can fine-tune which features are available to whom. This would give us a little more flexibility in operating the platform. Shared metrics are already underway: we’re continuously enhancing our Nagios graphs by adding technical and business metrics to them. Again, something I would not like to work without anymore.
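Since the feature flags are still only a plan, the following is just a rough sketch of what such a check could look like, written in Python for brevity. The flag names, the `staff` group and the percentages are invented for the example. A flag is on for a user if the user belongs to one of the flag's groups, or if the user falls into the flag's rollout percentage.

```python
import hashlib

# Hypothetical flag table: all names, groups and percentages are made up.
FLAGS = {
    "new_checkout": {"groups": {"staff"}, "percent": 0},
    "beta_search": {"groups": set(), "percent": 25},
}


def bucket(flag_name, user_id):
    """Map a (flag, user) pair to a stable number from 0 to 99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(flag_name, user_id, user_groups=()):
    """Return True if the flag is switched on for this user."""
    flag = FLAGS.get(flag_name)
    if flag is None:
        return False  # unknown flags are treated as off
    if flag["groups"] & set(user_groups):
        return True  # the user is in a group that always gets the feature
    # Percentage rollout: the same user always lands in the same bucket.
    return bucket(flag_name, user_id) < flag["percent"]
```

Hashing the flag name together with the user ID keeps the decision stable between requests, and it means different flags reach different slices of users.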
Enjoy the presentation…
… or watch the video:
6 thoughts on “Dev and Ops Cooperation”
Thanks for the links to the presentations and the videos. Very interesting material.
I think it is crucial for small and large development teams to really wake up and realize the wealth of tools that are available to help them automate their standard processes from build, to test, and deploy.
The tools available today can give even the smallest teams the freedom to focus on what is important. Automate everything that can be, so that your creativity and efforts are focused where they need to be.
I experienced devs + ops collaboration at ABN Amro and UOL. This is a must. Have a look at the article http://bit.ly/11d0in to see why.
John and Paul say: “If there is only one thing you do…” and I couldn’t agree more.
I fully agree. Yet you cannot make this happen without everyone being on board – ops, dev, business, and management.
We don’t have that agreement where I work and it is frustrating to read about things like this happening elsewhere and know I’ll never see it in the office.
Hi Brian, I can imagine how bad it feels when not everyone is on board. Isn’t there a way to, e.g., try and learn a tool like Puppet or Chef at home and use it to automate one of your next install or config jobs (assuming that’s what you want to do)?
If you want, mail me at mm at this domain, to discuss your situation in more detail. Maybe we can find at least a first baby step together?
assuming that’s what you want to do)?
Done that, brother – I’ve got a puppet server up and it edits the banner page on a select set of servers. My sysadmin peers are on board with the broader idea, so is my manager.
What we lack is the time at work to make it happen for more hosts and in a broader context. Things work, they just are sub-optimal. And the guys in dev are busy doing their thing … which works, it’s just sub-optimal.
This isn’t a problem we can hack with technical items – it’s a business and organization problem.
Which is the most difficult of all to hack: no man pages to look at.
Definitely true that business problems are the most difficult to fix. It might help to find and point out the business impact of the team-wise sub-optimizations. Maybe you can try to draw a value stream map together with your manager and the other teams to find the biggest trouble spot.