I know I might be going out on a limb here, but I recently got burnt pretty badly by _not_ having any test harness around my Apache configuration.
We maintain a database of some 15K rewrites, and I accidentally blew it away. But it gets worse – we didn’t even notice for a whole week!
Of course our Google ranking tanked, we lost a lot of visits and I’m left scratching my head. How did this happen? And, more importantly, how can I make sure it never happens again?
As usual, the root causes for such a catastrophe are manifold. The environment was extremely fragile, with zero checks in place to verify correct operation. The first thing I did was set up some Pingdom checks to alert us if anything is amiss with our rewrites. But a few checks performed by an (albeit very professional) external company aren’t enough to safeguard a critical business asset.
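To make that concrete, here is the shape of such a check as a minimal Python sketch run in-house rather than through Pingdom. The URLs, expected status, and redirect targets are made-up examples, not our real data:

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects; we want the raw 301/302."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def check_rewrite(src, expected_target, expected_status=301):
    """Request src and verify it redirects to expected_target."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        opener.open(src, timeout=10)
        return False  # no redirect happened at all
    except urllib.error.HTTPError as e:
        return (e.code == expected_status
                and e.headers.get("Location") == expected_target)

# A hypothetical sample; in practice, pull it from the rewrite database.
SAMPLE = [
    ("https://example.com/old-page", "https://example.com/new-page"),
    ("https://example.com/2009/sale", "https://example.com/archive/sale"),
]

failures = [src for src, dst in SAMPLE if not check_rewrite(src, dst)]
if failures:
    raise SystemExit("rewrite checks failed for: %s" % ", ".join(failures))
```

Wire something like that into cron or your monitoring of choice and a silently vanished rewrite map gets noticed in minutes, not a week.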
So I decided to write a transaction-safe wrapper script for modifying rewrites. After adding or modifying any data, there are quite a few manual steps involved in deploying new rules. And of course, our environment is load-balanced, which multiplies the chances of a manual mistake.
The new script handles deploying and testing a subset of existing and new rules across all the given servers. If any test fails, the old rules are “rolled back” and retested to ensure minimal business impact.
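The real script is specific to our setup, but the rollback logic looks roughly like this. The server names, map path, and reload commands are placeholders, and `run_smoke_tests()` stands in for the kind of check sketched above:

```python
import subprocess

SERVERS = ["web1.example.com", "web2.example.com"]  # hypothetical hosts
RULES_PATH = "/etc/apache2/rewrites.map"            # hypothetical map location

def run_smoke_tests(server):
    # Placeholder: run a subset of the rewrite checks from the sketch
    # above directly against this server's internal address.
    pass

def deploy(server, local_file):
    """Push a rewrite map to one server, validate the config, reload."""
    subprocess.run(["scp", local_file, "%s:%s" % (server, RULES_PATH)],
                   check=True)
    subprocess.run(["ssh", server, "apachectl", "configtest"], check=True)
    subprocess.run(["ssh", server, "apachectl", "graceful"], check=True)

def deploy_all(new_rules, old_rules):
    """Deploy new_rules everywhere; on any failure, restore old_rules."""
    touched = []
    try:
        for server in SERVERS:
            deploy(server, new_rules)
            touched.append(server)
            run_smoke_tests(server)
    except Exception:
        # "Roll back": put the old rules on every server we touched,
        # then retest so we know we're back in a good state.
        for server in touched:
            deploy(server, old_rules)
            run_smoke_tests(server)
        raise
```

The important property is that the old rule set is kept around as a first-class artifact, so “rolling back” is just another deploy rather than a scramble.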
What used to be a high risk maneuver is now fairly routine and much more transparent!
How do you handle business-critical Apache configuration changes?