Starting a job with a running system and real users is a nice “problem” to have, but it presents some unique challenges as well, especially if server monitoring isn’t robust and there are absolutely zero automated tests. Without these two critical components, you’re both operating and developing completely blind.
Without monitoring, you can’t analyze server changes to see whether you’ve really made things better (or worse). And without testing, every commit you make is a risk to the running site.
Monitoring made easy
Pingdom is perhaps the simplest monitoring tool, one that anyone with a browser can set up. Even if you don’t want to (or can’t) spend a penny, they will track one URL on your site for free. Be smart, and point this URL to a critical, complex page on your site to verify as many running pieces as possible. Once entered, Pingdom starts collecting data on the page’s general availability and even response time (world-wide).
With the single free URL check from Pingdom, you have zero excuses for flying blind. As outages crop up, add the URLs that demonstrate these failures to Pingdom. Stop being the last guy to find out that the web service is down and start being the one reporting its outage to the team.
Getting your SNMP configured correctly is the next step and will allow you to do real low-level monitoring of disks, CPU, network, etc. If you don’t have the time (or know-how) to set up a front-end to report on all these data points, think about having an external service provider do it for you. Logicmonitor and Cloudkick are both excellent and reliable monitoring services.
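To make it concrete what a single URL check records, here is a minimal sketch in Python using only the standard library. It is not how Pingdom works internally, just an illustration of the two numbers mentioned above: availability and response time. The names `CheckResult` and `check_url` are made up for this example, and it measures from one machine only, not world-wide.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Outcome of one availability check (hypothetical structure)."""
    url: str
    ok: bool                 # True when the page answered with a 2xx/3xx status
    status: Optional[int]    # HTTP status code, or None if no response arrived
    seconds: float           # wall-clock time spent on the request
    error: Optional[str] = None


def check_url(url: str, timeout: float = 10.0) -> CheckResult:
    """Fetch the URL once and report availability and response time."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # download the body so it counts toward the timing
            status = response.status
        return CheckResult(url, 200 <= status < 400, status,
                           time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return CheckResult(url, False, exc.code,
                           time.monotonic() - start, str(exc))
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout and similar problems.
        return CheckResult(url, False, None,
                           time.monotonic() - start, str(exc))
```

Calling `check_url("https://example.com/some-critical-page")` from a cron job every minute and logging the result would give a crude version of the same data, though a hosted service still has the advantage of checking from outside your own network.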
Testing is not so easy
Testing is never easy. If it were, everybody would do it! While preparing a complete test harness for unit testing might take a month or two, don’t forget about some simple acceptance tests.
Here we have two options:
Selenium is probably the easier of the two to get up and running quickly (and without a lot of technical knowledge). The Firefox plugin makes recording simple workflows a breeze. If you also need to support multiple browsers, take a look at Sauce Labs. There was some talk early this year about them allowing unlimited, free test minutes using Linux as the OS.
Write a simple login/logout test. If your site has a regularly used forum, write a test that creates/deletes a new topic or post. Such simple tests quickly become the foundation of your future build server!
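If you would rather script the test than record it, a login/logout check might look roughly like the sketch below, written against Selenium's Python WebDriver bindings. Everything site-specific here is an assumption: the `/login` path, the `username` and `password` field names, the submit button selector, and the "Log in" / "Log out" link texts are placeholders you would swap for your own. The locator strings `"name"`, `"css selector"` and `"link text"` are the values behind Selenium's `By.NAME`, `By.CSS_SELECTOR` and `By.LINK_TEXT` constants.

```python
def login_logout_works(driver, base_url, username, password):
    """Log in, confirm a logout link appears, log out again.

    `driver` is a Selenium WebDriver instance (or anything exposing the
    same get/find_element/find_elements methods). All paths, field names
    and link texts below are hypothetical and depend on your site.
    """
    driver.get(base_url + "/login")

    # Fill in and submit the login form.
    driver.find_element("name", "username").send_keys(username)
    driver.find_element("name", "password").send_keys(password)
    driver.find_element("css selector", "button[type='submit']").click()

    # A "Log out" link is taken as proof that the login succeeded.
    if not driver.find_elements("link text", "Log out"):
        return False

    # Log out and check that the "Log in" link is back.
    driver.find_element("link text", "Log out").click()
    return bool(driver.find_elements("link text", "Log in"))
```

To run it against a real browser, create a driver with `from selenium import webdriver` and `driver = webdriver.Firefox()`, call `driver.implicitly_wait(5)` so slow pages get a few seconds to render, pass the driver to the function with a dedicated test account, and finish with `driver.quit()`. Keeping the driver as a parameter means the same function can later run on a build server or against a remote browser without changes.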
By implementing these two steps, monitoring and testing, you have started down the path of gaining control over your environment. What other simple steps have you taken to avoid unwanted text messages in the middle of the night?