BREAKING: Caching Saves Server Lives!

I know, I know – last week, I promised a post about hooking up Xdebug to Eclipse along with some nifty profiling howto. But, unfortunately, real life managed to intervene again!

Every site has its peaks, and every knowledgeable sysadmin automatically keeps an eye on the server farm during this time. Let’s face it – the Internet is a flash mob waiting to happen to your website, not to mention downright dangerous to your servers’ health. One misconfiguration, and KA-BLOOEY! Nagios and Pingdom alert hell.

Mondays are our site’s peak day, sending more than twice the visitors of our slowest day. And, boy, last Monday was a doozy – record visits and record load. I’d prepared the week before by introducing a slew of MySQL database tweaks (many thanks to Peter, Vadim and their MySQL Performance Blog), proudly watching my cache hit rate increase by as much as 5%. But Monday rolled around and showed me that I really hadn’t done a thing to address load. It had to be the application.

So, I started thinking like any good Java architect does – let’s start profiling the app looking for bottlenecks. The offshore team admitted they hadn’t done much in the way of code optimization and I thought there was bound to be a lot of low hanging fruit there. The servers were smoldering and response times were tanking. That’s why I started down the whole Xdebug path.

What’s Your Site’s Caching Strategy?

Matthias posed this innocent enough question to me over lunch on Tuesday after I’d convinced him I had a big problem on my hands. My reply was that we were using some built-in Zend Cache. But this didn’t really seem to answer his question. And the more I thought about it, the more I started to understand the real problem. I didn’t know a thing about our caching mechanism. Was it disk or memory based? How was the cache invalidated? I started digging.
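For anyone wondering what there even is to dig into, here’s a minimal sketch of what a Zend Framework 1 `Zend_Cache` setup typically looks like – the paths, lifetime, and the `computeExpensiveStats()` helper are illustrative placeholders, not our actual production code:

```php
<?php
// Minimal Zend_Cache (ZF1) sketch. Lifetime, cache_dir and the
// cache key are illustrative values, not our real configuration.
require_once 'Zend/Cache.php';

$frontendOptions = array(
    'lifetime'                => 1800, // entries expire after 30 minutes
    'automatic_serialization' => true, // lets us cache arrays/objects
);

$backendOptions = array(
    'cache_dir' => '/tmp/app-cache/',  // disk-based: answers "disk or memory?"
);

// 'Core' frontend + 'File' backend = file-based caching
$cache = Zend_Cache::factory('Core', 'File', $frontendOptions, $backendOptions);

// Typical usage: try the cache, recompute on a miss, save for next time
if (($stats = $cache->load('homepage_stats')) === false) {
    $stats = computeExpensiveStats(); // hypothetical expensive DB query
    $cache->save($stats, 'homepage_stats');
}
```

The two questions I couldn’t answer – disk or memory, and how invalidation works – both live right there in the backend choice and the `lifetime` option.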

File Based Caching Works Fine

Especially if you don’t invalidate the cache every half-hour. My first “aha!” and a simple configuration tweak later – Presto! I had reduced the load from 8 to 4. Database queries were also halved:

[Graph: MySQL queries halved by quadrupling the cache refresh interval]
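The “simple configuration tweak” boils down to one number: the frontend cache lifetime. A sketch, with illustrative values consistent with quadrupling the refresh interval:

```php
<?php
// Before: cache invalidated every half-hour.
// After: entries live four times as long, so the expensive queries
// behind them run a quarter as often. Numbers are illustrative.
$frontendOptions = array(
    'lifetime'                => 7200, // was 1800 (30 min), now 2 hours
    'automatic_serialization' => true,
);
```

One line of config, half the database queries. Of course, longer lifetimes mean staler content – which is exactly where the next headache came from.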

Compromises Suck

While the rest of the team could appreciate my blood pressure falling in step with our server load, they weren’t exactly thrilled about having to wait half a day to see their updates on the website. It seems that operations victories, just like other victories in life, are short lived. It was time to find a permanent solution.

Memcached to the Rescue

I’d had memcached in the back of my mind for over a year. At my last company there were some initial implementations and, of course, we’ve all heard about the obscene number of visits Facebook is able to cope with. Well, they do this largely by relying on memcached, and they’ve even published their very own (highly tweakified) branch on GitHub.
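The nice part, at least on paper, is how little the application has to change. In ZF1 the move is mostly a backend swap – here’s a sketch, with the server details as placeholders for wherever memcached actually runs:

```php
<?php
// Swapping the File backend for Memcached in Zend_Cache (ZF1).
// Host/port are placeholders; lifetime is illustrative.
require_once 'Zend/Cache.php';

$frontendOptions = array(
    'lifetime'                => 7200,
    'automatic_serialization' => true,
);

$backendOptions = array(
    'servers' => array(
        array('host' => 'localhost', 'port' => 11211),
    ),
);

$cache = Zend_Cache::factory('Core', 'Memcached', $frontendOptions, $backendOptions);

// The load()/save() calls in application code stay the same --
// only the storage moves from local disk to memory, shared
// across all the web servers.
```

That shared, in-memory store is the whole appeal: no per-server cache directories, no filesystem I/O on every hit.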

Next week, I promise I’ll show you how I got a development environment set up with memcached and how I deployed it to our testing environment. The initial results are promising (when are they not?) and I’m looking forward to sharing some numbers with you. In the meantime, check out this heartwarming tale of how a web programmer and a sysadmin got their website back under control by working together!
