The trouble with using an empty development database

Imagine yourself building a new web application from the ground up. Let’s say – an online bookshelf where you want to keep track of all the books you own. Along with the first lines of code you setup your basic database schema (You really need a database, right?) and you even come up with some nice demo records for testing. You’re making great progress building features into your app, and the sun is shining brightly on your brand new Macbook Air (ok, if the sun shines so brightly you may have trouble seeing your screen – but, let’s ignore that now for the sake of the story). The web application framework you chose takes care of all the nasty details of dealing with the database and your application is a pleasure to use.
One day you decide to take it for a spin by loading up all the books collecting dust on your shelves. It takes a while to fill it up and somehow you can’t ignore the fact that it gets a little slower over time. After entering a couple hundred books (have you really read all of them?) you start wondering what’s wrong with your box as your app begins behaving strangely. It feels like its literally crawling between pages. It can’t be your code, right? You didn’t do anything wrong, did you?

It’s not your code’s fault

No, you didn’t do anything wrong in the code. But maybe you relied a bit too much on the framework to handle all the database queries for you. Obviously it doesn’t perform as expected with larger amounts of data. I’ve seen database queries take a minute or more against not too large data sets (a couple of thousands of records). That might be a little too slow for you and your users (and it certainly was for ours).

You’re caught in the same trap most of us have been in at least once in their career. Developing your entire application against a very small set of demo records will not give you early enough feedback about the bottlenecks. While there are lots of potential candidates for web application performance bottlenecks, the database is one of the more common culprits.

Reasons for poor database performance

Some reasons for poor database performance I’ve seen include

  • The framework produces very complex queries which cannot be easily handled by the database
  • Indexes are missing on the most often used and/or biggest tables
  • The framework loads too much data into memory (e.g. loops over all records in a table)
  • The data model is not supporting the way you need to retrieve the data

If some (or, dare I say, all) of the above issues hit just when you thought you were done coding, things get very ugly, very fast. As so often in real life, it’s better to face these issues early on in the process to be able to learn and adapt to Real World requirements.

How to deal with database performance issues

Following some simple steps will not take away the need to address the problems, but it will give you the chance to deal with them at more appropriate times

  • Determine what a realistic amount of data might be in production during requirements gathering
  • Automate the population of your development database with this amount of test data
  • Turn off caching in your database which “eases the pain” of development but blinds you to evolving performance issues. In MySQL issue SET GLOBAL query_cache_type = OFF; in the db or set query_cache_type = 0 in my.conf
  • Get to know your slowest queries by logging execution times, optimizing them to be below 0.1 sec for regularly used ones (how to do this will be covered in a following post). In MySQL using the Slow Query Log helps a lot. Enable it in your my.conf file or via the command line (see link below for complete details). Ruby on Rails prints the query execution times to the server console in development mode
  • Make sure you know which queries do not use indexes. In MySQL use the “log queries not using indexes” option for the Slow Query Log either in my.conf or via the command line
  • If you are going to replace an existing system, write the data migration or import scripts as early as possible. Then real production data can act as the realistic test data mentioned above

Avoid preliminary optimization

Having said all this, a word of warning! Before your application is actually used by real users, you hardly know how it will be used and how the data will grow. Even though you should keep an early eye on performance, don’t overdo it and start optimizing places you think might be important (but will never be used in Real Life). Monitoring database behavior will become second nature after releasing your application into the wild.

Helpful Links

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.