Pimp my Website: Streamlining Your Content

Hopefully, you have a better idea about how (un)performant and (un)optimized your website is after going through some of the tools I introduced you to last week. You’ve made the first step in getting to know more about how your site is really served to your users. Now, I’ll start walking you through some of the basic things you can do to start tuning up your website.

Ever caught yourself saying “yeah, I know the homepage is a bit heavy, but the users only have to load it once; after that, it’ll be cached and much faster”? Well, according to Tenni Theurer, last year roughly half of the visitors who came through Yahoo! had an empty cache, meaning they had to download all the images and scripts on the page. So even if your page is viewed as often as Yahoo!’s, a lot of your users will be downloading all of the content on any given day. Even if you’re lucky enough to earn more than one page view per day from a visitor, you should definitely be streamlining your homepage for the long haul. After all, you want these users to come back, right?

Minimize HTTP Requests

The first step, according to Yahoo!’s “Best Practices for Speeding Up Your Web Site”, is minimizing the number of HTTP requests made by the user’s browser to your web server(s). What we are basically trying to eliminate here is overhead. For every request and response (and all the packets in between), the TCP/IP stack adds a number of header bytes, which help make the protocol so reliable. As those bytes add up (and browsers can only download so many files in parallel), we’re looking to trim the overhead and make each request/response exchange as efficient as possible.

A few of the ways to do this include:

  • combining separate JavaScript files at deploy time: multiple files can be concatenated into one, saving request overhead
  • combining separate CSS files at deploy time: same idea as above
  • combining images into a single file: either in an image map or a CSS sprite

    Don’t know what a CSS sprite is? It’s actually quite “old school” technology, dating back to the 8-bit and 16-bit era of game development. Basically, you take all your image files, store them in a single file (used as a background-image) and reference each one’s (x, y) coordinates with background-position for page placement. I spent some time overhauling a site which had over 60 images (the vast majority less than 100 bytes each). By the time I was through, there were 7 images and site load times had improved by almost 33%. Watch out for single-pixel images used with repeat-x or repeat-y styles: you will have to create two separate sprites (e.g. repeat-x_sprite.png and repeat-y_sprite.png) to hold these 1-pixel instances in order to use the CSS sprite technique correctly.

    CSS sprites are indeed very cool, and much of the heavy lifting can be automated. The folks over at Website Performance host this great tool which takes a zipped archive of all the images on your page (up to 500KB) and generates a single image file with the corresponding CSS coordinate map. You will still need to go through your original CSS file meticulously updating the relative px and em positions, and for that you’ll find the emcalc tool indispensable.
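To make the coordinate idea concrete, here is a minimal sketch of how the background-position rules for a sprite could be generated. It assumes every image is the same size and packed left to right, top to bottom, in a fixed-width grid. The function name, class names and `sprite.png` are made up for illustration, and this is not how the tool mentioned above works internally.

```python
def sprite_rules(names, tile_width, tile_height, columns, sprite_url):
    """Return one CSS rule per image in a grid-packed sprite sheet.

    Assumption: all tiles share the same width and height and are laid
    out row by row, `columns` tiles per row.
    """
    rules = []
    for index, name in enumerate(names):
        col = index % columns
        row = index // columns
        # background-position slides the sheet left and up behind the
        # element, so the offsets are zero or negative.
        x = -col * tile_width
        y = -row * tile_height
        rules.append(
            f".{name} {{ background: url('{sprite_url}') no-repeat {x}px {y}px; "
            f"width: {tile_width}px; height: {tile_height}px; }}"
        )
    return rules


# Hypothetical 16x16 icons packed two per row.
for rule in sprite_rules(["icon-home", "icon-mail", "icon-rss"], 16, 16, 2, "sprite.png"):
    print(rule)
```

Images meant for repeat-x or repeat-y would not fit this grid layout and would still need their own sprite files, as noted above.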

    Multiple Domains

    Browsers limit the number of parallel downloads per hostname, but will open connections to separate hostnames concurrently. This means that if you have your website hosted at ‘www.mysite.net’, you could host your images under ‘images.mysite.net’ and the browser would issue simultaneous requests (and process the responses in parallel) for JavaScript, CSS and image files. Website Performance Optimization has a nice write-up of this technique, in which they conclude that up to a 40% improvement in web page latency can be attained on many sites.

    But be careful not to overdo it: the sweet spot seems to be between 2 and 4 domains. Each extra domain costs a DNS lookup, and you don’t want troublesome round-trip times to delay your page load. For instance, if the user gets unlucky and each of 4 lookups takes 120ms, that’s 4 x 120ms = 480ms, or roughly half a second of blocked time (where no other content is loaded or rendered).

    Caching AJAX requests

    We all know the phenomenon of Web 2.0 and its AJAX underpinnings. What was such a boon to users and usability has pushed a lot of poor web servers way over their thresholds. Consider a simple AJAX chat client. How often should you poll the server to see if a message has been written (or a friend is online)? Such seemingly harmless questions can quickly overload and kill a website. Let’s say you have 100 chat users logged in to your site. You’ve decided to poll for new messages every 2 seconds and contact list updates every 10 seconds. In just one minute of elapsed time, your users have unknowingly generated over 3000 clickless requests.
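The arithmetic behind that figure is easy to check, and easy to rerun with your own numbers. A small sketch (the function name is just for illustration):

```python
def requests_per_minute(users, poll_intervals_seconds):
    """Total polling requests per minute across all logged-in users."""
    # Each polling loop fires 60 / interval times per minute.
    per_user = sum(60 / interval for interval in poll_intervals_seconds)
    return users * per_user


# 100 users, messages every 2 seconds, contact list every 10 seconds.
total = requests_per_minute(100, [2, 10])
print(int(total))  # 100 * (30 + 6) = 3600
```

Doubling both intervals halves the load, which is often the cheapest tuning knob available.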

    If your server sends future Expires or Cache-Control headers with its AJAX responses, and there hasn’t been any update in message or contact list status, these responses can be pulled from the browser’s local cache, avoiding a round trip and greatly improving rendering speeds.
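As a rough illustration of what those headers look like, the sketch below builds an Expires and Cache-Control pair for a given lifetime. The function name and the 10-second lifetime are assumptions chosen to match the contact list example, not values taken from any particular server.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Build Expires and Cache-Control headers for a response.

    `now` is a Unix timestamp; it defaults to the current time and is
    only a parameter so the result can be checked with a fixed clock.
    """
    if now is None:
        now = time.time()
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Let the browser reuse a contact list response for 10 seconds.
for header, value in cache_headers(10).items():
    print(f"{header}: {value}")
```

The lifetime is a trade-off: the longer a response may be reused, the fewer round trips, but the longer a user might wait to see a change.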

    Loading of Components

    Only load what you need, when you need it. That sounds pretty straightforward but can cause some headaches for the unwary. Relevant stylesheets should be placed in the document HEAD using LINK tags to help make pages appear to load faster. Because the browser will start rendering as soon as it has the styles, items begin to pop up for the user, whose brain begins to interact with your site (load times feel faster).

    JavaScript should be placed as close to the bottom of the page as possible. This is because browsers hold up other downloads and page rendering while downloading and parsing scripts. While you may not be able to place all scripts and their references at the end of the page, every one you do saves precious rendering time.

    Avoid redirects and 404s

    Redirects are often helpful for developers and sysadmins, but they consume unnecessary time for the user. In practice, neither 301 nor 302 responses are cached by the browser unless Expires or Cache-Control headers are specifically added. A particularly wasteful redirect happens when a trailing slash (/) is missing from a URL that should otherwise have one. For example, going to ‘http://www.webmd.com/a-to-z-guides’ results in two 301 responses: first to ‘http://www.webmd.com/a-to-z-guides/default.htm’ and then to ‘http://www.webmd.com/a-to-z-guides/common-topics/default.htm’! If you’re using Apache, you can fix this by setting up an Alias or mod_rewrite, or the DirectorySlash directive (if you’re using Apache handlers).
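On the content side, links in your own pages can point straight at the final destination so visitors skip the redirect hops entirely. Here is a minimal sketch that walks a redirect map and reports where a URL ends up and how many hops it costs. The map is typed in by hand from the example above rather than fetched live, and the function name is made up.

```python
def resolve_redirects(url, redirects, max_hops=10):
    """Follow a known redirect map; return (final_url, hop_count)."""
    hops = 0
    seen = {url}
    while url in redirects:
        url = redirects[url]
        hops += 1
        # Guard against maps that loop back on themselves.
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
    return url, hops


# The two-step chain described above, written out as a lookup table.
REDIRECTS = {
    "http://www.webmd.com/a-to-z-guides":
        "http://www.webmd.com/a-to-z-guides/default.htm",
    "http://www.webmd.com/a-to-z-guides/default.htm":
        "http://www.webmd.com/a-to-z-guides/common-topics/default.htm",
}

final_url, hops = resolve_redirects("http://www.webmd.com/a-to-z-guides", REDIRECTS)
print(final_url, hops)
```

Any link that comes back with a hop count above zero is a candidate for rewriting in your templates.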

    HTTP requests are expensive enough without getting a useless response like “404 Not Found”. Particularly obnoxious is when an external JavaScript file is taken down without warning. Not only will the browser block everything else while it tries to retrieve it, but it may even try to parse the 404 response as JavaScript, hoping against hope to find something usable. Yuck!

    I’ve given you just a taste of some ideas for streamlining your website’s content to delight both your users and your resident sysadmin. Next week, I want to give you some pointers on how to reduce the overall size of your web page to further improve load times and take a bite out of your monthly bandwidth bill. Subscribe to our feed to get it served up piping fresh in your favorite news reader.
