Friday, 04 January, 2002

Saving Bandwidth by Compressing HTML

I've been wondering recently why HTML traffic on the web isn't transmitted in a compressed form.  It should be an easy matter, I thought, either to store HTML in compressed form on the server or to compress it on the fly during transmission.  The client browser could then decompress the stream as it's received, and render it.  This may have been unthinkable in 1994, when a 90 MHz Pentium with 16 megabytes of RAM was top-of-the-line hardware.  But today you can buy a 1 GHz machine with 128 megabytes of RAM for $500.  Hardware isn't the issue.

As it turns out, I'm not the first one to come up with the idea.  The HTTP 1.1 specification includes compression (gzip, Unix compress, or deflate format), and most browsers have supported the compressed formats since 1998.  So what's the problem?  Servers.  Neither Apache nor Microsoft's IIS supports on-the-fly compression of content.  According to Peter Cranstone's article "HTTP Compression speeds up the Web," the potential bandwidth savings from using HTTP compression are 30%!  (That's typical when you factor in graphics and other formats that are already compressed.  For pure text, the savings would be more like 50 to 75%.)  It's criminal that servers don't currently support this compression.
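To make the mechanism concrete, here is a minimal sketch of the negotiation in Python.  The browser advertises the formats it can decode in an Accept-Encoding request header, and a server that compresses marks the response with Content-Encoding.  The function name maybe_compress and the sample page are made up for illustration, and the header parsing is deliberately simplified (it ignores q-values), so treat this as a sketch of the idea rather than how any particular server does it.

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str):
    """Return (headers, payload), gzip-compressing only if the client allows it.

    Hypothetical helper: the parsing is simplified and ignores q-values.
    """
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        payload = gzip.compress(body)
        headers = {
            "Content-Encoding": "gzip",
            "Content-Length": str(len(payload)),
        }
        return headers, payload
    # Client did not offer gzip: send the body untouched.
    return {"Content-Length": str(len(body))}, body


# A made-up, highly repetitive page; real pages will compress less well.
html = b"<html><body>" + b"<p>Hello, compressed web!</p>" * 200 + b"</body></html>"

headers, payload = maybe_compress(html, "gzip, compress")
print(headers)
print("original bytes:", len(html), "compressed bytes:", len(payload))

# The client side: undo the encoding before rendering.
assert gzip.decompress(payload) == html
```

A client that sends no Accept-Encoding header gets the page back unchanged, which is why compression can be switched on at the server without breaking older browsers.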

Things may be changing.  The article mentions an Apache mod (mod_gzip) that performs the compression.  IIS 5.0 also supports some compression, as discussed in this article from Microsoft TechNet.

In some ways it's unfortunate that browsers already support HTTP 1.1 compression.  There are much better and more efficient compression methods than gzip, but we're pretty much stuck with it unless we come up with a new standard.  Either that, or come up with a thin client of some kind that can decompress any new format before it gets to the browser.  That would be browser-specific, though, with all the associated problems.  But even gzip compression is better than no compression.  If the server that hosts my web site did compression, then this web page would take 20 rather than 60 seconds to download on a 28.8 Kbps modem.
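The arithmetic behind that last figure can be sketched in a few lines of Python.  The page size is not a measurement: it is simply what a 60-second download at 28.8 Kbps implies, and the 3:1 compression ratio is likewise what the 20-second figure implies.  The sketch also ignores protocol overhead and any compression the modem itself performs.

```python
MODEM_BPS = 28_800  # 28.8 Kbps modem, in bits per second


def download_seconds(size_bytes: float, link_bps: float) -> float:
    """Idealized transfer time: payload bits divided by link speed."""
    return size_bytes * 8 / link_bps


# Page size implied by a 60-second download (an inference, not a measurement).
page_bytes = 60 * MODEM_BPS // 8

# Assumed 3:1 ratio, i.e. about 67% savings, within the 50 to 75% range for text.
compressed_bytes = page_bytes // 3

print("page size in bytes:", page_bytes)
print("uncompressed seconds:", download_seconds(page_bytes, MODEM_BPS))
print("compressed seconds:", download_seconds(compressed_bytes, MODEM_BPS))
```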

Yes, I know many of you have fat pipes and couldn't care less about people with modems.  But before you scoff at the idea of using compression, realize three things.

  1. You are in the minority.  Broadband just hasn't taken off like I and many other people thought it would.  (Ask me sometime what my @Home stock is worth.)
  2. A connection is only as fast as its slowest link.  All those people trying to download big pages with slow modems tie up server connections longer and hurt web server performance.  If the servers supported compression, those people would get their pages faster and response would improve for all users.
  3. You may not notice the difference if text gets there 30% faster, but large web site operators and small companies would certainly notice if their bandwidth requirements were cut by a third.