Friday, 06 August, 2004

Fixing Bad URLs

I was exploring the Web traffic reports that come along with my Web hosting package from Sectorlink and ran across the "Bad Requests" report, which lists all of the bad URLs that have been used to access my site.  If you get a 404 error, this report will list it.  I found that a surprising number of requests for my Random Notes pages still use "/Diary" rather than "/diary".  As I pointed out on March 5, the move from my old hosting provider to Sectorlink put me on a Linux Web server, where case is significant in URLs.  When I realized that, I converted all of my URLs to lowercase.  I figured that after five months everybody who was linking to my diary would have fixed their links.  No such luck.  There were dozens of links to "/Diary" and a few other common casing errors as well (WinHelp and ToolUtil, for example).  I fixed the problem by creating Redirect commands in the site's .htaccess file.  Now any request for "/Diary" is redirected to "/diary", and the other common casing errors are mapped to their all-lowercase equivalents.
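
For anyone facing the same problem, here is a minimal sketch of what such Redirect lines might look like.  It assumes the server is Apache with mod_alias enabled and that .htaccess overrides are allowed.  The host name www.example.com is a placeholder, and the /WinHelp and /ToolUtil paths are guesses at top-level directories based on the names mentioned above, not a copy of the actual file.

```
# .htaccess in the site's document root (illustrative sketch only).
# Redirect matches case-sensitively on whole path segments, so
# /Diary/index.html is sent to /diary/index.html.
# "permanent" returns a 301 so that well-behaved clients update their links.
Redirect permanent /Diary    http://www.example.com/diary

# Hypothetical paths for the other casing errors seen in the report.
Redirect permanent /WinHelp  http://www.example.com/winhelp
Redirect permanent /ToolUtil http://www.example.com/toolutil
```

Each line handles exactly one spelling, so a variant such as "/DIARY" would still produce a 404 unless it gets its own line.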

I also see a lot of requests for "robots.txt" and "favicon.ico".  I know that robots.txt is a file that well-mannered Web crawlers look for, although I'm not sure what it's supposed to contain.  It looks like the crawlers request favicon.ico, too, because the number of requests for the two files is nearly identical.  I guess I'll have to read up on what those files are for and decide whether I need to include them on the site.