Thursday, 16 December, 2004

RSS timestamps are ambiguous

I've slowly been working on the RSS article categorization program that I outlined back in August. In the process, I've come to the conclusion that a standard timestamp as expressed in most programming languages (which is just a date and time) isn't enough information to store about an article because it doesn't provide any context. First, some background.

When you publish an RSS article, you can supply the publication date and time. The timestamp is expressed using the RFC 822 date/time format, which takes this form:

Thu, 16 Dec 2004 15:50:03 -0600

The "-0600" is the time offset from Coordinated Universal Time (UTC), what used to be called Greenwich Mean Time (GMT). The offset is the difference between UTC and the local time, expressed in hours and minutes. Thus, -0600 means that the time here is six hours and zero minutes earlier than UTC. So UTC is 21:50, or 10 minutes to 10 in the evening. Every RSS program that I've seen strips the offset information after possibly converting the date/time. That leaves you with at least four possible interpretations of the timestamp that the RSS aggregator displays. The timestamp can represent:

  1. The local time where the article was published;
  2. The time at your location;
  3. The UTC time;
  4. The local time at the location where the article was read and processed (assuming a Web service is processing the articles).

To make matters worse, RSS aggregators often won't tell you which of these interpretations is being used.
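To make the ambiguity concrete, here is a minimal Python sketch that parses the example timestamp and then throws the offset away under each of the four interpretations. The reader and server offsets (+0100 and -0800) and the `stripped` helper are made up for illustration; only the publisher's -0600 comes from the example above.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Parse the RFC 822 timestamp; the result keeps the -0600 offset.
published = parsedate_to_datetime("Thu, 16 Dec 2004 15:50:03 -0600")

# Hypothetical offsets for the reader and for a Web service, for illustration.
reader_tz = timezone(timedelta(hours=1))
server_tz = timezone(timedelta(hours=-8))


def stripped(dt):
    """Drop the offset, the way an aggregator that displays a bare time does."""
    return dt.replace(tzinfo=None).isoformat(sep=" ")


interpretations = {
    "publisher local": stripped(published),
    "reader local": stripped(published.astimezone(reader_tz)),
    "UTC": stripped(published.astimezone(timezone.utc)),
    "server local": stripped(published.astimezone(server_tz)),
}

for label, text in interpretations.items():
    print(f"{label:16} {text}")
```

All four strings describe the same instant, but once the offset is gone nothing on the screen tells you which one you are looking at.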

If we're going to use simple timestamps that don't contain timezone information, then all times should be reported as UTC. This will become increasingly important as more people use RSS and similar tools to communicate. If everybody would standardize on UTC and explicitly state that times are expressed in UTC, there would be no confusion as to when things were posted.

Even if we standardize on UTC, a simple timestamp doesn't provide enough contextual information. Sometimes you want to know all three relevant times: what time it was for the author when he published the entry (assuming that it's close to the time he wrote it), what time it was for you, and the UTC time. Did the author write his entry in the middle of the day? At the end of a late-night hacking session? What were you doing when he wrote the article? The UTC time, of course, is the absolute time used to order the articles.

RSS aggregators could supply all three of these times quite easily by storing the offset information (the "-0600" value in the example above) along with the date and time that they normally store. If the timestamp is stored in UTC, then adding the offset will return the local publication time, and converting the UTC time to your time zone (something you can do in any modern programming language) will return the time at your location. That technique also has the benefit of converting to your location wherever you happen to be at the moment, something that becomes increasingly important as RSS aggregators begin to appear on mobile devices.
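As a rough sketch of what that storage could look like in Python: keep the instant in UTC, keep the publisher's offset in minutes next to it, and derive the other two times on demand. The `StoredTimestamp` class and its method names are my own invention for illustration, not something any aggregator actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


@dataclass(frozen=True)
class StoredTimestamp:
    """Hypothetical record: the instant in UTC plus the publisher's offset."""
    utc: datetime          # timezone-aware, always in UTC
    offset_minutes: int    # publisher's offset from UTC, e.g. -360 for -0600

    @classmethod
    def from_rfc822(cls, text):
        parsed = parsedate_to_datetime(text)
        if parsed.tzinfo is None:
            # No usable offset was supplied, so the local time is unknowable.
            raise ValueError("timestamp has no UTC offset")
        offset = parsed.utcoffset()
        return cls(parsed.astimezone(timezone.utc),
                   int(offset.total_seconds() // 60))

    def publisher_time(self):
        """UTC plus the stored offset: the time where the article was published."""
        tz = timezone(timedelta(minutes=self.offset_minutes))
        return self.utc.astimezone(tz)

    def reader_time(self, tz=None):
        """The same instant in the reader's zone (system local zone if tz is None)."""
        return self.utc.astimezone(tz)


stamp = StoredTimestamp.from_rfc822("Thu, 16 Dec 2004 15:50:03 -0600")
print(stamp.utc.isoformat())               # absolute time, used for ordering
print(stamp.publisher_time().isoformat())  # what time it was for the author
print(stamp.reader_time().isoformat())     # what time it was for you
```

Because `reader_time` is computed at display time rather than stored, it follows the reader from one time zone to another, which is the mobile-device case mentioned above.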

To all publishers of RSS information: please include the offset with the timestamp when you publish your articles.
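For publishers whose tools happen to be written in Python, one way to do that is to build the date from a timezone-aware value and let the standard library format it. This is only a sketch: the fixed -0600 zone and the variable names are assumptions for the example, and a real feed generator would take the zone from its own configuration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed publisher zone for illustration: six hours behind UTC.
publisher_tz = timezone(timedelta(hours=-6))

# An aware datetime carries its offset, so the formatter can emit it.
published = datetime(2004, 12, 16, 15, 50, 3, tzinfo=publisher_tz)

pub_date = format_datetime(published)
print(pub_date)  # Thu, 16 Dec 2004 15:50:03 -0600
```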

To all authors of RSS aggregators: please store times in UTC, store the offset, and give me the ability to see all three of the relevant times.