Wednesday, 18 August, 2004

Syndication Format Standards?

As with laws and sausages, you probably don't want to know how RSS is made.  Unless you're a developer who's trying to write a program that reads RSS feeds.  Then you're in for a rude awakening.  Silly me, I thought that there was a single RSS standard that everyone had agreed on, and that implementing a reader would be relatively simple.  That's not the case at all.

There are three competing syndication formats:  RSS, RDF, and Atom.  The original RSS specification was developed at Netscape and released as version 0.90 in March of 1999.  That was quickly followed by another Netscape release, 0.91, in June 1999.  Version 0.91 was much simplified and was intentionally incompatible with 0.90.  Then things got really ugly:  the formats diverged to create RDF, and we ended up with nine different RSS/RDF formats, all incompatible with each other to one degree or another.  See "The myth of RSS compatibility" for all the gory details.
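To give a feel for what a reader faces before it can even look at a headline, here is a minimal sketch in Python that guesses which family a feed belongs to by looking at its root element.  The function name detect_feed_flavor is made up for illustration, and the sketch assumes the usual root elements (rss for the RSS 0.91/0.92/2.0 line, rdf:RDF for the RDF-based formats, feed for Atom).  It says nothing about the many smaller incompatibilities within each family.

```python
import xml.etree.ElementTree as ET


def detect_feed_flavor(xml_text):
    """Guess the syndication family from the document's root element.

    Hypothetical helper for illustration only; real feeds differ in many
    more ways than the root element reveals.
    """
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace-uri}local-name",
    # so keep only the part after the closing brace.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "rss":
        # The RSS 0.91/0.92/2.0 line carries its version in an attribute.
        return "RSS " + root.get("version", "unknown")
    if local_name == "RDF":
        return "RDF"
    if local_name == "feed":
        return "Atom"
    return "unknown"


if __name__ == "__main__":
    print(detect_feed_flavor('<rss version="2.0"><channel/></rss>'))
```

Knowing the family is only the first step.  Each one still needs its own code for finding titles, links, and dates.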

The RSS world split into two camps:  the RSS group, who want to freeze RSS development at version 2.0, and the RDF group, who continue to develop the format and add all kinds of bells and whistles.  The flame wars between these two groups are legendary.  I'll let you search them out if you're so inclined.

And then along came Atom, an attempt to create a syndication format that everybody can agree on.  It looks like the RSS/RDF wars are mostly over and intelligent people are putting their differences behind them to work together on the new format.  Atom is still in the early stages of development, although there are already feeds available in that format.  The Atom effort got a big boost in June when the IETF announced the formation of the Atom Format and Protocol Working Group.

Another problem that faces developers of syndicated news readers is bad XML.  Many site summaries contain XML that is not well-formed, which means it can't be parsed by a standards-compliant XML parser.  Repeated messages to the site operators go unacknowledged.  There is an astonishing amount of bad XML out there that nobody will fix.  Developers are forced either to reject feeds that aren't well-formed, or to attempt to parse them at any cost.  Most do the latter, which leads to "tag soup":  pretty much anything goes and programs try to figure things out.  This is how we ended up with incompatible Web browsers, weird constructs, and strange rendering of HTML.  It makes Web browsers big, clunky, and unreliable.  Standards exist for a reason.  Unfortunately, competitive forces require that software attempt to make sense of bad data.
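The choice between rejecting and guessing can be sketched in a few lines of Python.  This is only an illustration, not how any particular reader works:  the function parse_feed and the pattern BARE_AMPERSAND are invented for the example, and the only repair it attempts is escaping bare ampersands, one common way a feed can fail to be well-formed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repair rule: an "&" that does not begin an entity or
# character reference such as "&amp;" or "&#160;".
BARE_AMPERSAND = re.compile(r"&(?!(?:#\d+|#x[0-9a-fA-F]+|\w+);)")


def parse_feed(xml_text):
    """Return (root, how), where how is "strict", "repaired" or "rejected"."""
    # First choice: a standards-compliant parse, which fails on any
    # well-formedness error.
    try:
        return ET.fromstring(xml_text), "strict"
    except ET.ParseError:
        pass
    # Second choice: guess at what the author meant and try again.
    # This is the first step down the road to tag soup.
    repaired = BARE_AMPERSAND.sub("&amp;", xml_text)
    try:
        return ET.fromstring(repaired), "repaired"
    except ET.ParseError:
        return None, "rejected"


if __name__ == "__main__":
    bad = '<rss version="2.0"><channel><title>News & Views</title></channel></rss>'
    root, how = parse_feed(bad)
    print(how, root.find("channel/title").text)
```

Every extra repair rule like this one lets a few more broken feeds through, and takes away a little more of the incentive for anybody to fix them.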

One can only hope that Atom is approved relatively quickly and that sites using other syndication formats will convert to Atom once it's fully defined.