Sunday, 15 August, 2004

Combating RSS Spam

People who know a lot more than I do about RSS have already given some thought to RSS spam: sites that include advertisements as part of their "new items" list. Simple-minded spammers would make the entire feed an advertisement, but such feeds would quickly get blackballed. Subtler approaches are possible and, as the linked article points out, very easy to implement.

As annoying as spammers could be to a first-generation RSS aggregator like SharpReader, that's nothing compared to what they could do to an automated reader bot like the one I described yesterday. A Web site can use many methods to identify and fool a Web crawler, at least temporarily. The crawler could not possibly identify such a site automatically on the first pass, so spammers are guaranteed that some percentage of people will see their pages. Combating this problem will require a community-wide effort, and I hope it will be handled better than the fight against email spam, with its overzealous "black hole" operators.

I doubt that any kind of legislation could be passed to combat RSS spam. Email spam requires spammers to "push" content onto users, which opens them to charges of theft of services (the recipients' bandwidth and storage). RSS spam, by contrast, is entirely a "pull" model: people (or programs) go to the site and download the spam. All a Web site does is publish an XML feed and notify a service that the feed has changed. There's no "push" involved.
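To make the "pull" point concrete, here is a minimal sketch of everything such a site actually produces. It assumes the feed is RSS 2.0 and that the change notification uses the XML-RPC `weblogUpdates.ping` convention (site name and URL), which is one common way of notifying an update service; the site name, URLs, and helper function names are hypothetical. Note that the code never contacts a reader: the feed just sits on the server, and the ping carries only a name and a URL, not the content.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client


def build_feed(title, link, items):
    """Return a minimal RSS 2.0 document as a string.

    `items` is a list of (title, link) pairs. Nothing is sent anywhere;
    the document simply waits on the server until a reader requests it.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


def build_ping(site_name, site_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call.

    The ping says only "this site changed"; it does not include any items.
    (Hypothetical helper: it builds the request but does not send it.)
    """
    return xmlrpc.client.dumps((site_name, site_url), methodname="weblogUpdates.ping")


if __name__ == "__main__":
    # A hypothetical feed whose newest "item" is really an advertisement.
    feed = build_feed(
        "Example Blog",
        "http://www.example.com/",
        [("Great deals on widgets!", "http://www.example.com/ad")],
    )
    print(feed)
    print(build_ping("Example Blog", "http://www.example.com/"))
```

Everything after this point is initiated by the reader: an aggregator (or bot) that sees the ping decides on its own to fetch the feed, which is why the theft-of-services argument used against email spam is hard to apply here.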

I don't know how much of a problem RSS spam could be. My immediate reaction is that spammers could clog the RSS space as thoroughly as they've clogged email, but on further reflection I'm not so sure they can. Granted, they can clog the bandwidth, but I'm not certain they can get "eyeballs" if the next generation of RSS aggregators implements a filtering scheme similar to the one I've described. I guess we'll just have to wait a few years to find out.