Thursday, 12 August, 2004

Why Current Search Techniques Are Less Than Ideal

I said on Tuesday that blogging would change the way we use the Web, and possibly the way we get our information.  I'd like to explain why I think that's so, but first I need to explain why the current way of searching the Web is less than ideal when looking for current or balanced information on a topic.

One of the primary criteria that Google and other search engines use to rank results is links:  how many pages link to the page being ranked.  The more links, the higher the rank.  It's not quite that simple, of course, because people caught on to the scheme pretty quickly and started doing all kinds of silly things to increase the number of links to their pages.  Blog spam is perhaps the most common such attempt.  Search engines have implemented techniques to minimize the effects of such spam on the search rankings, but like every other spam battle it's an ongoing and ever-escalating arms race.
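To make the basic idea concrete, here is a minimal sketch of ranking by inbound link count.  It is a deliberate simplification and not how Google or any real search engine actually computes rank.  The function name and the (source, target) pair format are my own assumptions for illustration.

```python
from collections import Counter

def rank_by_inbound_links(links):
    """Rank pages by how many distinct pages link to them.

    `links` is an iterable of (source_page, target_page) pairs.
    This is a hypothetical, simplified model: real engines weigh
    links in far more elaborate ways and filter out spam.
    """
    # Ignore self-links and count each (source, target) pair only once,
    # so repeating the same link does not inflate the score.
    unique_links = {(src, dst) for src, dst in links if src != dst}
    counts = Counter(dst for _, dst in unique_links)
    # Highest count first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

example_links = [
    ("a", "c"),
    ("b", "c"),
    ("a", "b"),
    ("c", "c"),  # self-link, ignored
    ("a", "c"),  # duplicate, counted once
]
ranking = rank_by_inbound_links(example_links)
print(ranking)
```

Even this toy version has to discard duplicate links and self-links, which hints at why the real thing turns into an arms race.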

The primary problem with a rating system that places a high value on the number of links to a page is that older content accumulates links and maintains "relevance" even after it becomes stale or out of date.  This is fine for static content like my TriTryst pages and other information that doesn't change much, if at all, over time.  It is not a good way to rank pages that deal with current issues or information in emerging fields of study.
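A little arithmetic shows why age wins.  The sketch below assumes, purely for illustration, that a page picks up links at a constant rate from the day it is published.  The rates and dates are made-up numbers, not measurements.

```python
def inbound_links_on(day, published_day, links_per_day):
    """Total links a page has accumulated by `day`.

    Assumes a constant link rate since publication, which is a
    simplifying assumption for illustration, not a measured fact.
    """
    days_online = max(0, day - published_day)
    return days_online * links_per_day

# Hypothetical pages: one old and slow, one new and popular.
old_page_links = inbound_links_on(day=1000, published_day=0, links_per_day=1)
new_page_links = inbound_links_on(day=1000, published_day=990, links_per_day=20)

print(old_page_links, new_page_links)
```

Under these assumptions the new page attracts links twenty times faster, yet after ten days it has 200 links to the old page's 1000, so a pure link count still ranks the old page first.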

The other problem I see in using a link-based rating system for current events is that such a scheme leads to an inadvertent reporting bias.  Searching for current events usually returns a result set that includes stories from the major news outlets (CNN, MSNBC, BBC, New York Times, Reuters, etc.) at the top, followed by links to commentary sites that link to one of those stories.  Stories from the top tier news outlets typically are very similar to each other, meaning that the first few pages of search results will contain essentially homogeneous news and opinion.  Perhaps you'll run across a FreeRepublic or Plastic story that provides commentary.  At best, though, you usually get just two opposing views on the subject, based on wildly different interpretations of the news articles and neither acknowledging that any other interpretation is possible.

Much of this is caused by people who seem unwilling or unable to form their own opinions from reading a news article, or are unwilling to hold an opinion as valid until they read something that backs it up.  And since there typically are only two opposing viewpoints presented in the first few pages of search results, the discussion quickly becomes a this-or-that issue, with no room in the middle.  Truthfully, I can't say which is the cause and which is the effect.  Is news reported this way because that's what people want?  Or does reporting news this way cause public opinion to be polarized?  That's a question for psychologists and sociologists.  Whatever the case, I think all would benefit from more balanced reporting.

Not that I expect any one news agency to provide impartial or even balanced reporting.  The slant put on news by any organization will reflect the personal views of the writers, editors, publishers, and even the readership.  They are, after all, in the business of selling what they publish, so you can't fault them for publishing what sells.  The idea of unbiased reporting is fine in a world full of totally rational and unemotional beings, but when humans are involved it's better to acknowledge and disclose the bias.  Biased reporting is okay.  Really.  As long as there are many views that are equally accessible.

The "equally accessible" issue is the heart of the problem.  When Microsoft rolled out their MSN Newsbot last month, articles describing it were quick to point out that MSN Newsbot gives preferential treatment to articles that appear on MSNBC.  Google News is a news aggregator that appears to be unbiased, but an article published at Digital Deliverance recently shows that the top five sources of news make up 48% of the headlines on the Google News front page.  The top 100 sources make up 98% of the headlines.  Other news aggregators appear to have similar unintended biases.
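A concentration figure like "the top five sources make up 48% of the headlines" can be computed along the following lines.  This is only a sketch of the arithmetic.  I don't know how the Digital Deliverance article actually gathered or counted its data, and the function name and sample list below are hypothetical.

```python
from collections import Counter

def top_source_share(headline_sources, n):
    """Fraction of headlines that come from the n most frequent sources.

    `headline_sources` holds one source name per headline.
    Returns a value between 0.0 and 1.0.
    """
    counts = Counter(headline_sources)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / total

# Made-up sample: ten headlines from three sources.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
print(top_source_share(sample, 1))
print(top_source_share(sample, 2))
```

In the made-up sample, the single largest source accounts for half of the headlines and the top two account for 80%.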

One final thing about major search engines:  they tend to attach more relevance to content from known sources.  I'd say that this is a good thing in general, but it tends to push unknown sources to the bottom of the results rankings, even if the information they provide is new and relevant.  Mind you, this isn't criticism, but rather an observation.  It's an artifact of using a ranking scheme designed for static content to search fast-changing information.  It also causes most blogs to be pushed down in the rankings because most blogs are not consistently relevant--their content, like mine, varies much more than a news site's.

Those are the problems I see with current search techniques and news aggregators:  freshness, diversity, and visibility.  Tomorrow I'll discuss how using RSS to search blogs and news feeds can improve on all three.