Thursday, 16 January, 2003

Trainable Bayesian Spam Filters

My friend Jeff Duntemann posted a note yesterday in his web diary about using trainable Bayesian filters to catch spam.  I still don't agree that filtering is the best way to combat spam, but it's probably the best we're going to get, all things considered.  Blocking spam at the source (i.e., preventing it from entering the system in the first place) would be much more effective, but the design of the email protocols and a general resistance to change prevent any effective Internet-wide spam blocking scheme from being implemented.  So we're left with filtering at the delivery end.

The nice thing about Bayesian filters, as Jeff points out, is that they are trainable.  And the one that everybody's talking about (see Jeff's site for the link) has a 99.5% success rate, with zero false positives.  It's impressive, and perhaps this is the way to go.  But on the client?  Like spam blocking, filtering should be done on the server.  All it would take is some simple modifications to the email server and a few extensions to the POP and IMAP mail protocols, and everybody could have spam filtering regardless of which email client they use.  Filtering on the server would be much more efficient than having each individual client do the filtering.  Plus, servers could implement blacklist filtering on a per-user basis, and perhaps stop a large amount of unwanted email from ever being accepted.
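For anyone wondering what "trainable" means in practice, here is a minimal sketch of a naive Bayes spam filter in Python.  It is not the filter Jeff links to; it is a generic textbook version with add-one smoothing, and every name in it (BayesFilter, train, spam_probability) is made up for illustration.  The idea is simply that each message you label as spam or not-spam updates the word counts, and those counts drive the next classification.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Toy naive Bayes spam filter; names and smoothing choices are illustrative."""

    LABELS = ("spam", "ham")

    def __init__(self):
        # Per-label word counts and per-label message counts.
        self.words = {label: Counter() for label in self.LABELS}
        self.messages = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word-like tokens.
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, text, label):
        """Record one message the user has labeled 'spam' or 'ham'."""
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) from the counts seen so far."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.messages.values())
        tokens = self.tokenize(text)
        log_score = {}
        for label in self.LABELS:
            # Smoothed prior, so an untrained filter starts at 50/50.
            score = math.log((self.messages[label] + 1) / (total + 2))
            n = sum(self.words[label].values())
            for tok in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((self.words[label][tok] + 1) / (n + vocab + 1))
            log_score[label] = score
        diff = log_score["ham"] - log_score["spam"]
        if diff > 700:  # avoid overflow in math.exp for extreme scores
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))


# Training is just feeding in labeled mail; the filter adapts to each user.
f = BayesFilter()
f.train("buy cheap pills now", "spam")
f.train("cheap pills online", "spam")
f.train("lunch meeting tomorrow", "ham")
f.train("notes from the meeting", "ham")
print(f.spam_probability("cheap pills") > 0.5)       # scored as likely spam
print(f.spam_probability("meeting tomorrow") < 0.5)  # scored as likely ham
```

Nothing in this sketch cares where it runs.  A server could keep one set of counts per mailbox just as easily as a client could, which is the point of the argument above.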

Do I expect this to happen?  Sadly, no.  As outdated and inefficient as our mail protocols are, I don't expect them to change any time soon.  We're left waiting for the established email clients to include this kind of feature, or for somebody to come up with a new email client that has a good interface, includes all of the features we've come to expect, and also has advanced spam blocking features.  I think it's going to be a long wait.