Saturday, 15 November, 2003

POPFile spam filter

I honestly thought somebody would have fixed the email spam problem by now.  I resisted installing a filter for years, first because my spam problem wasn't all that bad, then because filters just shift the problem, and finally out of sheer pig-headedness:  I hate sub-optimal solutions.  Even with a filter, I still have to review every message sender and subject line.  I'd still like a real solution, but the amount of spam I get is nearly unbearable.  On Jeff Duntemann's recommendation, I downloaded and installed POPFile, a trainable Bayesian filter.  It's been two weeks now since I first installed the thing, and after about 1,500 messages (oddly, my spam count has gone down markedly since mid-October) I'm still teaching it the difference between spam and good mail.  I've added "magnets" that automatically classify important personal and work-related messages, but I've resisted adding magnets for everybody in my contact list in the hopes of training the silly thing to tell the difference between jokes from friends and ads for questionable drugs.

Beyond automatically throwing almost everything in the "Junk" box, the filter isn't yet saving me much time.  Just as I feared, I still have to review the filter's output to ensure that it hasn't misclassified an important message as spam.  I'm hoping that it gets smarter as it gets more experience, but at the moment it's just as much work with the filter as without.  I'm going to give it until the end of the year.  If it's still getting 5% or more wrong after that, then I'll have to re-evaluate the wisdom of using this type of filter.

I'm very disappointed that nothing has yet been done at the protocol level to address the spam problem.  Maybe that's still coming?  I won't hold my breath.  From where I sit, it looks like another 5 to 10 years (if ever!) before the protocols can be changed to prevent most types of spam.