Sunday July 4, 2004

Spamicide

After trying various ways to cull out email spam, I recently hit on a very simple and effective method of detecting it. I haven't seen this method mentioned anywhere before, but I can't be the first person to think of it, so I'm not going to claim to have invented it. It's a little too aggressive, marking some legitimate email as spam, but in the several months I've been using it, it has caught every piece of spam.

To understand how this method works, you need to know a bit about email address formats. Internet email addresses, as defined in RFC 822, can take one of two forms. The simpler one is the familiar name@network-address form. The more complicated one also includes a more readable version of the address, usually the addressee's name. It looks like this:

"Some Name" <somename@someinternetsite>
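For the programmers in the audience, here is a rough sketch of how the two forms come apart in code. It uses `parseaddr` from Python's standard `email.utils` module; the `split_address` helper and the sample addresses are just placeholders for illustration, not part of any real setup.

```python
from email.utils import parseaddr


def split_address(header_value):
    """Split an address into (display name, bare address).

    The display name comes back as an empty string when the
    address is in the simple name@network-address form.
    """
    return parseaddr(header_value)


# Placeholder addresses in each of the two forms.
plain = "somename@someinternetsite"
named = '"Some Name" <somename@someinternetsite>'

for value in (plain, named):
    display_name, address = split_address(value)
    print(repr(display_name), repr(address))
```

The only difference that matters for what follows is whether the display name is empty.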

Any email program worthy of the name uses this format automatically if you've got a person's email address along with their real name in your address book—for example, Outlook and Outlook Express replace simple addresses with this form when you hit Ctrl+K.

This fact can be used to separate spam from legitimate email. If a human sends me email, the To: line will almost certainly contain both my email address and my real name. If a spammer sends me mail, he has only my email address, taken from a huge list of addresses culled from the web or elsewhere, and not my real name. So, I made a rule for incoming messages that says, "If the To: line doesn't have my full name in it, move the message to the spam folder".
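The actual rule lives in a mail client's rule editor, but the same test is easy to sketch in code. This is only an approximation under a few assumptions: `MY_FULL_NAME` and `has_full_name_in_to` are made-up names, the match is a case-insensitive substring check, and the messages are raw text parsed with Python's standard `email` module.

```python
from email import message_from_string

# Hypothetical value: the real name correspondents have in their address books.
MY_FULL_NAME = "Some Name"


def has_full_name_in_to(raw_message, full_name=MY_FULL_NAME):
    """Return True if any To: header contains the full name."""
    msg = message_from_string(raw_message)
    # A message may carry more than one To: header; join them all.
    to_line = " ".join(msg.get_all("To", []))
    # Assumption: ignore case when matching.
    return full_name.lower() in to_line.lower()


def classify(raw_message):
    """Apply the rule: no full name in To: means the spam folder."""
    return "inbox" if has_full_name_in_to(raw_message) else "spam"


from_a_person = 'To: "Some Name" <somename@someinternetsite>\nSubject: lunch?\n\nHi!'
from_a_list = "To: somename@someinternetsite\nSubject: BUY NOW\n\nHi!"

print(classify(from_a_person))
print(classify(from_a_list))
```

A message with no To: header at all (common with blind-copied spam) also lands in the spam folder, since there is nothing for the name to match against.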

This catches all the spam, but it also catches legitimate email sent to my plain address (i.e. without my real name attached). This includes most or all of the email I get when I buy something online, since web retailers usually only have the simple version. It also catches email from people who don't have me in their address book. So, after I added this rule, occasional legitimate email was going in the spam folder. To fix this, I added another rule before the first rule that contains a whitelist: if the email comes from one of the addresses on the whitelist, it's not put in the spam folder. I update this list as necessary, which isn't very often any more.
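Here is a sketch of the two rules in order, with the whitelist check running first. Everything named here is hypothetical: `WHITELIST`, `MY_FULL_NAME`, the `classify` function, and the example sender address are stand-ins, and a real mail client would express this in its own rule editor rather than in Python.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical values for illustration only.
MY_FULL_NAME = "Some Name"
WHITELIST = {"orders@shop.example"}


def classify(raw_message):
    """Return "inbox" or "spam" for a raw message."""
    msg = message_from_string(raw_message)

    # Rule 1: a whitelisted sender is never put in the spam folder.
    _name, sender = parseaddr(msg.get("From", ""))
    if sender.lower() in WHITELIST:
        return "inbox"

    # Rule 2: otherwise, the To: line must contain the full name.
    to_line = " ".join(msg.get_all("To", []))
    if MY_FULL_NAME.lower() not in to_line.lower():
        return "spam"
    return "inbox"


receipt = (
    "From: Shop <orders@shop.example>\n"
    "To: somename@someinternetsite\n"
    "Subject: Your order\n\nThanks!"
)
print(classify(receipt))
```

The order matters: if the name check ran first, the receipt above would be filed as spam before the whitelist ever got a look at it.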

I still have to look at all the messages in my spam folder, but only to quickly scan them for familiar names and places I've ordered something from. Of course, this still takes more time than I'd like to spend dealing with spam (that would be zero minutes per day), so I'm still looking forward to some comprehensive technical fix to the spam problem. And also to a cure for the common cold, fusion power generation, and commercial flights to the moon.

I am The Tensor, and I approve this post.
04:04 AM in Web/Tech

Comments

Why not do that, and have a whitelist too?

Posted by: Matt at Jul 9, 2004 10:33:32 AM