
Proving Which Spam Filters work Best

Posted by samzenpus
from the get-rid-of-it dept.
pirateninja writes "Dr. Gord Cormack decided to find and prove what the best spam filter is. In his study he looked at the major spam filters (DSPAM, SpamAssassin, etc.) along with those submitted by various academics. The results are quite surprising, with a previously unheard-of spam filter, which uses ideas from various compression algorithms, performing the best overall. He recently presented the results and methodology used in a presentation titled 'Spam Filters, Do they Work? and Can you prove it?'" Note that this is a video of his presentation.
This discussion has been archived. No new comments can be posted.

  • by hyperion454 (766214) on Thursday August 03, 2006 @12:27AM (#15837361)
    At work we've set up a combination of SpamAssassin and Spamhaus. Personally I've gone from about 10 spams per day to about 1 every two weeks.
  • Re:Torrents (Score:3, Interesting)

    by Pantero Blanco (792776) on Thursday August 03, 2006 @01:09AM (#15837519)
    I wonder how hard it would be for Slashdot/OSTG to host a tracker for large, article-related files like this. I don't think it would require a lot of funding to run, and it would certainly help with convention presentation videos.
  • GMail Spam Filter (Score:5, Interesting)

    by foxylad (950520) on Thursday August 03, 2006 @02:48AM (#15837790) Homepage

    I use greylisting (gld to be specific) which works wonderfully. A couple of customers wanted even better filtering...

    First I tried DSPAM, but they refused to train it so the results weren't good. Then I tried SpamAssassin, which also let through a surprising amount of spam - a lot more than my personal account on Gmail.

    So I set up accounts on Gmail for them, and forwarded their mail to those accounts (after greylisting - don't want to burden GMail too much!). Gmail lets you set up forwarding, so I simply forwarded all the filtered mail back to a second account on my mailserver for the customer to pick up. Finally I wrote a python script that logs in to Gmail once a week to prevent the account being closed due to non-use.

    A tad involved, but it works like a dream. Yet again Google comes out on top, this time in a market it doesn't even know it's in!

  • So Which One Won? (Score:2, Interesting)

    by ryanisflyboy (202507) on Thursday August 03, 2006 @03:39AM (#15837903) Homepage Journal
    So which one is the "unheard of spam filter?"

    Wouldn't it make sense to put this in the /. submission (or at least link to it)?

    Did I miss the obvious "and the winner is..." somewhere?
  • Cloudmark's SpamNet (Score:3, Interesting)

    by cruachan (113813) on Thursday August 03, 2006 @03:40AM (#15837906)
    I have to push this, as it usually gets missed from reviews because it's a hybrid P2P solution and not a straightforward filter, but Cloudmark's safetybar product (http://www.cloudmark.com/) is just about perfect for me. I get an average of about 20 spam emails a day, and it has had a false positive rate of 0% for months. In fact I've been using the product for several years now and I think the last time I saw a false positive was a couple of years back.

    On the efficiency side it has a hit rate of nearly 100%. I would have said it was 100% a couple of months back, but just recently it's been having a bit of a problem with one stock-pushing spam.

    Anyway, that aside it's the best spam filter I've ever seen by a very long way, and I'd highly recommend the service. It costs a few $ a month, but it's probably the best value subscription I have.

    I have no connection with the company; I'm just a very satisfied customer who's been using it since the beta some years ago. I have a publicly available email address which I've had for years and which must be on many spam lists. Without Cloudmark it would be unusable; with it, it's no problem at all. I recently installed it for my wife, who was starting to get a lot of spam. On that install I noticed it took about two weeks to train it not to junk a few mailing list emails she was on, but since then it's been just as reliable as my installation.
  • by patio11 (857072) on Thursday August 03, 2006 @04:15AM (#15837987)
    I ran your message through a perl script to mail it to me for giggles (I do research on spam filtering at ye olde day job). Regretfully, you didn't make it through. Aside from header garbage, which was a mixed bag (half spam tokens, half "known-good automated email" tokens), you ran into problems with dope, ass, wanna, and... work*. Which is just as well, as I have no desire to speak to anyone who uses those words.

    * Last 15 occurrences in my mailbox are all of the "Make l0ads of $$$ work @ h0m3!" variety.
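For readers unfamiliar with token-based filtering, the kind of per-token scoring described in this comment can be sketched roughly as below. This is a minimal naive Bayes sketch, not the commenter's actual filter: the class name `TokenFilter`, the tokenizer regex, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, $, @ and '.
    return re.findall(r"[a-z0-9$@']+", text.lower())


class TokenFilter:
    """Toy naive Bayes scorer (illustrative only, not a production filter)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham".
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_log_odds(self, text):
        # Positive result: tokens look more like the spam training mail.
        # Negative result: tokens look more like the ham training mail.
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in tokenize(text):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][token] + 1) / (spam_total + vocab + 1)
            p_ham = (self.counts["ham"][token] + 1) / (ham_total + vocab + 1)
            score += math.log(p_spam / p_ham)
        return score
```

With a mailbox where "work" has only ever appeared in spam, any message containing it is pushed toward the spam side, which is the effect the comment jokes about.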
  • Re:GMail Spam Filter (Score:2, Interesting)

    by sd.fhasldff (833645) on Thursday August 03, 2006 @04:35AM (#15838027)

    This is actually something Google could sell: access to their mail filter. I do realize that they have "corporate email", but that still smacks a lot of GMail and some businesses would rather avoid that. Instead, they could provide simple access to their spam filter. Yes, that would require all email to be piped through a Google server if they don't want to make the filter available as a binary (presumably updated regularly).

    To minimize bandwidth consumption and (partly, at least) allay privacy / corporate secrecy worries, the email piped through Google's servers could be limited to anything that didn't pass a white-list filter (e.g. removing all internal corporate email, as well as email from established business partners).
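The white-list step proposed above could look something like the following. It is only a sketch under stated assumptions: the function name `needs_external_filter` and the two trusted sets are hypothetical, and the decision is based on the `From:` header alone.

```python
from email.utils import parseaddr


def needs_external_filter(from_header, trusted_domains, trusted_addresses):
    """Return True if a message should be sent on to the outside spam filter.

    trusted_domains and trusted_addresses are hypothetical lowercase sets,
    e.g. the company's own domain and known business partners.
    """
    _, address = parseaddr(from_header)
    address = address.lower()
    if "@" not in address:
        # Missing or unparseable sender: do not trust it.
        return True
    if address in trusted_addresses:
        return False
    domain = address.rsplit("@", 1)[1]
    # Caveat: From: headers are trivially forged, so a real deployment would
    # want a stronger signal than this before skipping the filter.
    return domain not in trusted_domains
```

Only messages for which this returns True would leave the company's own mail server, which is the bandwidth and privacy saving the comment is after.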

  • by bytesex (112972) on Thursday August 03, 2006 @05:14AM (#15838106) Homepage
    It looks like another win for compression algorithms. Not only do they maximize entropy in your data while shortening it, they can also be used successfully to earmark pieces of text as being written in a certain language, or written by a certain author, and now they can be used for spam detection. The usefulness just keeps on coming. Colour me impressed.
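The general idea of classifying text with a compressor can be sketched as follows. This is not the filter from the study: it uses `zlib` as a stand-in compressor, and the function names and the "fewest extra bytes wins" rule are assumptions chosen for illustration. A message that resembles a corpus compresses well when appended to it, so the class whose corpus grows least is taken as the best match.

```python
import zlib


def compressed_size(text):
    # Size in bytes of the zlib-compressed UTF-8 text, at maximum compression.
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus, message):
    # How many more compressed bytes the corpus needs once the message is appended.
    return compressed_size(corpus + "\n" + message) - compressed_size(corpus)


def classify(message, spam_corpus, ham_corpus):
    # Hypothetical decision rule: pick the class whose corpus "explains"
    # the message more cheaply; ties go to ham to avoid false positives.
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "ham"
```

The same comparison works for the language and authorship tasks mentioned above by swapping in one corpus per language or per author. Note that zlib only looks back 32 KB, so a sketch like this is only meaningful with small corpora.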
  • by dodobh (65811) on Thursday August 03, 2006 @06:57AM (#15838364) Homepage
    See here [vix.com]

    The key paragraph:

    If you'd like a more topical example, consider "spam". People began altering their e-mail "From:" lines in order to make their addresses harder to guess or aggregate; people began doing pattern matching in order to catch known-bad messages and either sideline or reject them. Many defenders used many small tricks to protect their inboxes. The result has not been that less spam is sent or even that less spam is received, on an aggregate basis. Things are worse now than they've ever been. (I say this as co-founder of MAPS LLC, by which I hope to establish my credentials in the spam field for those of you who do not know me.) Today a small number of highly advanced defenders is spam-immune only because they are a small number and their techniques are not widely effective against the attackers; and a small number of highly advanced attackers can "spam at will" a far larger population than ever before. And the trend is that things are getting worse, and getting worse faster than ever before.
  • Re:Flaw in the test (Score:2, Interesting)

    by shawn.fox (461873) on Thursday August 03, 2006 @07:59AM (#15838543)

    Right now there's some zombie network sending around a stock market scam, of which I am getting roughly 300 copies per hour, even though spamassassin correctly classifies virtually all other unwanted mail.

    Do you happen to use Ameritrade? I started receiving these emails this Sunday myself (July 30). Since I always use disposable email addresses I immediately noticed that the email was being sent to the disposable address I had created for Ameritrade. I sent them an email complaining about it and accusing them of either giving away my email address to some third party who was spamming me or that someone had stolen customer account information from them. I have yet to hear any response back from them.

  • Re:Harder! (Score:2, Interesting)

    by Morkano (786068) on Thursday August 03, 2006 @11:57AM (#15840203)
    You know, for a university with supposedly the best engineering and CS programs in Canada, their actual use of technology is pretty crazy. You'd think they'd understand it well enough to realize that bit torrent is a great delivery method.

    I remember when I applied to go there, I didn't get the email stating my acceptance until weeks and weeks after I got the physical package. Ha!
