
Proving Which Spam Filters work Best 263

pirateninja writes "Dr. Gord Cormack decided to find and prove what the best spam filter is. In his study he looked at the major spam filters (DSPAM, SpamAssassin, etc.) along with those submitted by various academics. The results are quite surprising, with a previously unheard-of spam filter, which uses ideas from various compression algorithms, performing the best overall. He recently presented the results and methodology used in a presentation titled 'Spam Filters, Do they Work? and Can you prove it?'" Note that this is a video of his presentation.
This discussion has been archived. No new comments can be posted.
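
The summary says only that the winning filter "uses ideas from various compression algorithms". The sketch below does not reproduce that filter. It illustrates the general idea under simple assumptions: a message is assigned to whichever class it compresses better alongside, using `zlib` as a stand-in compressor. The function names and the tiny corpora are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: bytes, message: bytes) -> int:
    """Extra compressed bytes needed when the message is appended to a corpus.

    A message that resembles the corpus reuses its patterns and costs little.
    """
    return compressed_size(corpus + message) - compressed_size(corpus)


def classify(message: bytes, spam_corpus: bytes, ham_corpus: bytes) -> str:
    """Label the message by the corpus it compresses better with."""
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    # Ties go to "ham" so that a doubtful message is delivered, not blocked.
    return "spam" if spam_cost < ham_cost else "ham"


# Hypothetical toy corpora; a real filter would train on many messages.
SPAM = (b"win a free prize today\n"
        b"click here to claim cheap pills now\n"
        b"limited offer act now and win big\n")
HAM = (b"the meeting is moved to thursday afternoon\n"
       b"please review the attached draft report\n"
       b"lunch at noon near the office\n")

print(classify(b"click here to claim a free prize today", SPAM, HAM))
```

General-purpose compressors such as zlib only look back over a limited window, so this toy version is a way to see the principle rather than a usable filter.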

  • Easier? (Score:3, Insightful)

    by Ec|ipse ( 52 ) on Thursday August 03, 2006 @12:22AM (#15837338)
    Isn't there an easier way to display the results, like a chart or something? 400MB per file download is a bit extreme.
  • by coffeeisclassy ( 991791 ) on Thursday August 03, 2006 @12:29AM (#15837371)
    What's surprising is that, while Bayesian spam filters work well in his tests, the one that performs the best was never really heard of before... I wonder how long it will be before we see something using the methods available. Who wants to bet open source will beat closed source to implementing this?
  • by b0r1s ( 170449 ) on Thursday August 03, 2006 @12:35AM (#15837392) Homepage
    Bah. We use SpamAssassin and multiple DNSBLs, and I still get hundreds per day, most of them to addresses published on websites (unavoidable).

    The key is still: don't give out your address. Once you've done that, you're going to be screwed eventually.
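
For readers unfamiliar with the DNSBLs mentioned above: an IPv4 DNSBL check reverses the octets of the connecting address and looks the result up under the list's zone; an answer means the address is listed. The following is a minimal sketch. The zone `dnsbl.example.org` is a placeholder, not a real list, and the function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the lookup resolves, i.e. the list has an entry.

    Any lookup failure is treated as "not listed" here (an assumption of
    this sketch), so a DNS outage lets mail through rather than blocking it.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Only `dnsbl_query_name` is exercised here; `is_listed` needs a network and a real zone.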

  • RTFA? (Score:5, Insightful)

    by glowworm ( 880177 ) on Thursday August 03, 2006 @12:39AM (#15837408) Journal
    So, how are we supposed to RTFA when the FA is over 470MB and a video file? Why not just a nice simple text summary, Mr. Submitter? But nooooo, that would just be too easy!
  • Not surprising... (Score:4, Insightful)

    by RealGrouchy ( 943109 ) on Thursday August 03, 2006 @12:45AM (#15837433)
    Although I haven't WTFV (watched the video), it doesn't seem surprising that spam filters which use techniques that aren't used widely would be most successful.

    If they aren't used widely, it would either be because they don't work, or they do work but they haven't caught on [yet].

    It's like any other fad. As an example, when the original Survivor series came out, it was really popular because it achieved its goal (attracting viewers) in a way that was original. Heck, even I watched the original one. Now that all the networks are doing the reality TV thing, it has become hackneyed, and each successive version of Survivor does a worse job of achieving its goal. And I've given up watching TV.

    With antispam, new techniques are effective, but as they become more popular and more widely used, spammers will find equally innovative ways of getting around them.

    I've noticed that at any given time, there will be a particular style of (non-blank) spam that manages to get through Gmail's filters fairly consistently, but every now and then Gmail adapts its spam filters to block the successful spam type of the season, and eventually a new type will make its way through.

    - RG>
  • by antifoidulus ( 807088 ) on Thursday August 03, 2006 @12:54AM (#15837472) Homepage Journal
    Heh, even if you are reasonably diligent in protecting your email address, 9 times out of 10 it will still get out (though maybe not as badly). All it takes is one recipient with a compromised Windows box and your address can be all over the spammers' lists in no time.
    Or, as in my case, you could assume that a university you apply to will not send out a giant mass email to all the incoming graduate students inviting them to the graduate orientation. So now I have the email address of every grad student entering the University of Minnesota this year (and probably a few that aren't) and they have mine. All it takes is one infected box and my previously spam-free Gmail account will no longer stay that way. The kicker is that I decided not to go to UMN because they didn't offer me funding...oy!
  • by Jeffrey Baker ( 6191 ) on Thursday August 03, 2006 @01:26AM (#15837564)
    The problem with the spam filters, which you have stated, is that eventually a spammer figures out how to craft a spam which avoids the feature detection systems. Right now there's some zombie network sending around a stock market scam, of which I am getting roughly 300 copies per hour, even though SpamAssassin correctly classifies virtually all other unwanted mail.

    Lately, I've been thinking about this problem a lot. The classic methods of computer classification (Bayes, SVM, whatever) are all based on trying to detect features in a set of objects which separate the objects into two classes. But there is only one feature which is shared by all spam, and which is not shared by mail I wish to receive: all spam is sent by assholes. The problem is, you can't algorithmically detect the asshole coefficient solely from the contents of an SMTP transmission. Therefore I have recently come to the conclusion that we need to revert to a web of trust for accepting email. I have long avoided webs of trust because they seem difficult to manage, but I've come to believe that they are the only way to solve this spam problem.
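
The feature-based approach this comment describes can be sketched as a tiny naive Bayes classifier with add-one smoothing, where the "features" are simply the words of a message. This is an illustration only, not how SpamAssassin or any other named filter is implemented; the tokenizer, class name, and training messages are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, label: str, text: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def log_score(self, label: str, text: str) -> float:
        """log P(label) + sum of log P(word | label), add-one smoothed.

        Assumes at least one training message per label.
        """
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        prior = self.message_counts[label] / sum(self.message_counts.values())
        score = math.log(prior)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        return score

    def classify(self, text: str) -> str:
        # "ham" is listed first so that an exact tie is delivered, not blocked.
        return max(("ham", "spam"), key=lambda label: self.log_score(label, text))


nb = NaiveBayes()
nb.train("spam", "buy cheap stock now")
nb.train("spam", "hot stock tip buy today")
nb.train("ham", "meeting notes attached")
nb.train("ham", "see you at lunch today")
print(nb.classify("buy this stock"), nb.classify("lunch meeting"))
```

The sketch also shows the weakness the comment points at: the classifier only sees words, so a spammer who changes the words changes the features.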
  • by Jeffrey Baker ( 6191 ) on Thursday August 03, 2006 @01:33AM (#15837584)
    There is no classification system with zero real risk, except for delivering all mail to the Inbox. Sorry.

    If your mail is that important, you should be using couriers instead of email.
  • by abh ( 22332 ) <ahockley@gmail.com> on Thursday August 03, 2006 @01:54AM (#15837632) Homepage
    A 400MB video file? Is this a joke? WTF is everyone thinking, that everything on the web needs to be on video all of a sudden? I just blogged about this today: http://www.anotherblogger.com/2006/08/02/please-no-more-gratuitous-videoblogging/ [anotherblogger.com]
  • by bgog ( 564818 ) * on Thursday August 03, 2006 @02:33AM (#15837752) Journal
    Why exactly should we give any weight to anything from an organization so ignorant as to disallow BitTorrent? It takes someone pretty darn ignorant to disallow a protocol because some use it to transport illegal content. Why haven't they banned TCP? It is an evil technology used every day to violate copyright.

    This guy should spend his time educating the fools at his institution.
  • by I!heartU ( 708807 ) on Thursday August 03, 2006 @03:34AM (#15837892)
    Domain keys... now just get everyone to use it.
  • by Anonymous Coward on Thursday August 03, 2006 @03:45AM (#15837919)
    As you wonder how long it will take for a 400MB file to come down at 1.5kB/s, a note from TFA:
    We are sorry that these talks are not available through BitTorrent, however under present IST policy we are not allowed to run BitTorrent. We thank you for your understanding.
    Erm.. This is more about a "take this policy and shove it" protest than the content of the movie. I applaud their creativity.
  • by xt ( 225814 ) on Thursday August 03, 2006 @03:47AM (#15837928)
    The most effective way I have found to stop spam is greylisting. In the last two months, I have had zero spam messages go through to my mail server. I use GLST (http://www.xmailserver.org/glst-mod.html [xmailserver.org]), which is mostly for the XMail mail server (http://www.xmailserver.org/ [xmailserver.org]) but will work anywhere.

    You should also check this article http://www.freesoftwaremagazine.com/articles/focus_spam_postfix?page=0%2C0 [freesoftwaremagazine.com], lots and lots of good advice on spam filtering.
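
For context, greylisting temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet with a 4xx reply and accepts a retry after a minimum delay. Legitimate mail servers retry; much spam software does not. Below is a minimal in-memory sketch of that rule. It is not the implementation used by the tool named above, and the 300-second delay, the class name, and the return values are assumptions made for illustration.

```python
class Greylist:
    def __init__(self, min_delay_seconds: float = 300.0) -> None:
        self.min_delay_seconds = min_delay_seconds
        # Maps each triplet to the time it was first seen.
        self.first_seen: dict[tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        """Return "tempfail" (reply 4xx, try later) or "accept".

        The caller supplies `now` (seconds) so the logic is easy to test.
        """
        triplet = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay_seconds:
            return "accept"
        return "tempfail"


greylist = Greylist(min_delay_seconds=300.0)
args = ("192.0.2.10", "alice@example.org", "bob@example.net")
print(greylist.check(*args, now=0.0))    # first attempt
print(greylist.check(*args, now=60.0))   # retry that comes too soon
print(greylist.check(*args, now=400.0))  # retry after the delay
```

A production version would also expire old triplets and persist state across restarts; both are omitted here.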
  • It is a war (Score:3, Insightful)

    by Alain Williams ( 2972 ) <addw@phcomp.co.uk> on Thursday August 03, 2006 @04:42AM (#15838039) Homepage
    Spam is a war between the spammers and the system administrators/spam filters. The spam filters adopt a new technique; the spammers then work round it; the spam filters advance; ...

    By the time that I have downloaded the video the war will have moved on a couple of iterations ...

  • by 1u3hr ( 530656 ) on Thursday August 03, 2006 @05:33AM (#15838163)
    What's surprising is, while Bayesian spam filters work well in his tests, the one that performs the best was never really heard of before

    Well, the spammers have heard of the other methods too and try to subvert them. So give them time and see how it performs if and when it becomes more commonly used and the spammers are trying to beat it.

  • Re:Harder! (Score:5, Insightful)

    by cruachan ( 113813 ) on Thursday August 03, 2006 @05:43AM (#15838190)
    Don't knock it, cuneiform on baked clay is the single most successful format for long-term storage ever invented - 3000 years and counting. Heck, most of our modern storage formats can't even manage 30 - tried to read an 8" floppy recently?
  • by KlaymenDK ( 713149 ) on Thursday August 03, 2006 @05:47AM (#15838197) Journal
    "False positives may be a problem, however."

    False positives are a HUGE problem compared to the occasional false negative.

    I'd rather have a small trickle of spam emails (I can't believe I'm saying this, but hear me out) than I would risk missing out on that one truly important email.
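
To pin down the terms in this comment: a legitimate message that gets blocked is a false positive, and a spam message that reaches the inbox is a false negative. The sketch below computes both rates from counts. The function name and the numbers are invented for illustration and are not results from the study.

```python
def error_rates(ham_blocked: int, ham_delivered: int,
                spam_blocked: int, spam_delivered: int) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    false positive rate: share of legitimate mail that was blocked
    false negative rate: share of spam that was delivered
    """
    false_positive_rate = ham_blocked / (ham_blocked + ham_delivered)
    false_negative_rate = spam_delivered / (spam_blocked + spam_delivered)
    return false_positive_rate, false_negative_rate


# Invented example: 1 of 1000 good messages lost, 50 of 1000 spams delivered.
fpr, fnr = error_rates(ham_blocked=1, ham_delivered=999,
                       spam_blocked=950, spam_delivered=50)
print(f"false positives: {fpr:.1%}, false negatives: {fnr:.1%}")
```

The two rates trade off against each other, which is why a filter tuned the way this comment prefers accepts more delivered spam in exchange for fewer lost messages.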
  • Re:Harder! (Score:2, Insightful)

    by Jartan ( 219704 ) on Thursday August 03, 2006 @06:28AM (#15838294)
    I'm not going to knock it, but your statement is very far from the truth. Determining the "most successful" long-term storage method invented would require waiting till the year 5xxx something to see if something we've currently invented beats cuneiform. Even then it's pretty hard to prove one way or another, since a lot of the cuneiform we have today is being carefully taken care of to prolong its lifetime, I'd suspect (though I have no confirmation of that part).
  • Re:Harder! (Score:2, Insightful)

    by ozmanjusri ( 601766 ) <aussie_bob@hoMOSCOWtmail.com minus city> on Thursday August 03, 2006 @07:02AM (#15838375) Journal
    I'm not going to knock it but your statement is very far from the truth.

    Yep, you're right. The best long-term information storage medium ever invented is poetry.

  • by maubp ( 303462 ) on Thursday August 03, 2006 @07:41AM (#15838484)
    If an end user is trying to block spam, then yes, they are probably not the sort of person likely to buy your product. At least until spam-blocking becomes more mainstream in email clients (e.g. Mozilla Thunderbird).

    However, it's very often the end user's ISP doing the spam filtering - and this has no direct bearing on the gullibility of the email recipient.
  • by jank1887 ( 815982 ) on Thursday August 03, 2006 @08:27AM (#15838664)
    Hello. Welcome to the internet.
    First, spam does not need to make sense to make money. Here's some of my latest received headlines:
    • placing LEDhas
    • pJapans mission
    • capture Todays architect shared
    • 6MZ
    and the body text (with an attached image):

    -----
    malware

    USDA databases crop

    entente cordial: admission relation contract GB giveaway andd

    studios another page:

    ... (etc.,etc.)
    -------
    AND IT STILL MAKES MONEY!!!
    Spam is funded by idiots. We will never run out of idiots on the net. Thus, spam will always be profitable under the current email system, no matter what filters are used. Filters don't fix the spam problem any more than virus scanners stop viruses from spreading. It's all reactive, which translates to 'fighting a never-ending battle on the losing side'.

  • Re:Harder! (Score:5, Insightful)

    by Squalish ( 542159 ) <Squalish AT hotmail DOT com> on Thursday August 03, 2006 @08:30AM (#15838679) Journal
    Am I the only one that read the means of presentation as a hilarious attack on a university policy of blocking bittorrent? Given that adding 470MB doesn't really add any usable information to a discussion about spam filters over a piece of text, and all.

    Your college doesn't like bandwidth-efficient delivery? Flood them with a Slashdot effect on a 500mb file, an extra $500 in bandwidth charges, and maybe they'll change their tune.
  • by perlchild ( 582235 ) on Thursday August 03, 2006 @10:36AM (#15839583)
    A web of trust will work only until the computer of someone you trust gets subverted. The zombie network you mentioned doesn't happen by itself. Now, the smaller and more technically proficient the web of trust, the less likely it is to be subverted, but it's still vulnerable to someone you trust having their computer hijacked.
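
The size trade-off described here can be made concrete with a small sketch: treat the web of trust as a graph and accept mail only from people within a fixed number of hops. Every extra hop adds more machines whose compromise would let spam in. The graph, the names, and the hop limit below are all hypothetical.

```python
from collections import deque


def trusted_senders(trust: dict[str, list[str]], me: str, max_hops: int) -> set[str]:
    """Everyone reachable from `me` in at most `max_hops` trust links."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for friend in trust.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    seen.discard(me)
    return seen


# Hypothetical trust graph: each person lists whom they vouch for.
trust = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "carol": ["mallory"],
}
for hops in (1, 2, 3):
    print(hops, sorted(trusted_senders(trust, "me", hops)))
```

With one hop only direct contacts are accepted; by three hops a stranger ("mallory" in this made-up graph) is inside the web, which is the exposure this comment warns about.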
