Proving Which Spam Filters Work Best
pirateninja writes "Dr. Gord Cormack set out to find, and prove, which spam filter is best. His study covered the major spam filters (DSPAM, SpamAssassin, etc.) along with filters submitted by various academics. The results are quite surprising: a previously unheard-of spam filter, which uses ideas from various compression algorithms, performed best overall. He recently presented the results and methodology in a talk titled 'Spam Filters, Do they Work? and Can you prove it?'" Note that the link is a video of the presentation.
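The summary doesn't say how the winning filter uses compression. As a rough illustration only, and assuming nothing about the actual filter in the study, here is a minimal sketch of one well-known compression-based classification idea: label a message with whichever class's training text lets it compress into fewer extra bytes. It uses Python's standard `zlib` module; the function names and toy corpora are hypothetical, and because DEFLATE only looks back 32 KB, this toy version only works with small corpora.

```python
import zlib

# Hypothetical toy training corpora; a real filter would learn from thousands of messages.
SPAM_CORPUS = "cheap viagra buy now limited offer click here " * 20
HAM_CORPUS = "meeting agenda for tomorrow please review the attached report " * 20


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: bytes, message: bytes) -> int:
    """Extra compressed bytes needed to encode message right after corpus.

    DEFLATE can only refer back 32 KB, so this assumes a small corpus.
    """
    return compressed_size(corpus + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str = SPAM_CORPUS, ham_corpus: str = HAM_CORPUS) -> str:
    """Label message with the class whose corpus 'explains' it more cheaply."""
    msg = message.encode("utf-8")
    spam_cost = extra_bytes(spam_corpus.encode("utf-8"), msg)
    ham_cost = extra_bytes(ham_corpus.encode("utf-8"), msg)
    # Ties go to "ham": wrongly discarding real mail is usually the costlier error.
    return "spam" if spam_cost < ham_cost else "ham"


if __name__ == "__main__":
    print(classify("buy cheap viagra now, click here"))
    print(classify("please review the agenda for the meeting tomorrow"))
```

The intuition is that text resembling a corpus reuses its phrases, so the compressor can encode it as cheap back-references instead of fresh literals. Serious compression-based filters use stronger adaptive models than zlib; this only shows the principle.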
Easier? (Score:3, Insightful)
Re:In my experience... (Score:3, Insightful)
Re:Combo of SpamAssassin and Spamhaus (Score:2, Insightful)
The key is still: don't give out your address. Once you've done that, you're going to be screwed eventually.
RTFA? (Score:5, Insightful)
Not surprising... (Score:4, Insightful)
If they aren't widely used, it's either because they don't work or because they do work but haven't caught on [yet].
It's like any other fad. For example, when the original Survivor series came out, it was hugely popular because it achieved its goal (attracting viewers) in an original way. Heck, even I watched the first one. Now that every network is doing reality TV, the format has become hackneyed, and each successive season of Survivor does a worse job of achieving its goal. And I've given up watching TV.
Antispam works the same way: new techniques are effective at first, but as they become more popular and more widely used, spammers will find equally innovative ways around them.
I've noticed that at any given time, there will be a particular style of (non-blank) spam that manages to get through Gmail's filters fairly consistently, but every now and then Gmail adapts its spam filters to block the successful spam type of the season, and eventually a new type will make its way through.
- RG
Re:Combo of SpamAssassin and Spamhaus (Score:3, Insightful)
Or, as in my case, you could assume that a university you applied to wouldn't send a giant mass email to all its incoming graduate students inviting them to orientation. So now I have the email address of every grad student entering the University of Minnesota this year (and probably a few who aren't), and they have mine. All it takes is one infected box, and my previously spam-free Gmail account won't stay that way. The kicker is that I decided not to go to UMN because they didn't offer me funding... oy!
Re:Flaw in the test (Score:3, Insightful)
Lately, I've been thinking about this problem a lot. Classic computer classification methods (Bayes, SVM, whatever) are all based on detecting features that separate a set of objects into two classes. But there is only one feature that is shared by all spam and not shared by the mail I want to receive: all spam is sent by assholes. The problem is that you can't algorithmically detect the asshole coefficient solely from the contents of an SMTP transmission. So I've recently come to the conclusion that we need to revert to a web of trust for accepting email. I have long avoided webs of trust because they seem difficult to manage, but I've come to believe that they are the only way to solve the spam problem.
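The comment doesn't describe how such a web of trust would work. One hypothetical reading, sketched here under assumptions of my own: each user publishes the addresses they vouch for, and a recipient accepts mail only from senders reachable within a few "vouches for" hops. The graph format, the `is_trusted` name, the example addresses, and the two-hop default are all invented for illustration.

```python
from collections import deque

# Hypothetical trust graph: each address maps to the set of addresses it vouches for.
TRUST = {
    "alice@example.org": {"bob@example.org"},
    "bob@example.org": {"carol@example.org"},
    "carol@example.org": {"dave@example.org"},
}


def is_trusted(graph: dict[str, set[str]], recipient: str, sender: str, max_hops: int = 2) -> bool:
    """Return True if sender is reachable from recipient within max_hops vouches.

    Breadth-first search, so chains are explored shortest-first.
    """
    if sender == recipient:
        return True
    seen = {recipient}
    queue = deque([(recipient, 0)])  # (address, hops from recipient)
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # don't extend chains past the hop limit
        for vouched in graph.get(person, set()):
            if vouched == sender:
                return True
            if vouched not in seen:
                seen.add(vouched)
                queue.append((vouched, hops + 1))
    return False


if __name__ == "__main__":
    print(is_trusted(TRUST, "alice@example.org", "carol@example.org"))  # True (two hops)
    print(is_trusted(TRUST, "alice@example.org", "dave@example.org"))   # False (three hops)
```

The hop limit is the main knob: raising it lets more legitimate strangers through, but also gives a spammer more chances to get vouched for by someone careless. The sketch says nothing about the hard management problems the commenter mentions, such as bootstrapping trust or revoking it.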
Re:Only one question... (Score:3, Insightful)
If your mail is that important, you should be using couriers instead of email.
Argh! Gratuitous Video! (Score:2, Insightful)
No BitTorrent... No credibility (Score:5, Insightful)
This guy should spend his time educating the fools at his institution.
Re:In my experience... (Score:3, Insightful)
A BitTorrent policy protest (Score:1, Insightful)
Give grey listing a try... (Score:1, Insightful)
You should also check out this article: http://www.freesoftwaremagazine.com/articles/focu
It is a war (Score:3, Insightful)
By the time I've downloaded the video, the war will have moved on a couple of iterations...
Re:In my experience... (Score:3, Insightful)
Well, the spammers have heard of the other methods too and actively try to subvert them. So give it time, and see how the new filter performs if and when it becomes more commonly used and spammers start trying to beat it.
Re:Harder! (Score:5, Insightful)
Re:In my experience... (Score:5, Insightful)
False positives are a HUGE problem compared to the occasional false negative (spam that slips through).
I'd rather get a small trickle of spam (I can't believe I'm saying this, but hear me out) than risk missing that one truly important email.
Re:Harder! (Score:2, Insightful)
Re:Harder! (Score:2, Insightful)
Yep, you're right. The best long-term information storage medium ever invented is poetry.
Re:Why do they try? (Score:2, Insightful)
However, it's very often the end user's ISP doing the spam filtering, and that has no direct bearing on how gullible the email recipient is.
Re:In my experience... (Score:3, Insightful)
First, spam does not need to make sense to make money. Here are some of the subject lines I've received lately:
-----
malware
USDA databases crop
entente cordial: admission relation contract GB giveaway andd
studios another page:
-------
AND IT STILL MAKES MONEY!!!
Spam is funded by idiots. We will never run out of idiots on the net, so spam will always be profitable under the current email system, no matter what filters are used. Filters don't fix the spam problem any more than virus scanners stop viruses from spreading. It's all reactive, which translates to 'fighting a never-ending battle on the losing side'.
Re:Harder! (Score:5, Insightful)
Your college doesn't like bandwidth-efficient delivery? Hit them with a Slashdot effect on a 500 MB file; after an extra $500 in bandwidth charges, maybe they'll change their tune.
Re:Flaw in the test (Score:3, Insightful)