Fighting Spam with DNA Sequencing Algorithms
Christopher Cashell writes "According to this article from NewScientist, IBM's Anti-Spam Filtering Research Project has started testing a new spam filtering algorithm, an algorithm originally designed for DNA sequence analysis. The algorithm has been named Chung-Kwei (after a feng-shui talisman that protects the home against evil spirits). Justin Mason, of SpamAssassin, is quoted as saying that it looks promising. A paper is available on the algorithm, too (PDF)."
hm (Score:0, Interesting)
High tech for what ? (Score:3, Interesting)
Re:Wordfilter (Score:3, Interesting)
Personally, I'd prefer a much better set of filter tools, e.g. being able to say "I only speak English, I NEVER use this account for commerce, and the people I email are professionals, so score spelling mistakes much higher as probable spam".
Can someone point me in the direction of such a filter?
Re:Mozilla Firefox (Score:2, Interesting)
My experience with it has been rather disappointing. Why I need to tag as spam two messages from the same sender or with the exact same subject is a mystery to me. After the 10th "Make $\d+ in XX days" type message one has to wonder just how effective this thing is.
This method is promising because it uses spell-checking and a better way to identify spammy string sequences, something neither of the two main camps of spam filters has seemed keen to do until now.
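To make the "Make $N in XX days" complaint concrete, here is a rough Python sketch of a fixed subject rule. The rule and the function name are made up for illustration and are not how Thunderbird or Chung-Kwei work. It catches the literal pattern, but a one-character obfuscation slips past it, which is presumably why a smarter way of spotting spammy string sequences is appealing.

```python
import re

# Hypothetical hand-written rule for subjects like "Make $5000 in 30 days".
SUBJECT_RULE = re.compile(r"make\s+\$\d+\s+in\s+\d+\s+days", re.IGNORECASE)

def matches_rule(subject):
    """Return True if the subject contains the fixed money-in-days pattern."""
    return SUBJECT_RULE.search(subject) is not None

print(matches_rule("Make $5000 in 30 days"))   # literal pattern
print(matches_rule("M4ke $5000 in 30 days"))   # trivially obfuscated variant
```

The second call returns False, so a fixed rule like this has to be rewritten every time the spammer changes a character.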
The biggest problem I see, at the moment.... (Score:4, Interesting)
Wrong title, I guess (Score:5, Interesting)
I think we will see more and more applications like this with the growing cross-pollination between biology and CS.
Re:Works until the Spammers get a copy of it (Score:2, Interesting)
Spell checker as anti-spam filter - that would create huge problems for most Americans
Otherwise it's a good idea.
Re:Mozilla Firefox (Score:3, Interesting)
This shouldn't be all that surprising - Bayesian filtering is all based on probabilities. The reason "Outlook message rules" is so bad is that a friend of mine might send me a joke about Viagra, which I don't want deleted indiscriminately as spam. False positives are infinitely more annoying than false negatives, so I'd much rather have conservative filtering that lets a bit of spam through.
I'm not saying Bayesian algorithms are perfect yet (though they'll improve) - my personal experience has been SpamAssassin, which got 97% of spam, and I've been experimenting with Thunderbird for a week, which gets 85%-90% and will no doubt get much, much better as I train it in the next couple of weeks - but ultimately Bayesian filtering is enough to beat enough spam to make spamming not worthwhile (if everyone did it...)
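For anyone wondering what "based on probabilities" means in practice, here is a toy naive Bayes sketch in Python. It is only an illustration under my own assumptions (whitespace tokenizing, Laplace smoothing, a 0.9 threshold, made-up training messages), not the actual code in SpamAssassin or Thunderbird.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Estimate P(spam | words), treating words as independent."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_tokens = sum(counts[True].values())
    ham_tokens = sum(counts[False].values())
    # Start from the prior odds (needs at least one message of each class).
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        # Laplace smoothing so unseen words do not zero out the product.
        p_spam = (counts[True][word] + 1) / (spam_tokens + vocab_size)
        p_ham = (counts[False][word] + 1) / (ham_tokens + vocab_size)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))

def is_spam(text, counts, totals, threshold=0.9):
    """Conservative cut-off: only flag mail the model is quite sure about."""
    return spam_probability(text, counts, totals) > threshold

# Made-up training data, purely for illustration.
training = [
    ("buy viagra now", True),
    ("cheap viagra pills", True),
    ("joke about viagra from a friend", False),
    ("lunch meeting tomorrow", False),
]
counts, totals = train(training)
print(is_spam("cheap pills now", counts, totals))
print(is_spam("viagra", counts, totals))
```

With a high threshold, a single suspicious word is not enough to flag a message, which is the trade-off described above: a few false negatives in exchange for fewer false positives.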
They'll.. (Score:3, Interesting)
Uh oh - there goes the patent now.... (Score:2, Interesting)
By now, all the patent-trollster-lurkers who passively phish in the
Can anyone who works in IP (intellectual property, NOT Internet Protocol) post a list of known trollster companies that are full of lawyers who acquire patents (by any means) and make patent litigation their primary business model?
Re:Mozilla Firefox (Score:3, Interesting)
Maintaining an enterprise mail system based upon user-controlled spam filtering software is not practical. That small percentage of users with consistent ID 10T errors adds up fast. Try correcting false positives for a user-configured filter. It's time-consuming.
The better approach from an administrative standpoint is controlling spam at the MTA and MDA levels of the mail server. I use postfix checks with MDA-level Bayesian filtering with reasonable success. The spam mbox consists of user-submitted and administratively approved mail. The user submits it, and the admin checks for things like filter-poisoning text before moving it to the real spam mbox.
Most importantly, my false-positive rate is extremely low -- probably tens of thousandths of a percent.
Nice tool but greylisting does more right now! (Score:2, Interesting)
Seriously, greylisting implemented on all the ISPs' MTAs would overnight block 99% of the spam being sent. Most spam at the moment is being sent from armies of bots running on unsuspecting users' systems connected to cable and DSL service. The programs used are unsophisticated: they churn through a list of addresses, spewing messages out by the thousands. They do not queue messages or retry them if they get an error. Greylisting uses this to great effect and blocks spam while letting legitimate MTAs deliver messages.
True, it is not 100% effective; a small number of spam messages get through, since some spam goes through legitimate MTAs and the message is retried. But once you remove the bulk of spam, those can be tracked down and shut down or blocked at the firewalls.
If the ISPs would implement this, spam would become a non-issue overnight. Email would once again become a mostly useful tool. But I guess the problem is that the ISPs have no vested interest in solving this problem. None of them will listen or implement this simple solution, which does not block any legitimate email. With 70% of the email on the network being spam (the number may be higher than that by now), I would think they would jump at a solution that would reduce the loads on their servers. But I guess they make too much money from spammers to implement such a simple solution.
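For those who haven't seen greylisting, the mechanism is roughly this (a minimal Python sketch; the class name, the five-minute delay, the injectable clock, and the (client IP, sender, recipient) key are my own illustrative choices, not any particular MTA's implementation). The first delivery attempt for an unknown triplet gets a temporary 4xx failure; a real MTA queues the message and retries later, and is accepted once the delay has passed. A bot that never retries never gets through.

```python
import time

class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, delay_seconds=300, clock=time.time):
        self.delay = delay_seconds
        self.clock = clock          # injectable so the sketch can be tested
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply code: 451 to defer, 250 to accept."""
        key = (client_ip, sender, recipient)
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.delay:
            return 451              # temporary failure: try again later
        return 250                  # retried after the delay: accept

# Simulated clock so the example does not have to wait five minutes.
fake_now = [0.0]
greylist = Greylist(delay_seconds=300, clock=lambda: fake_now[0])

print(greylist.check("192.0.2.1", "a@example.com", "b@example.org"))
fake_now[0] = 400.0
print(greylist.check("192.0.2.1", "a@example.com", "b@example.org"))
```

A production setup would also expire old entries and whitelist known-good senders; those parts are left out here.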
More correct than you know (Score:2, Interesting)
This is just like your own immune system, which uses such things as "V-D-J" recombination (and other tricks) to create billions of somewhat random, distinct receptors that recognize different epitopes, so it can attack potential unknown pathogens. The cells carrying them must then be further educated not to attack "self" in your own body.
If only computer geeks took some lessons from biologists, perhaps they could get a grip on principles to stop SPAM.
Giving birth to Artificial Intelligence... (Score:4, Interesting)
Think about it - we now have software that "learns" what you like. [nuclearelephant.com]
Sorry, but anything that "learns" fits a definition of intelligence - using past results to predict future outcomes. Note that I'm not saying "self aware" or "conscious", simply "intelligence".
As we move forward, we'll see more and more intelligence on the part of the spammers, and the warring factions of intelligence will likely provide massive financial and political impetus to build ever more intelligent solutions - thus AI is born.
The problem with other vehicles for developing AI is simply the budget. With SPAM, everybody has a direct, financial incentive to develop it, so development will definitely happen!
Virus and worm detection! (Score:3, Interesting)
Even more so, since viruses are much more a compilation of a set of previous constructions with a few mods than a new composition not necessarily based on the wording of old scams.
And viruses and worms (especially worms) are more constrained by their environment, requiring an exploit of a vulnerability and the installation of work-doing code. Though gene-shuffling techniques might be able to bury much of the code, the basic exploit must continue to be some sort of match to the vulnerability's "receptor".
Re:hm (Score:3, Interesting)
Now that we've neutralized that form of message garbling, we're left to deal with Bayes filter poisoning. This is something that entropy-based filtering deals with quite well.
All spam filtering techniques have weaknesses, but if you use a few different methods in concert, preferably within the same package to spare the poor user from having to set up a whole lot, you can get just about all of it.
Even using a few of these different methods together, I still get a few ads from companies I've done business with that have screwed up my communication preferences. This sucks, but most of these companies are clueless rather than malicious. Threatening to take my business elsewhere has never failed to correct these problems.
Re:Giving birth to Artificial Intelligence... (Score:1, Interesting)