
gmuslera's Journal: Spam

Journal by gmuslera
I don't receive a large amount of spam (only 20-40 messages a day), but even in those numbers it is annoying.

Over the years I changed what I do to worry less about it. For example, I set up filters that moved to a temporary folder anything that didn't have my address specifically in the To: or Cc: headers, with more or less success (at least when I checked that folder I knew that most of it would be spam, and I only had to pick out what was meant for me but didn't have my address in the headers). Anyway, after I lost some emails I discarded that "solution".
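
A rule like that can be sketched in a few lines of Python. This is only an illustration, not the filter I actually had: the address `me@example.org` and the folder names are made up, and a real mail client or procmail recipe would express the same test in its own syntax.

```python
from email import message_from_string
from email.utils import getaddresses

MY_ADDRESS = "me@example.org"  # hypothetical address, for illustration only


def is_addressed_to_me(raw_message: str, my_address: str = MY_ADDRESS) -> bool:
    """Return True if my_address appears in the To: or Cc: headers."""
    msg = message_from_string(raw_message)
    # get_all() returns None when a header is absent, hence the "or []".
    headers = (msg.get_all("To") or []) + (msg.get_all("Cc") or [])
    recipients = {addr.lower() for _name, addr in getaddresses(headers)}
    return my_address.lower() in recipients


def choose_folder(raw_message: str) -> str:
    """Pick a folder name (both names are hypothetical)."""
    return "Inbox" if is_addressed_to_me(raw_message) else "Probably-Spam"
```

The weakness is visible in the code itself: mailing-list traffic and Bcc'd messages never carry my address in those headers, so legitimate mail lands in the second folder too.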

For a time I used Brightmail, but the problem was that I had to download my mail from them, and they acted as a bridge to my ISP's mail server. Back then my own speed (a normal modem), my ISP's speed (256 kbps?) and even my country's internet connection speed (a few Mbps) made this process very slow. It was effective (few false positives and negatives, and it caught most of the spam), but very slow, so when they started to charge for the service I simply didn't subscribe.

At some point I decided that this was not enough, learned about SpamCop, and started reporting spam to them, as an attempt to lower the number of spammers: maybe not the professionals, but at least the beginners who don't understand why it is bad. In fact this makes spam more work for me than before: I have to decide whether a message is spam, forward it to SpamCop, then follow the link in SpamCop's reply and confirm the reports that will be sent. But it gave me the feeling that I was doing something about the problem itself, not just minimizing its effects on me. There are tools to automate SpamCop a bit (the first one is subscribing to it :) ): there is spamcup, which can do the spam confirmation part automatically, procmail recipes to forward spam to SpamCop, and so on.

But things are not so perfect with SpamCop. First, the spam I get has actually increased since I started using it. That is not SpamCop's fault, spam has increased everywhere, but it gave me the feeling that I was applying some kind of evolutionary pressure on spammers, forcing them to be even nastier than before. Second, I noticed that, because of new (or maybe not so new) ways of sending spam, the reports usually don't go to the administrator of the site where the spammer has his email or web site (I know about forged headers, I'm not talking about the poor guy who was put in the "From" line), but only to the administrator of the open relay. And third, it is a lot of reporting without lowering the actual spam I receive.

SpamCop tries to lower the number of spammers, but what about my mailboxes? There are a lot of ways to filter spam, some more intelligent or effective than others, but most required changes to my mail configuration (I could have used fetchmail+procmail, but as I also read my mail from work, a truly automated solution could give me problems, so I'm still downloading mail with my mail client directly from my ISP's mail server). I was very tempted to make the move when I heard about the results of using SpamAssassin, and was also tempted by a lot of new approaches to the problem (TMDA, Vipul's Razor and more), but the moment I came closest to making the change was when the revolution started with the article "A Plan for Spam" and I started to see results and implementations based on this Bayesian filtering.

Well, time and work matter too, so I took my time to make the change, and by the time I was about to do it I found an approach that would minimize the impact on the way I currently use mail: POPFile. It is "simply" a POP3 proxy server with a Bayesian mail classification engine and a web-based administration interface, written in Perl. As it runs on my own box, I don't have extra delays while the mail travels from my ISP to wherever the classification is done and then to my computer, and I only had to change the mail server and my username in the mail client configuration to get it running. I also had the nice surprise that it is actually a bit more tweaked than "normal" Bayesian spam filters: tricks like w r i t i n g w/o/r/d/s, HTML comments and other modifications that are no serious obstacle to reading, but are to Bayesian filtering, don't get past it, and it has other countermeasures for the latest spammer tricks to avoid this kind of filtering.
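
To make the idea concrete, here is a minimal Python sketch of Bayesian classification with a de-obfuscation step in front of it. It is not POPFile's code (POPFile is written in Perl and its internals are not described here); the class name, the function names and the normalization rules are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# One letter, a separator, a letter, then the SAME separator + letter repeated.
# The lookahead stops a space-separated run from swallowing the first letter
# of a following "w/o/r/d"-style run.
SPLIT_WORD = re.compile(
    r"\b[A-Za-z]([ ./*-])[A-Za-z](?:\1[A-Za-z])+\b(?![./*-][A-Za-z])"
)


def normalize(text: str) -> str:
    """Undo two simple obfuscation tricks (illustrative rules, not exhaustive)."""
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)  # drop HTML comments
    # Rejoin letters split by one repeated separator: "w/o/r/d/s" -> "words".
    text = SPLIT_WORD.sub(lambda m: re.sub(r"[^A-Za-z]", "", m.group(0)), text)
    return text.lower()


def tokenize(text: str) -> list:
    return re.findall(r"[a-z]{3,}", normalize(text))


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over named buckets."""

    def __init__(self):
        self.word_counts = {}        # bucket name -> Counter of words
        self.doc_counts = Counter()  # bucket name -> number of training mails

    def train(self, bucket: str, text: str) -> None:
        self.word_counts.setdefault(bucket, Counter()).update(tokenize(text))
        self.doc_counts[bucket] += 1

    def classify(self, text: str):
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[bucket] / total_docs)
            denom = sum(counts.values()) + len(vocab) + 1
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = bucket, score
        return best
```

Correcting a misclassification in this sketch is just another call to `train` with the right bucket, which is roughly what a correction through a web interface amounts to. Joining every run of spaced single letters is a crude rule and would also glue together something like "a b c" in ordinary text.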

I'm actually using it now. I defined classes for mailing lists, for virus reports (my ISP has virus filtering, which has saved me from maybe gigabytes of useless mail), for Nigeria-like scams, for SpamCop reports, and of course for spam and real mail. I have an icon on my desktop that opens the web interface, where I correct the occasional misclassifications it makes (since I last reset the statistics it has classified 3200 mails with 75 errors, and almost none of those errors was another kind of mail classified as spam... the only cases were when someone sent me web pages by mail instead of links).
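
For what those numbers mean as a rate, a quick calculation (the variable names are just for this example; the two counts are the ones quoted above):

```python
classified = 3200  # mails classified since the last statistics reset
errors = 75        # misclassifications corrected by hand

error_rate = errors / classified
accuracy = 1 - error_rate

print(f"error rate: {error_rate:.2%}, accuracy: {accuracy:.2%}")
# prints "error rate: 2.34%, accuracy: 97.66%"
```

That is the overall figure across all the classes, not only spam versus non-spam.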

It could get better, of course: I could finally migrate to having my own mail server on my computer, or use a fetchmail/procmail scheme with SpamAssassin or others to improve spam detection and elimination a bit. But using POPFile at the mail server level is not possible. For a start, using it in a multiuser way is not really recommended: its administrative web interface, needed for configuration, training and to avoid propagating mistakes, is very open. You can see all the mail that has passed through it, and even though you can block connections from other computers, there will still be one person who can, or must, see everyone else's mail.

I think I have seen some Bayesian spam filters out there that let you correct classifications by mail, but I don't know how easy and comfortable that kind of thing would be to use, especially for the initial training. But browsing Freshmeat I read something very nice... you can use SpamAssassin to train Bogofilter, and I think this almost closes the circle. The only thing still needed would be some easy way to tell Bogofilter that a message previously classified as spam is not spam (which I think would rarely be needed with the combination of SpamAssassin and Bogofilter); then the "learning from mistakes" part of Bayesian filtering would be covered, and this kind of thing could be installed at large at ISPs.

Of course, this will not solve the spam problem: SpamCop, laws, a new kind of mail system, a more effective way to close open relays, etc., are all needed. But if spam delivery becomes massively unreliable, a lot of people will think about giving it up.


Comments:
  • But you do realize that SpamAssassin does Bayesian filtering as well, yes?
    • I realized that, yes, some time ago, but my memory is somewhat fragile and I hadn't checked all of SpamAssassin's capabilities when I saw the announcement from Bogofilter.

      I learned a while ago that SpamAssassin had a lot of features and included a lot of techniques, like black/white lists, Vipul's Razor, and a lot more. But when tools that use Bayesian filtering started to appear, I somehow separated the approach of (traditional) SpamAssassin from the Bayesian-only classifiers.

      Anyway, watching the improvemen

  • you might also check out Their free offerings are pretty cool.
    • I checked out Sneakemail a while ago; it had a free offering similar to the one from spamgourmet: a disposable email address to give to the different sites on the internet that ask for a mail address.

      I tried it for a while, then realized that I actually administer mail servers for several domains (since they belong to jobs where I may or may not be forever, I still use my old mail address), where I can create and delete all the mail addresses I want, without the limitations that this kind of services c
