Asynchronous Programming for Spam Elimination

Posted by timothy on Thursday October 12, 2006 @10:03PM from the you-do-this-while-they-do-that dept.

ttul writes "Stas Bekman (formerly the maintainer of mod_perl) has been quietly building an asynchronous programming framework to build high performance network applications in Perl. His recent Perl.com article describes how he has used the Event::Lib module (that lives on top of the popular libevent library) to write a traffic-shaping email proxy to get rid of spam. Asynchronous programming is challenging at the best of times. Read on to find out how to do it the easy way in Perl."

AJaX (Score:2, Informative)

by tepples ( 727027 ) writes: <tepples.gmail@com> on Thursday October 12, 2006 @10:58PM (#16418255) Homepage Journal

> Asynchronous Programming = programming with futures

Except "asynchronous programming" is already a well-known term among many web developers:

Asynchronous Programming with
JavaScript, HTML DOM,
and
XMLHttpRequest

Yes, it does in fact work (Score:5, Informative)

by ttul ( 193303 ) * writes: on Friday October 13, 2006 @02:57AM (#16419857) Homepage

[full disclosure and shillery alert: I work with Stas at MailChannels]

You make some very good points -- and these are all concerns we had when we set out to build this software.
Fortunately for the world, these concerns have turned out to be unwarranted. Furthermore, our experience in actually deploying this technology has been far more breathtaking than we had imagined -- both in terms of spam mitigation and improvements in scalability.

> the core assumption, and the only thing that makes this work, is that botnet spam software will _always_ just
> give up after 30 seconds;

I have a theory that spammers will always be impatient. I believe this theory for several reasons:

1. Spam campaigns are now recognized by anti-spam companies in minutes or hours. New campaigns therefore have a very short life expectancy and have to be completed as fast as possible. If mail can't get delivered fast, it's time to move on to a new domain to get it moving again. With collaborative filters like Cloudmark recognizing campaigns in less than 60 seconds, spammers obviously have to move traffic fast.

2. Botnets are not unlimited in their size or bandwidth capacity. Typically botnets these days are between 1,000 and 10,000 hosts. Any larger and the command and control channels are very quickly noticed and shut down by service providers. Botnets cost money too -- $250/hour for a 10K botnet is typical.

3. Spammers' raison d'être is to send lots of mail and hope that a small percentage of recipients buy something. The only way to make the business profitable is to send huge amounts of mail. If all zombie traffic in the world were magically being slowed down, spamming would no longer be profitable and spammers would tend to focus more on things like highly targeted phishing instead. Not surprisingly, we're already starting to see this.

4. Because #3 isn't going to happen any time soon, and in light of the technical constraints (1 and 2), spammers have no choice but to abort their connections within a very short time frame. It's just the nature of the economic beast. Hanging on is just for posterity. It doesn't make economic sense.

5. It works. And it's very very scalable. By slowing down traffic and multiplexing what remains, mail server load drops by 90%. In big installations, that means no more being paged in the middle of the night because your cluster of 4-way Xeons with 8GB of RAM is borked by a distributed spam burst.

Oh -- and of course you can't just slow everything down. It's important to be very selective so as not to delay everything.
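
As a rough illustration of the "slow down and multiplex" idea in point 5, here is a minimal Python sketch (not the Perl/Event::Lib code Stas wrote). It assumes a hypothetical two-class policy, `REPLY_DELAY`, in which only suspicious senders are delayed, and it simulates sessions with `asyncio.sleep` rather than real SMTP sockets. The point is only that one event loop can hold many slow sessions open at once without a process or thread per connection.

```python
import asyncio
import time

# Hypothetical policy (an assumption, not from the article):
# seconds to wait before each reply, by sender class.
REPLY_DELAY = {"trusted": 0.0, "suspicious": 0.05}


async def session(sender_class: str, replies: int) -> int:
    """Simulate one SMTP session; a throttled session sleeps before each reply."""
    for _ in range(replies):
        # Sleeping yields to the event loop, so other sessions keep running.
        await asyncio.sleep(REPLY_DELAY[sender_class])
    return replies


async def run_sessions(classes: list, replies: int = 3) -> float:
    """Run every session concurrently on one event loop; return elapsed seconds."""
    start = time.monotonic()
    await asyncio.gather(*(session(c, replies) for c in classes))
    return time.monotonic() - start


if __name__ == "__main__":
    mix = ["suspicious"] * 500 + ["trusted"] * 500
    print(f"{len(mix)} sessions took {asyncio.run(run_sessions(mix)):.2f}s")
```

With 500 throttled and 500 unthrottled simulated sessions, the wall time should stay near the delay of a single throttled session rather than the sum of all delays, since the waits overlap.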

> if this throttling technique ever became commonplace, spammers would just write their
> own asynchronous mailer -- it's not THAT hard...

Actually, it is that hard. Even Stas got a headache working on this project.

But even if it was easy, it would be pointless for a spammer to launch more than one connection per zombie. If a sender is marked as suspicious, the sender's concurrency is severely limited. One connection per zombie, at 5 bytes per second -- that's just not economic.
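
To put numbers on that, here is a small Python sketch of the arithmetic and of a per-sender concurrency cap. The 5 bytes per second figure comes from the comment above; the 10,000-byte message size, the limit values, and the `ConcurrencyLimiter` class are assumptions made for illustration, not a description of the MailChannels software.

```python
from collections import defaultdict


def transfer_seconds(message_bytes: int, bytes_per_second: float) -> float:
    """Time needed to push one message through a throttled connection."""
    return message_bytes / bytes_per_second


class ConcurrencyLimiter:
    """Hypothetical per-IP connection cap that depends on sender class."""

    def __init__(self, limits: dict):
        self.limits = limits            # e.g. {"suspicious": 1, "trusted": 10}
        self.active = defaultdict(int)  # open connections per IP

    def try_open(self, ip: str, sender_class: str) -> bool:
        """Admit a new connection only if the IP is under its class limit."""
        if self.active[ip] >= self.limits[sender_class]:
            return False
        self.active[ip] += 1
        return True

    def close(self, ip: str) -> None:
        """Record that one connection from this IP has ended."""
        if self.active[ip] > 0:
            self.active[ip] -= 1


if __name__ == "__main__":
    # An assumed 10,000-byte spam message at 5 bytes/second:
    print(transfer_seconds(10_000, 5) / 60, "minutes per message")
```

Under these assumptions a single message ties up a zombie for over half an hour, and the cap of one connection means opening more sockets does not help the sender.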

> furthermore, i bet there are some shitty legitimate MTAs that would just give up too, causing actual
> mail to get discarded :)

Let's just say the gap between the patience of spammers and the patience of legitimate MTAs is very large indeed. And by carefully fingerprinting and assessing sender reputation, this problem can be minimized to the point where it is a far smaller problem than content filter false positives.

I also want to point out that this technology does not make email suck by slowing it down. It in fact speeds up delivery of legitimate mail in most cases because the load is so reduced on the rest of the infrastructure.

Just talk to our customers. One of them was running four 4-way Xeon boxes with 8GB of RAM each -- all this to service the spam filtering needs of just 10,000 end users. He told us he hadn't slept a full night in months because of load-based outages. Since installing the software Stas built, the only alert he's received is a notification that the load level dropped below the panic threshold!

Re:As if PERL wasn't hard enough to read... (Score:2, Informative)

by ttul ( 193303 ) * writes: on Friday October 13, 2006 @03:20AM (#16419961) Homepage

[full disclosure: I work with Stas at MailChannels]

We looked at using the new Perl threads, but Perl 5.8 threads suffer from a few severe limitations.

1. When you create a new thread, a complete copy of the interpreter is made. The new thread makes use of this new interpreter instance and cannot communicate with the original thread except via the threads::shared module or some traditional IPC mechanism. In short, they're no better than forking a new process and in many ways, they are far worse than this.

2. Perl threads are still quite unstable.

Yes, we could have used Python. Or Ruby. Both these languages have better threading support by leaps and bounds. Additionally, they have great asynchronous libraries like Twisted. Why'd we use Perl? Well, I suppose it's in our blood. Between Stas and the rest of the dev team, we have a good cross-section of Perl talent.

Re:As if PERL wasn't hard enough to read... (Score:4, Informative)

by kimanaw ( 795600 ) writes: on Friday October 13, 2006 @10:28AM (#16423043)

> Yes, we could have used Python. Or Ruby. Both these languages have better threading support by leaps and bounds.

Er, how ? Because they don't really use threads ? Sure, they're fast and lightweight...but since they don't use the underlying OS's threads implementation (ie, kernel-compatible threads), they're only marginally useful on multiCPU and/or multicore systems.

> 2. Perl threads are still quite unstable.

What's your basis for that statement ? Have you tested the latest versions of the threads [cpan.org] and threads::shared [cpan.org] modules ? Some significant effort has been applied in the past year to improve stability, as well as reduce footprint...you might want to give it a look...
Perhaps if your org can get some funding, you might throw some money at the TPF to get iCOW implemented ? Which should vastly improve thread startup and reduce footprint. threads::shared remains a bit of a challenge, but that issue can be addressed by some carefully crafted XS (which I'm told Stas is pretty good at ;^).

Re:Yes, it does in fact work (Score:4, Informative)

by caseih ( 160668 ) writes: on Friday October 13, 2006 @12:42PM (#16425175)

Unfortunately I've seen a marked decrease in the effectiveness of grey-listing lately, which is similar in intent to your ideas. What I'm finding is that a lot of spam is now coming from RFC-compliant mail servers. Stock spams in particular always come through after faithfully waiting out the greylist timeout. So obviously some spammers are able to wait, even up to 45 minutes, to send their spam to me. So despite your arguments spammers will find a way to still economically spam while tolerating delays, holding connections open, etc.
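
For readers unfamiliar with greylisting, here is a minimal Python sketch of the usual scheme: the first delivery attempt for a given (client IP, sender, recipient) triplet gets a temporary failure, and a retry is accepted once a minimum delay has passed. The `Greylist` class, the triplet key, and the 300-second delay are assumptions for illustration, not caseih's actual configuration.

```python
class Greylist:
    """Hypothetical greylist keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, ip: str, sender: str, recipient: str, now: float) -> str:
        """Return "tempfail" (SMTP 4xx) or "accept" for this attempt."""
        key = (ip, sender, recipient)
        # Remember the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"


if __name__ == "__main__":
    g = Greylist()
    triplet = ("192.0.2.7", "a@example.com", "b@example.org")
    print(g.check(*triplet, now=0.0))      # first attempt
    print(g.check(*triplet, now=2700.0))   # retry after 45 minutes
```

The sketch shows why a sender that queues and retries like an RFC-compliant MTA gets through: the only thing greylisting tests is willingness to come back later.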
