Spam

A Visual History of Spam 180

Cristiano writes "Microsoft employee Raymond Chen has saved every spam message and virus-laden e-mail he's received at work since 1997 and graphed the spams and viruses to create a cool visual representation of one man's malicious traffic."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Saturday September 18, 2004 @12:59PM (#10285267)
    "one man's malicious traffic"

    Sounds like a cool title for a future book about Gill Bates.
  • Obvious (Score:2, Funny)

    by Anonymous Coward
    If only MS employees spent more time working on their software, and less time doing these kinds of things...
    • Re:Obvious (Score:5, Insightful)

      by DaHat ( 247651 ) on Saturday September 18, 2004 @01:03PM (#10285304) Homepage
      Do you have a problem with programmers being able to spend a little time here and there on their own projects?
    • RTFB (Score:5, Informative)

      by daytrip00 ( 473461 ) on Saturday September 18, 2004 @02:42PM (#10285876)
      Read the blog. This guy is one prolific programmer. He's the guy who ensures that all the old Windows apps (like the ones from 10 years ago) keep running on the latest versions of Windows. He has all sorts of stories about Windows bugs and idiosyncrasies and explains how they all came to be. It's a fascinating read and I have an RSS subscription to his blog.

      Read this article [joelonsoftware.com] which is all about his quest for Windows and developer backwards compatibility.

      He gives this story about Sim City: It deallocated memory, and then used it right after deallocation. It was a bug that Windows 95 allowed. So his code makes a special check that you were running Sim City and if you were, you could use memory right after you deallocated it. It's pretty amazing to see all the hoops that he and his team jump through. But he's a MSFT legend.

      PS. That blog entry I linked to sent Shockwaves through Microsoft. It's changed the new XML api design, and resulted in the backporting of Avalon to Windows XP.
    • I guarantee this comment was only made because he is a Microsoft employee.
  • by Fjornir ( 516960 ) on Saturday September 18, 2004 @01:01PM (#10285281)
    ...pretty pictures though, did anyone else try the "magic eye" deal and see what I saw?
  • I'll post a pic if I can find one to show just how geeky you need to look if you want to do this yourself. :P
  • by Anonymous Coward on Saturday September 18, 2004 @01:03PM (#10285302)
    An interesting aside: Raymond Chen is mentioned in the Linux kernel's source 'CREDITS' file:

    N: Raymond Chen

    E: raymondc@microsoft.com

    D: Author of Configure script

    S: 14509 NE 39th Street #1096

    S: Bellevue, Washington 98007

    S: USA

    • Good work (Score:4, Funny)

      by Anonymous Coward on Saturday September 18, 2004 @01:08PM (#10285342)
      Now he'll get even more spam.
    • by sriram_2001 ( 670877 ) on Saturday September 18, 2004 @01:33PM (#10285505)
      This blog (post [dotnetjunkies.com]) has some interesting info on this.

      ...This post wouldn't have been possible without Kaushik - he called me up this morning and said that he had spied a familiar name on the Linux 1.0 contributors file. And since the chances of 2 people with the name Raymond Chen and working at Microsoft were pretty slim, we got pretty interested. A bit of Googling led us to this page (http://grumbeer.dyndns.org/ftp/mail/v5/digest363) which has an email that Raymond Chen has typed out back in 1993. The first thing that strikes you is his Microsoft id. I was taken aback - a Microsoft employee contributing to Linux code? That too kernel level stuff - not some fringe OSS project? Seems like things were a lot different back then. Here's a snippet from that mail:

      From: raymondc@microsoft.com (Raymond Chen)
      Subject: New Configure script (and some console patches)
      Date: 05 Jun 93 20:23:30 GMT

      This patch kit is really *THREE* patches in one.

      1. A new Configure script, hopefully easier to use and more flexible than the current one.
      2. A kernel configuration switch to enable high-intensity background in lieu of blinking foreground characters.
      3. A kernel configuration switch to control the destination of kernel trace messages (printk's).

      But the part which I really found interesting was this...the way he signs all his mails.

      Thanks.
      -- Raymond (just another linux hacker) Chen

      Definitely not something you would see nowadays. These days, the very mention of the word 'GPL' might get you into serious trouble in Microsoft - and contributing code is definitely unthinkable. I guess back then, Linux was considered more of a hobbyist-thing rather than a future competitor. But I'm only guessing here. An interesting question that arises is the effects of the viral nature of the GPL. If he had worked on GPL code back then, is he 'infected'? Well - I'm no expert in these issues, but it's interesting all the same.

      Before all the Linux supporters jump to any conspiracy theory, I would just like to point out that the only thing this points out is the amazing versatility and skill exhibited by most Microsoft devs and Raymond in particular. This is a guy who knows both Windows and Linux inside out. Awesome!!! I would really like it if Raymond comes and tells us a bit about his past - especially the 'just another linux hacker' days :-) ....
    • Yes, he used a PERL script to generate the graph. Less than orthodox for an MS employee. His blog's 'not a .net blog' caption also hints at a certain cynicism being harboured.
  • As far as I understand, this is the plot of distribution of *size* of the email vs. time. The "darker" color is not enough of a visual hint to determine the *number* of spam messages over time, which is what is important. Also interesting is the large splotches of computer viruses suggesting (maybe!) that variants are roughly the same size, but not exactly the same.
    • I think the variation in the size of the virus emails probably has to do both with variants (sometimes) and with the amount of trash they put in the body of the message besides the actual payload.
    • I think it depends on what you hate more about spam, the amount of messages that clog your inbox, or the amount of bandwidth taken up from downloading useless messages.

      I agree that this chart doesn't visually represent the amount of spam as much as I would like, it would be simpler (and more informative) if it was broken up into two graphs: Total size over time, and number of messages over time.

      Of course the author doesn't intend this to be any kind of serious study. I think he just wanted something
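      For anyone who wants to try the two-graph split suggested above, here is a minimal sketch of the aggregation step. It assumes you already have each message as a (date received, size in bytes) pair; the function name `monthly_totals` and the sample data are made up for illustration and have nothing to do with the script the author actually used.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(messages):
    """Group (date, size_in_bytes) pairs by month.

    Returns two dicts keyed by (year, month): message count and total bytes.
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for received, size in messages:
        key = (received.year, received.month)
        counts[key] += 1
        sizes[key] += size
    return dict(counts), dict(sizes)


# Made-up sample data, purely for illustration.
sample = [
    (date(2002, 8, 22), 4_000),
    (date(2002, 8, 22), 12_500),
    (date(2002, 9, 1), 90_000),
]
counts, sizes = monthly_totals(sample)
for key in sorted(counts):
    print(key, counts[key], "messages,", sizes[key], "bytes")
```

      Plotting `counts` and `sizes` as two separate series would give the "number of messages over time" and "total size over time" views separately.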
  • by joshuao3 ( 776721 ) on Saturday September 18, 2004 @01:07PM (#10285328) Homepage
    My primary account receives nearly 500 spam messages a day, and the number is growing. It would only take me 6 months to get that amount of spam. It seems like Raymond Chen is less than average in the amount of spam received. The data analysis is intriguing, nonetheless, and I'm glad he had the foresight to do this project.
    • In the article he said that this is what gets through the corporate filters.
    • by Holi ( 250190 ) on Saturday September 18, 2004 @01:45PM (#10285563)
      It seems like Raymond Chen is less than average in the amount of spam received

      Umm.. so you're the average? Have you ever thought that maybe you are on the high end of the bell curve?

      Raymond Chen is less than you in the amount of spam received, who knows, maybe he is exactly the average.

      Why don't you poll people and find out.

      I would but I don't care.
    • I'd say you're probably the abnormal case here. That's a lot.
    • If you've read some of the other comments made on their number of spam messages it's hard to take anyone's claim seriously. Some people are reporting getting just one type of virus an average of 1 e-mail every 5 minutes. 288 copies of the same virus in a day? Possible, but doubtful. Others are even claiming to have months where their spams/viruses would reach the 1 gig mark. Who can believe some claims online with numbers like that?
    • 500 per day? You must be one popular fellow ;)

      (As an aside, the article on Raymond's site says that this is the e-mail he receives after it passes through the corporate filters).
  • by FePe ( 720693 ) on Saturday September 18, 2004 @01:09PM (#10285345)
    Single worst spam day by number of messages: August 22, 2002. 67 pieces of spam.
    I normally get around 60 spam mails *per day*, so I guess he is rather lucky. The spam mails I receive are fortunately not full of images like the 41 images he got.
    • On my "spam account", I currently get approximately 200-300 per day. Unfortunately, Yahoo deletes them after a month, and this has thwarted my plan to see how many I could rack up.

      Currently my monthly record is around 7,000.
      • Although we all hate spam, at least we can engage in some harmless macho posturing re the amount of it that we get.

        I'm a mere minnow in comparison to your good self: Just 57 per day, on average.

        Me off to stuff a pair of socks into my pants...
        • Yeah, spams/day seems to be an integral part of the common ePenis.
          My mail account has been online since 1998. I didn't keep it secret, just didn't do stupid things with it (like sign up at adult sites or so).
          I get 3-7 spams per day. Annoying, but Thunderbird only lets 1 or 2 per week slip, so it's ignorable.

          The only ways people get 500 per day must be in their own stupidity.
          (btw: this email address is also in the whois database. In fact I only started to get spam regularly after I registered my domain. Coincidence?)
          • The only ways people get 500 per day must be in their own stupidity.

            I probably get 500 spams a day, but I don't think it's because I'm stupid.

            I have an email address (MyFullName@MyCompanyName.com) that I've been using for well over a decade for a personal business. I don't plan to change either my name or my company name.

            When I would be a speaker at some event or teach a seminar, the organizers would always include my email address as part of the speaker bio, which started going up on the Web when the W
    • According to the stats my spam filter keeps, I receive an average of 92 messages a day, 71 per cent (or about 65) of which are spam. I'm rather surprised I receive so "few" considering my email address is listed on about 6,000 pages on the web.

      I thank God for Bayesian filtering every day, I usually only see 1 or two spam every few days.
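      For readers wondering what Bayesian filtering looks like in practice, below is a toy naive Bayes classifier. It is only a sketch of the general idea under simplifying assumptions (whitespace tokenizing, add-one smoothing, a four-message training set invented for the example), not the algorithm any particular mail client or filter ships.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy word-based naive Bayes spam classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        if 0 in self.message_counts.values():
            raise ValueError("train on at least one spam and one ham message first")
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: fraction of training messages carrying this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing so unseen words do not zero out the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a probability for "spam".
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


# Invented training data, purely for illustration.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting notes for monday", "ham")
spam_filter.train("lunch on monday", "ham")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("monday meeting"))
```

      A real filter trains on thousands of messages and usually tokenizes headers as well, which is why its accuracy can be far better than this toy suggests.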
  • by superbondbond ( 718459 ) on Saturday September 18, 2004 @01:09PM (#10285347)
    I think if I were to actually see what went into Spam I'd never be able to eat it again.
  • by Anonymous Coward on Saturday September 18, 2004 @01:09PM (#10285349)
    Jose Nazario arguably has a much more extensive collection of spam, you can see some of his research here: http://www.monkey.org/~jose/wiki/wiki.php?page=SpamAnalysis.

    One of several talks of his on spam (complete with more graphs): http://www.linuxchile.cl/docs.php?op=verVersion&doc=64&id=1 [linuxchile.cl] And he's even generated some really really horribly insane spam collages, but I'll let those interested dig around for them on their own.

    • They're holding talks in IRC now? (The document AC linked to is an IRC log.) Cool. I never would have thought of that, but I guess why not? Is this commonly done? I'd like to have something to read. :)

      Sorry for posting off-topic, but it's a slow news day, anyway - none of the stories today has gotten more than 250 comments.
  • by Hockney Twang ( 769594 ) on Saturday September 18, 2004 @01:11PM (#10285370)
    I would have much preferred to see the volume of email, represented in terms of the size of messages received, displayed on a nice looking bar graph, with viruses in the foreground, spam in the back. Maybe even show legit email as another row in front of the viruses. Or even just a line graph. As it is, the information is occluded by his presentation. He took some raw data, did very little to interpret it, and put it on his blog. The information could be interesting, but the presentation is very lacking.
  • by Shayde ( 189538 ) on Saturday September 18, 2004 @01:12PM (#10285371) Homepage
    Single worst spam day by number of messages: August 22, 2002. 67 pieces of spam. The vertical blue line.

    This guy needs to get out more. I set up monitoring of all my spam and total message traffic for the last couple years. My current average is around 350-450 spams per day. Check out the spam report I run every night [homeport.org].

    Virii? That's a different report. I separate my virii out of the entire mail feed for the 3-4 domains I run (yay amavisd and postfix). The virii report [homeport.org] is a lot more variable, with as many as 1600 viruses a day, and as few as 10, though that's pretty rare.

    Spam filtering here is done via amavisd + postfix + spamassassin + some custom rules.
    • 67 that made it past the corporate filters. I have to admit that makes it sound like MS has pretty good filters though.

      OTOH, he could just be a man with low spam susceptibility :)

    • This guy needs to get out more. I set up monitoring of all my spam and total message traffic for the last couple years. My current average is around 350-450 spams per day.

      It looks like you need to get out more!

  • Man, this guy really doesn't get much spam at all. Before I threw SpamAssassin on my mail server, I was getting close to 1,000 spams a day on my personal e-mail address at its height. I saved my spam from 2001-2004, and I had over 250,000 messages for the whole period; the volume totals around 1.3GB. So dude's totals are small, if you ask me. ;-P
    • by Anonymous Coward
      From the page:

      Note that this chart is not scientific. Only mail which makes it past the corporate spam and virus filters show up on the chart.

      *DOH*
  • How I avoid spam. (Score:3, Interesting)

    by here4fun ( 813136 ) on Saturday September 18, 2004 @01:16PM (#10285404) Homepage Journal
    Here is what I did and I get next to no spam. Actually, I have none. I got an account at yahoo, and I made a login which has numbers mixed in, and is not a word from a dictionary. Think taking the first three letters of your first name, a couple numbers, the first four letters of your last name, and a couple more numbers. I never post my email address anywhere on the web, and just use it to communicate with people I know. I have a second email address that I give out to everyone, and that one is not bad with spam either. The account that gets 100 spam messages a day is my account that I used to reply to offers from websites, or that I used when posting on the web. It is a shame, because I don't check that last account except once every other month when I have nothing_better_to_do. And every once or twice a year I get an email which is important.

    When I was back in school I never had spam in my university account, but that was before the 2002 spike shown on his graph. I wonder if school email accounts are still off limits. When I was in school, I did not get spam there, it was my "free" email accounts that had spam.

    • Mailinator (Score:2, Informative)

      by R.Mo_Robert ( 737913 )

      Have you heard of Mailinator [mailinator.com]?

      Basically, you can make up any e-mail address, say foobar2004@mailinator.com and go and check it later. All you have to do is type in your chosen name and check for mail. It's useful for websites you don't really trust (but not for those you might continually receive useful mail from). And, of course, it's incredibly unsuitable for any personal information, since anyone can check any "account" if they can guess its name. And e-mails only stay for a certain number of hours/days.

    • I wonder if school email accounts are still off limits.

      More likely your school has a kick-ass spam filter or something like that. My school account got hundreds of spams a day, and my classmates seemed to think that was about average.

  • Man, i couldnt if i wanted too.. i get 10mb a day of the crap..
    • So 3 years of the crap would be a little over 10 gb of space. Not a huge amount for today's hard drives. Certainly won't be a huge amount for your hard drive 3 years from now. And this is uncompressed. It's definitely feasible... whether you'd actually want to is a whole different story
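      The "a little over 10 gb" figure checks out as back-of-the-envelope arithmetic, assuming a steady 10 MB every day, 365-day years, and decimal units (1 GB = 1000 MB):

```python
MB_PER_DAY = 10      # figure quoted in the parent comment
DAYS = 3 * 365       # three years, ignoring leap days

total_mb = MB_PER_DAY * DAYS
print(total_mb, "MB")
print(total_mb / 1000, "GB (decimal units)")
```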
  • There seems to be a disproportionate amount of spam in late 1997 (as compared to the following few years) . . . anyone know why this might be?
  • Hey, that graph's some important news (it made it on Slashdot!)

    I think we should all email it out to everyone we know.

  • Here's [nyud.net] the Coral cache page.
  • I would like to see the OS graph of machines sending spam/virus 1998 -> / 2004 -> |
  • by DuctTape ( 101304 ) on Saturday September 18, 2004 @01:43PM (#10285556)
    Cristiano writes "Microsoft employee Raymond Chen has saved every spam message and virus-laden e-mail he's received at work since 1997 and graphed the spams and viruses to create a cool visual representation of one man's malicious traffic."

    I'd like to have saved every BSOD that I've received since 1997 and make a cool visual representation, too, but the system crashes each time I get one... so much for data retention.

    DT

  • by Prince Vegeta SSJ4 ( 718736 ) on Saturday September 18, 2004 @01:45PM (#10285565)
    THIS [cert.dfn.de] site even has an animation of the propagation of spam.
  • Irony (Score:4, Funny)

    by thrills33ker ( 740062 ) on Saturday September 18, 2004 @01:51PM (#10285592) Homepage
    A Microsoft employee keeps a record of his ever-increasing levels of spam and viruses?

    Aargh! My irony meter has gone off the scale!!
  • It boggles the mind to think about how much bandwidth is wasted on the useless trash that spam is. Not to mention just time spent with dealing with that. How much money is lost each year overall due to spam... the number must be huge. This is an unnecessary loss of money and time.

    I think this problem will just escalate for as long as we have SMTP in use. So maybe SMTP as a protocol needs a rehaul, or a revision to rewrite it completely (and call it something different). I think it wouldn't be impossible to

    • Excellent idea. You go first.
      • Sure. In the meanwhile, why don't you have a look at how X.400 mail was done, for some perspective. At the protocol level, SMTP works but only if everyone plays nice, I'm sorry to say. The protocol state machine is also too complex, it could be much simpler: 1. here's the recipient, 2. here's the mail. The server could disconnect the sender in either 1 or 2. Sender and other stuff is matter of the message representation (if you need signatures to prove the identity, or what ever).

        HELO/EHLO is a hack in SMT

    • I think this problem will just escalate for as long as we have SMTP in use. So maybe SMTP as a protocol needs a rehaul, or a revision to rewrite it completely (and call it something different). I think it wouldn't be impossible to pull off.

      Waste of time.

      Every month someone suggests that there's a technological solution to this problem. But there isn't. This isn't a tech problem. It's a law-enforcement/sociological problem.

      You can only go so far technologically as long as spammers are allowed to compr
      • You can only go so far technologically as long as spammers are allowed to compromise peoples' computers and use them for improper activities.

        So, rewrite the mail system in such a way that each mail sent requires the sender's computer to crack a small computational puzzle, which takes e.g. 10 seconds. That's a technological solution. It restricts you so that you can only send 6 mails per minute. For normal use, this is more than enough: in 10 minutes you can send 60 mails. However, you cannot achieve thro

        • So, rewrite the mail system in such a way that each mail sent requires the sender's computer to crack a small computational puzzle, which takes e.g. 10 seconds. That's a technological solution. It restricts you so that you can only send 6 mails per minute. For normal use, this is more than enough: in 10 minutes you can send 60 mails. However, you cannot achieve throughput in the rate of many tens of mails per second. Rate of spamming is thereby reduced.

          How is this any form of improvement? Penalize everyo
          • How is this any form of improvement? Penalize everyone on the planet because of spammers? Force an entire worldwide network systems upgrade? Slow down mail service exponentially?

            How many times do you send more than 100 mails per day? How many times do you send more than 5 mails per minute? A normal user doesn't. And those who legitimately do, are so few that a new kind of system could be worked out for them.

            Make it impossible to send large numbers of mail. That's a solution which works. Systems upgrade,

            • Mail servers should be "licensed" to operate on the Internet

              This doesn't work. Think zombie machines in some ISP's network.. Windoze machines which the ISP considers trusted, most likely, since it's their customers we're talking about. The mail server is licensed, all right, but the zombie client can pump out a million messages through that licensed server.


              It does work. Like you said earlier, smart relays should trigger an alarm if any single client starts to send out too much mail, but that should be
              • If the ISP can't control their internal clients, then they deserve to lose their SMTP license.

                How the hell do you expect some ISP to control what's being run and downloaded in some Windows box of a home user who has no clue of security? It's impossible. The ISPs can't even keep each Windows box in their network up-to-date with security patches! So it's just not going to happen. The ISP can shut the box down, but that is after the damage has been done.

                You obviously don't have much experience in this are

                • I would like to ask you, how did you feel when Telnet was replaced with SSH? That required phasing out a (security-wise) broken protocol with something that works a whole lot better. You could have insisted on policies which say "thou shalt not eavesdrop" but that clearly doesn't work. A better solution is to just make it technically/computationally impossible (or at least as hard as possible) to eavesdrop.

                  Telnet was not replaced with SSH. That's an invalid analogy. SSH was an *alternative* to Telnet th
                  • Take a deep breath and this time please READ at least the following three paragraphs before answering, since what you answered to was definitely not the reasons why I consider SMTP to be obsolete.

                    1. By redefining the protocol I want the protocol to be simpler, and utilize a hashcash-like system in its very core. (If you don't know what hashcash is, Google it up now or read paragraph 2)

                    2. By using a hashcash-like computational puzzles, it just is PHYSICALLY IMPOSSIBLE for anyone to send large numbers of ma

                    • By using a hashcash-like computational puzzles, it just is PHYSICALLY IMPOSSIBLE for anyone to send large numbers of mails. (The machine which is to receive the message gives a computational puzzle for the client to solve, and will accept the message only after the result has been verified. This can be done in such a way that it WILL require e.g. 7 seconds of calculation per mail. And it cannot be bypassed. For instance the client has to bruteforce a collision for an n-bit hash function, given m bits, where
                    • That's like creating a web site that can only handle one visitor every seven seconds. What's the point? Why even bother?

                      If you have a web server running on a Gameboy with 10 bits per second bandwidth, you might want to do this. In other words, if the bandwidth resource or server resource is very scarce, you might want to limit the usage.

                      Also, I don't see how the mail service would be slowed down beyond usage. People poll their mail (POP/IMAP/web interface/whatever) with intervals being in minutes, so as

                    • No, no, no, no! Less spam, yes. But the legitimate mail would not be affected that much. Why? BECAUSE a normal person does NOT send 100s of mails per hour. They send maybe 2 or 3 (amortized over the duration of the day). For them the new kind of system wouldn't make any differences.

                      Yes, a normal person doesn't send 100s of mail an hour, BUT a normal mail server DOES handle 100s of legitimate mail an hour. And if you impose a stupid computational hash routine on each connection, you cut down on the effici
                    • Yes, a normal person doesn't send 100s of mail an hour, BUT a normal mail server DOES handle 100s of legitimate mail an hour. And if you impose a stupid computational hash routine on each connection, you cut down on the efficiency of the mail server exponentially.

                      Yes, relaying would be problematic, and that's why the protocol itself would have to undergo some changes. Or the "exchangeable hashcash" could be utilized as a Proof-of-Work in the receiving end (still can't remember the author!!).

                      But you did

                    • But you did realize that all the calculation is done on the client end? Not in the server end. The server does not bruteforce anything. The server just makes a large random number r, runs MD5 (or what ever one-way hash function) to get H, sends H to client along with some bits of the r and tells the client to figure out the original r.

                      Whoever does the cpu work is irrelevant. It would be reasonable to assume that during the process of the client calculating the proper response, a socket would remain open
                    • Whoever does the cpu work is irrelevant. It would be reasonable to assume that during the process of the client calculating the proper response, a socket would remain open to the server, thus wasting precious resources.

                      No, it is not quite irrelevant. The client has to do it - it has to be done by the one who is sending. It would also not be reasonable to keep the socket open due to the DoS possibility, as you mentioned.

                      And in the stateless environment of the Internet, how does the server allow the clien
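                      For reference, the puzzle scheme described in this thread (server picks a random r, sends a hash of it along with some of its bits, client brute-forces the rest) can be sketched in a few lines. This is a toy under stated assumptions, not hashcash itself and not a proposal for a real mail protocol: SHA-256 is swapped in for the MD5 mentioned above, and `R_BITS`, `HIDDEN_BITS`, and the function names are all made up for the example.

```python
import hashlib
import secrets

R_BITS = 64        # size of the server's random number (assumed)
HIDDEN_BITS = 16   # bits the client must brute-force; sets the work factor (assumed)


def _digest(value):
    return hashlib.sha256(value.to_bytes(R_BITS // 8, "big")).hexdigest()


def make_puzzle():
    """Server side: pick r, publish its hash and all but the low HIDDEN_BITS bits."""
    r = secrets.randbits(R_BITS)
    return r, _digest(r), r >> HIDDEN_BITS


def solve_puzzle(digest, known_high):
    """Client side: try every value of the hidden low bits until the hash matches."""
    base = known_high << HIDDEN_BITS
    for low in range(1 << HIDDEN_BITS):
        candidate = base | low
        if _digest(candidate) == digest:
            return candidate
    return None


def verify(digest, answer):
    """Server side: a single hash is enough to check the client's answer."""
    return _digest(answer) == digest


r, digest, known_high = make_puzzle()
answer = solve_puzzle(digest, known_high)
print("solved:", verify(digest, answer))
```

                      The asymmetry is the point: the client performs up to 2**HIDDEN_BITS hashes while the server performs one, and raising `HIDDEN_BITS` by one doubles the client's expected work. The sketch does not address the objections raised in the thread about relays or about what state the server has to keep between connections.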

  • Probably a lot. I remember a Raymond Chen who went by the name of BustrBuny on IRC something like eight years ago ...

    Quite the troublemaker he was, but he was fun too :)

  • by Anonymous Coward
    Does that mean that Bill Gates will be sending me the money he owes me for forwarding all those emails?
  • by Basehart ( 633304 ) on Saturday September 18, 2004 @03:44PM (#10286253)
    2002 must be the year when Florida got connected to the internet.


  • I want to know why this guy has only received 3,500 spams since 1997?

    1-800-WAA-AAAH!

    Cheez!

  • About 6 months ago, I decided to disprove the claim some people were making about spam increasing exponentially. So I started on a project of plotting my personal spam over the past few years. I was rather disturbed to discover the exponential fit was better than the quadratic fit. Since then, it's tapered off, but you might still check out the plot [uiuc.edu]. Also, I started plotting spam and viruses system-wide. Lots more plots [uiuc.edu] are available (though only for a few months history, rather than years).
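    A comparison like the one described above (exponential fit versus quadratic fit) can be reproduced with a short least-squares script. The sketch below is not the poster's code; it assumes positive counts, fits the exponential as a straight line through log(y) (which weights errors differently from a direct fit), and uses synthetic data invented for the example.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    # Augmented matrix of the normal equations (A^T A) c = A^T y.
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def sse(ys, predicted):
    """Sum of squared errors between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predicted))


def compare_fits(xs, ys):
    """Return (quadratic_sse, exponential_sse); ys must all be positive."""
    c0, c1, c2 = polyfit(xs, ys, 2)
    quad = [c0 + c1 * x + c2 * x * x for x in xs]
    # Exponential y = a * exp(b * x), fitted as a line through log(y).
    log_a, b = polyfit(xs, [math.log(y) for y in ys], 1)
    expo = [math.exp(log_a + b * x) for x in xs]
    return sse(ys, quad), sse(ys, expo)


# Synthetic, exactly exponential data (not real spam counts).
months = list(range(12))
spam = [5 * math.exp(0.4 * m) for m in months]
quad_sse, exp_sse = compare_fits(months, spam)
print("quadratic SSE:", quad_sse)
print("exponential SSE:", exp_sse)
```

    The model with the lower sum of squared errors fits better; on real, noisy counts the gap between the two is usually much smaller than on this synthetic series.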
  • by sakeneko ( 447402 ) on Saturday September 18, 2004 @07:07PM (#10287424) Homepage Journal

    I think it was before 2000 that I last had that few spams in a day. <wry grin> That's what happens when you have an old email address and like to post to Usenet....

  • >>"...saved every spam message and virus-laden e-mail he's received at work since 1997."

    O-o-kay. Step away from the keyboard.

  • roughly 19,000 messages [...] 3500 messages

    Since 1997?

    I've gotten 16000 spams and viruses since *APRIL*. That doesn't count the accounts I've cut off because I was getting nothing but spam.
  • I think the graph isn't too helpful. Size vs time may be interesting to look at but it doesn't really say much. I think a more useful plot would be a frequency chart or a histogram or something like that.

    I'm not dissing the work--just saying how it could have been better...
  • It's down at the moment (too many people tried to download the whole thing for their Bayesian filters or whatever), but I've collected all my spam since Aug 1997 here [annexia.org].

    Internet Archive version [archive.org].

    Rich.
