Best Method For Foiling Email Harvesters?
pjp6259 writes "One of the common ways that spammers generate email mailing lists is by harvesting email addresses from websites. But in many cases you also need to make it easy for your customers to reach you. I have found three common solutions to this problem: 1.) Use an image to replace your email address. 2.) Use ASCII encodings for some/all of the characters. 3.) Use javascript to concatenate and/or obfuscate your email address. Which of these methods is most effective? Are email harvesters able to interpret javascript? What do you use?"
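For reference, option 2 (character encodings) might look something like the sketch below. It is Python rather than anything the submitter posted, and the function names are made up for illustration. It writes every character as a decimal HTML entity, which a browser renders normally but which never shows a literal "@" in the page source. Any harvester that decodes entities before matching will see straight through it.

```python
def entity_encode(text: str) -> str:
    """Write every character of text as a decimal HTML entity (e.g. '@' -> '&#64;')."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def mailto_link(address: str) -> str:
    """Build a mailto link whose href and label are both entity-encoded.

    Hypothetical helper: the 'mailto:' prefix is encoded too, so neither it
    nor the '@' sign appears literally in the generated markup.
    """
    prefix = entity_encode("mailto:")
    encoded = entity_encode(address)
    return f'<a href="{prefix}{encoded}">{encoded}</a>'


if __name__ == "__main__":
    print(mailto_link("jsmith@example.com"))
```

Decoding the result with any standard HTML unescape routine gives the original address back, which is exactly how cheap this is for a harvester to undo.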
Make people think to figure out your e-mail (Score:3, Interesting)
- Putting the e-mail in a distorted picture (like a captcha) - this is very difficult for spam crawlers to read
- Using a long human readable message "tset ta tset tod moc.reverse.each.word.prior.to.first.dot.for.addr
In general, your best defense is to employ some method that requires human interpretation.
Form (Score:5, Interesting)
If people need to send you files, they can do so after you reply back to them.
Image (Score:2, Interesting)
As for whether the harvesters can interpret javascript, I think that it depends on the particular harvester. You could analyze the source or the created page.
disallow Windows users (Score:4, Interesting)
I have one email that I use specifically for REPLYING to emails and that one is the one that gets the MOST Spam.
use a Table! (Score:4, Interesting)
Re:Form (Score:3, Interesting)
All it takes is one of the dickwads to manually figure out your form and then they all do it. In addition to whatever you have as your form, make certain you disallow HTML in any of the fields or they will own you.
I have one set to show that it all went through just fine but it really just ignores their entry. It has worked so far.
There is a simpler ingenious method. (Score:3, Interesting)
Decoy address to build a spammer blacklist (Score:5, Interesting)
Put 2 email addresses on your web site, the real one, and a 'decoy' one which is hidden from normal users (eg white-on-white text right at the bottom of the screen).
Any email that arrives at the 'decoy' address is parsed, and the sender added to a blacklist.
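A rough sketch of that decoy logic in Python, assuming the mail is available as raw RFC 822 text. The decoy address and function name are invented for the example, and it keys the blacklist on the From header, which is only useful as long as the spammer reuses sender addresses.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Set

DECOY_ADDRESS = "decoy@example.com"  # hypothetical hidden address, not from the post


def process_message(raw_message: str, blacklist: Set[str]) -> bool:
    """Return True if the message should be delivered, False if it is dropped.

    Mail addressed to the decoy is never delivered; its sender is added to
    the blacklist. Mail from an already blacklisted sender is dropped too.
    """
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipient = parseaddr(msg.get("To", ""))[1].lower()

    if recipient == DECOY_ADDRESS:
        if sender:
            blacklist.add(sender)
        return False
    return sender not in blacklist
```

A real setup would more likely blacklist at the SMTP level (envelope sender or connecting IP) instead of trusting a header the sender controls.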
Just be unique (Score:3, Interesting)
That being said, I don't think spammers crawl the net looking for addresses so much. Their zombies have all the addresses they need. Just try to give out your email address to people that don't have an affinity for virus infections. In my case, I protect my customers so my address hasn't been abused too heavily thus far.
Fuck 'em! (Score:5, Interesting)
Yes, I get quite a lot of spam. But with the usual techniques (greylisting, SpamAssassin, etc.) I only actually receive maybe half a dozen spam e-mails a day. And more importantly, all my actually valid e-mail still seems to get through just fine. I'm happy with it, and I get the personal satisfaction of being able to use my e-mail address wherever I damn well like without having to cower from spammers.
Re:Personally I go for (Score:3, Interesting)
Re:use a Table! (Score:5, Interesting)
Couldn't you equivalently do <span>jsmith</span>@<span>example.com</span> ? You still lose the mailto though..
(I suppose you could toss in <span style="display: none">fnarfnarfnar</span> or something as well, if you want to confuse matters slightly more)
Would copy/paste insert whitespace anywhere where you don't want it?
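To see what the span trick buys you, here is a small Python sketch (helper names are hypothetical). It builds the markup with a hidden decoy span and then runs it through a naive tag stripper that ignores CSS, which is an assumption about how a simple harvester might behave.

```python
from html.parser import HTMLParser


def obfuscate(local: str, domain: str, decoy: str = "fnarfnarfnar") -> str:
    """Split the address across spans and slip in a display:none decoy."""
    return (
        f"<span>{local}</span>"
        f'<span style="display: none">{decoy}</span>'
        f"@<span>{domain}</span>"
    )


class TextExtractor(HTMLParser):
    """Naive tag stripper: collects all text and pays no attention to CSS."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    """Return the text a CSS-unaware scraper would extract from markup."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    markup = obfuscate("jsmith", "example.com")
    print(markup)
    print(strip_tags(markup))  # the decoy ends up inside the local part
```

A scraper that strips tags gets a wrong address, while a browser that honours the style shows the right one. It does not settle the copy/paste question, since what lands on the clipboard depends on the browser.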
Re:SpamGourmet.com (Score:3, Interesting)
This spring I was shopping for a new SUV, interested in an Escape. I went to ford's web site and they had a "submit email address to have dealers in your area contact you". Sure that's easy enough. But I'm paranoid. Yes it's Ford but still. So I made "v1ford" forward to my main email address. I got five replies from dealers in my area and forgot about the whole thing.
SIX MONTHS LATER I started receiving spam, one per day, to v1ford. Bastards. And they waited half a year before selling me out, thinking I would not know! So that alias which I had forgotten to delete after I got my replies, I just deleted and they "went away". It astounds me that someone that I am about to buy a $26k product from is doing things to piss me off.
Tho to be fair it was probably one of the five that replied to me, that got his PC owned by a spam virus. But still, that's not responsibly protecting the privacy of your (potential) customers. Just goes to show, you really can't trust ANYONE with your real address nowadays - even if they are reputable and have integrity, you can't count on them ALL being bright bulbs, and it only takes one to ruin it for you.
Using this system I have only received spam on a few occasions, one of which was when a large company I trusted posted my email address on their web site. (d'oh!)
Re:Make people think to figure out your e-mail (Score:5, Interesting)
I predict technical solutions will continue to fail to solve the spam problem, because it is not primarily a technical problem. It is a moral problem. Spammers (whoever they might be) are not respecting people. They are disrespecting us in order to get some money. Their values put dollars above the needs of anonymized people.
Until the moral problem can be solved adequately through accountability or other means, we are stuck with technical "solutions". Hopefully the solutions keep in mind the original intent of the technology or else we will continue to spend our time "jumping through hoops" rather than actually accomplishing work.
While a captcha does require human intervention, it makes it more difficult for a "normal" user to access. Same with nameIhatespam@domain.com or nameih8spam@domain.com or name @ domain.com. This requires manual work and appears "unprofessional". Such confusion creates a barrier to effective communication.
Sure if you are on the "hackers are us" website such tricks are fine, 100% geeks, all interested in spending time re-typing information.
However if your audience is not technical, has any kind of failing eyesight (many over 60), or limited patience (the entire web audience) you had better keep it transparent for the end user. This is where javascript has served us well.
In recently gathering information from hundreds of manufacturing websites, I've found that the "cuter" the tricks, the less likely I am to pursue a working relationship with that manufacturer.
There are still tons of websites out there with unobscured email addresses in the HTML code and even in the text of the webpages. I don't see why spam harvesters would need to bother with javascript parsing engines when there is such a rich harvest of real email addresses out there.
I think people who are wiser than me need to consider how a community approach could seriously hamper spam. Maybe it is shaming the companies that build spam harvesting software. (we have imagination, we could 'make' them stop) I know that phoning and talking crossly to the wife of a spammer at an inconvenient time certainly created a stress reaction in her, which probably translated into stress reaction at their dinner table etc... I made the social cost of spamming high by phoning their 1800 number (costs them $0.05/minute). I made it real, I humanized my email address by "calling them on it" and complaining about their practices. (they still spam)...
Filtering is huge, but ultimately we need to call people to social responsibility, and that requires one of two approaches that I can see.
1. Grassroots community accountability/reaction to spam
2. Top down legislative control.
It's a war, but the war isn't for or against SPAM, the war is for and against respecting others on the NET.
Greg.
Email Obfuscation (Score:4, Interesting)
My theory is that harvesters have enough email addresses out there to gather and that the spammers are too lazy/have no need to write algorithms that interpret these types of mailtos.
Re:Decoy address to build a spammer blacklist (Score:2, Interesting)
Any email that arrives at the 'decoy' address is parsed, and the sender added to a blacklist.
This does not work, for the simple reason that nowadays, spam machines virtually always use a different sender (and very probably different sending IP address etc., given bots) for each mail.
use: SPAM as your username (Score:5, Interesting)
I have found that using SPAM as your username works wonders
just post it right there on the webpage or leave it as a mailto:spam@example.com [mailto]
So many people use NOSPAMjohn@NOSPAMexample.com (remove the NOSPAM to reply)
or some variation of that, I tried using spam@example.com as my email address on Google Groups and previously on Usenet.
I got pretty much nothing. No spam. Not then, not now.
Since the email harvesters apparently filter out variations of addresses with SPAM, NOSPAM, DIESPAMMERS etc in them, once they filter out the "SPAM" part of spam@example.com they are left with @example.com which is not a valid email address.
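If that guess about harvester behaviour is right, the cleanup step would work roughly like this Python sketch. The marker list and the validity pattern are assumptions made up for illustration, not taken from any real harvester.

```python
import re
from typing import Optional

# Hypothetical list of anti-spam markers a harvester might strip out.
MARKERS = re.compile(r"NOSPAM|DIESPAMMERS|SPAM", re.IGNORECASE)
# Deliberately loose check: something@something.tld
VALID = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def harvester_cleanup(address: str) -> Optional[str]:
    """Strip marker words, then keep the result only if it still looks valid."""
    cleaned = MARKERS.sub("", address)
    return cleaned if VALID.match(cleaned) else None


if __name__ == "__main__":
    for candidate in ("NOSPAMjohn@NOSPAMexample.com", "spam@example.com"):
        print(candidate, "->", harvester_cleanup(candidate))
```

Under these assumptions the munged NOSPAM address is recovered as john@example.com, while spam@example.com collapses to @example.com and gets thrown away.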
Re:Make people think to figure out your e-mail (Score:2, Interesting)
unless pressed to do so, because I assume it is itself a harvester of sorts, meaning I do not trust companies who say that they will not resell my information.
Also, please do not use javascript, since many people (including myself) browse with javascript off, and only enable it in tabs where it is absolutely necessary. I hate the bother of turning on javascript. Please avoid it if at all possible. Granted, I would love for all the web to go back to HTML 1.0 days - it looked good and was easy to read - but even less conservative people probably hate javascript widgets which are not needed.
My favorite solutions: either use a slightly scrambled image or spell things like dot and at so the text would not look like an email. You can also replace just the dots and ats with images. Please, please, please, do not use forms, javascript or anything dynamic.
Re:disallow Windows users (Score:5, Interesting)
Har har.
Anyway, I did an experiment once years ago where I created a brand new mail account and turned off 'spam armor plating' (or whatever it's called) on Slashdot. Then I went about making my posts etc. To my surprise, I started getting messages rather quickly. It didn't take more than a week or two to start receiving enough unsolicited mail to shut the experiment down.
Fast forward to last year. I told a coworker friend about this. He didn't believe me. So I tried the experiment again and... uh.. actually I only got one or two messages over a period of two weeks. I'm not really sure what happened. It's as if they gave up on Slashdot.
I cannot draw any real solid conclusions from these experiments other than to say that yes, email addresses on websites do get harvested. Yes, you could disallow Windows users, but that wouldn't do a thing to protect any other user. The only possible way that would work is if spam harvesting apps ONLY happened on Windows machines, and let's be realistic, there's nothing to prevent that software from making its way to Linux etc. Once it gets harvested, it doesn't matter which OS you run, you can get spam just as easily.
It's a tough problem with no single solution.
Re:You can't have your cake an eat it too ... (Score:5, Interesting)
I think you hit the nail on the head. Strictly speaking, if you want to use text and don't leave a plain text version of your e-mail, you are at risk of being inaccessible.
I made a contact form for my site to avoid harvesters. While spammers do have scripts to submit contact forms, it's easier to trick a robot based on its form input than based on what the robot can parse from the page (e.g. put a hidden field called phone number and fail the form on the backend if it has a value, since most spam bots will try to enter something; make sure there is an HTTP_REFERER; or ask the user to duplicate some text in a field that is on the page somewhere else).
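Those three backend checks could be sketched in Python like this. The field names and the function are hypothetical, and it assumes the form data and a CGI/WSGI-style environment arrive as plain dictionaries.

```python
from typing import Mapping

HONEYPOT_FIELD = "phone_number"  # hidden via CSS; real visitors leave it empty
CHALLENGE_FIELD = "challenge"    # visitor retypes a word shown elsewhere on the page


def looks_like_bot(
    form: Mapping[str, str],
    environ: Mapping[str, str],
    expected_challenge: str,
) -> bool:
    """Return True if the submission fails any of the three checks."""
    # 1. The hidden honeypot field must stay empty.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # 2. A referer must be present (trivially forgeable, but many bots omit it).
    if not environ.get("HTTP_REFERER", "").strip():
        return True
    # 3. The retyped text must match what the page displayed.
    answer = form.get(CHALLENGE_FIELD, "").strip().lower()
    return answer != expected_challenge.strip().lower()
```

None of these checks is strong on its own, and some privacy tools strip the referer for real visitors, so the second check may need to be optional.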
Re:Make people think to figure out your e-mail (Score:3, Interesting)
I wonder, then, if adding the word 'dot' to your e-mail address would deter bots. Probably not, though. They'd probably just try all permutations of '.' and 'dot'.
Use Javascript (Score:4, Interesting)
We use Javascript. You don't want to make life more difficult for the person trying to correspond - the point is to raise the cost to the spammer. If they have to add a Javascript parser to their spider, it's going to slow them way down. It's not going to make financial sense for them to do a custom solution for each site (and if they do, the "image" methods will break down as well).
When someone writes to me and says "reply to joe at gmail dot com" (or whatever), they generally don't get a reply. Why is their time more valuable than mine?
Re:You can't have your cake an eat it too ... (Score:3, Interesting)
Take a form with the email alias in the table, and write a simple HTML form control so that clicking the submit button takes the text on the page ("example"), appends the '@' sign and the domain ("example.com") in a two-step process, and spits out a "mailto:" link as the final step.
From the user's perspective, you get a little box that has your mailID and an 'Email me!' button right next to it. When they click the button, their mail client pops up and they can get straight to business. Because the address is stored in three-four chunks in the page code, the harvester isn't going to assemble it. Seems to me like that should be fairly effective.
Re:use a Table! (Score:1, Interesting)
Check my article then... (Score:2, Interesting)
http://www.thany.org/article/73/E-mail_hiding [thany.org]
Re:use: SPAM as your username (Score:1, Interesting)
Re:Make people think to figure out your e-mail (Score:3, Interesting)
Re:Use Javascript (Score:3, Interesting)
I just don't get it (Score:3, Interesting)
http://www.monkeys.com/wpoison/ [monkeys.com]
http://www.spampoison.com/ [spampoison.com]
Re:Boxtrapper (Score:3, Interesting)
That's called a challenge-response system.
Those are EVIL and should be banned from the Internet.
My personal domain has been hijacked by spammers. Despite having a valid SPF record, they still send spam with my domain forged as the sender. Consequently, when someone has a challenge-response spam filter configured, those challenge messages come to ME, despite the fact that I had nothing to do with the original message. I consider those challenge messages spam themselves, and report them to spamcop as such.
There are better ways of filtering spam. Forcing other people to filter your mail for you is extremely inconsiderate.
Doesn't matter much really (Score:2, Interesting)
Like most people, I use multiple mail ids for different uses. Lots of them are fakes just to register to sites and such, and a couple are private ones which are used only to correspond with the closest friends and family members. Recently one of my friends told me that he has used my address to register for a gaming site since his was already being used for one account and apparently creating a new id takes ages and he may die before he gets a new one so why not use mine which is totally personal to me but who gives a damn. He actually has no idea why he should Not be doing it. And he is a CS major from one of the best colleges in the country! Now think of the regular users you may have corresponded with and how easy it is for them to fuck up every trick you have tried to evade harvester bots.
Re:Accessibility (Score:2, Interesting)
Re:Make people think to figure out your e-mail (Score:3, Interesting)
seems to work well.. I mean have you ever seen someone submit a multi part form in under 10 sec?
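The comment seems to be about rejecting forms that come back faster than a human could fill them in. One way to sketch that in Python is a signed timestamp placed in a hidden field when the form is served. The secret, the 10 second threshold and the function names are all assumptions for the example.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical server-side secret
MIN_SECONDS = 10           # threshold taken from the "under 10 sec" remark


def _sign(timestamp: str) -> str:
    return hmac.new(SECRET_KEY, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Create the hidden-field value when the form is served."""
    timestamp = str(int(time.time() if now is None else now))
    return f"{timestamp}:{_sign(timestamp)}"


def submitted_too_fast(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token is missing, tampered with, or too recent."""
    try:
        timestamp, signature = token.split(":", 1)
        issued = int(timestamp)
    except ValueError:
        return True
    if not hmac.compare_digest(signature.encode(), _sign(timestamp).encode()):
        return True
    current = time.time() if now is None else now
    return current - issued < MIN_SECONDS
```

Signing the timestamp keeps a bot from simply backdating the field. A patient bot can still wait out the delay, so this only raises the cost a little.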
Re:use a Table! (Score:4, Interesting)
In the right column, create an e-mail address that is missing the first letter or more of the actual e-mail address. Put the missing letters in the left column.
For example, if your e-mail address is "jack@example.com", "ja" would go in the left column and "ck@example.com" in the right column.
Then
Re:Make people think to figure out your e-mail (Score:3, Interesting)
And, for all practical purposes the fear of harvested mail addresses is silly, irrational and stupid. There is a very good method of dealing with harvesters. You combine greylisting with spambait driven blacklists and you get 99% of them right away.
Note - it is essential to use both grey and black in order for it to work. Using greylists lets you defer all mail until the spammer has fired its entire volley. If one of the addresses in the volley is a spambait, you blacklist the source IP with a dynamic entry for, let's say, 24 hours and pretend that you are still greylisting. As a result the spammer does not know which addresses are bait and cannot prune its database. When (and if) the spammer comes around for a queue rerun you tell him to buzz off.
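The grey+black decision logic can be modelled in a few lines of Python. This is only a toy model of the idea, not the linked exim configuration: the class name, the 5 minute greylist delay and the in-memory dictionaries are assumptions, and only the 24 hour blacklist entry comes from the description.

```python
from typing import Dict, Optional, Set, Tuple

GREYLIST_DELAY = 5 * 60       # assumed retry delay, in seconds
BLACKLIST_TTL = 24 * 60 * 60  # dynamic blacklist entry lasts 24 hours


class GreyBlackFilter:
    """Toy model: greylist everything, blacklist any IP that hits a bait address."""

    def __init__(self, bait: Set[str]) -> None:
        self.bait = {address.lower() for address in bait}
        self.first_seen: Dict[Tuple[str, str, str], float] = {}
        self.blacklisted_at: Dict[str, float] = {}

    def check(self, ip: str, sender: str, recipient: str, now: float) -> str:
        """Return 'defer', 'reject' or 'accept' for one delivery attempt."""
        recipient = recipient.lower()

        # Expire an old blacklist entry, if any.
        listed: Optional[float] = self.blacklisted_at.get(ip)
        if listed is not None and now - listed >= BLACKLIST_TTL:
            del self.blacklisted_at[ip]
            listed = None

        # Hitting a bait address blacklists the source IP.
        if recipient in self.bait and listed is None:
            listed = self.blacklisted_at[ip] = now

        if listed is not None:
            # Keep answering like a greylist during the volley, reject on the rerun.
            return "defer" if now - listed < GREYLIST_DELAY else "reject"

        # Plain greylisting: defer the first attempt, accept a late enough retry.
        key = (ip, sender.lower(), recipient)
        first = self.first_seen.setdefault(key, now)
        return "defer" if now - first < GREYLIST_DELAY else "accept"
```

During the volley every recipient gets the same temporary failure, so the sender cannot tell bait from real addresses. Only the retry reveals the difference, and by then the IP is already listed.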
My email address is all over the internet from posts to mailing lists and such and it has been harvested thousands of times. If I do not use any server side antispam I get around 300+ SPAMs a day. After using grey+black+sorbs I get on the average under 2-3 spams a day. All I need to do to maintain the scheme, is to add some spambait from time to time here and there as well as pick up potential spambait from mail bounces. Most harvesters are badly written and will pick up Message-IDs as valid email addresses. These will bounce so picking them out of the error log and adding to the spamtrap triggers is a good way to populate it right away.
Works a treat : http://www.sigsegv.cx/exim-greylist-4.html [sigsegv.cx]
Re:Make people think to figure out your e-mail (Score:3, Interesting)
Everybody's talking about the weather ... (Score:2, Interesting)
1. Create 10 new email addresses, and post them around the net with 10 obfuscation tricks (plenty of examples can be found in this thread). Which of these tricks actually foiled the spammers, and which did not? Obviously, spammers can theoretically get around any obfuscation, but which obfuscations are still "safe"?
2. Do an experiment to figure out how much "safer" an address is when it was never posted on the Web. Does it just cause a small delay in spam (say, you only start getting spam after a month) or does it get noticeably less spam?
The answer to #2 isn't as obvious as some may think. One important problem to consider is spamming worms which use fake "from" addresses. These worms take your friends' email addresses - potentially addresses which have never been published - and use them as spam to random people. If a spammer also receives these mails, he gets a constant stream of real email addresses which were never published on the web. Another obvious issue is dictionary attacks, which are especially practical on large domains (e.g., gmail).
Unique address per visitor (Score:3, Interesting)
support-312321@example.com
Then set up a catch all on the first part of the address.
If you get any spam, just block out that one receiving address.
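As a sketch, the per-visitor scheme comes down to two functions, shown here in Python. The six digit suffix mirrors the support-312321 example; the destination mailbox, the block set and the function names are assumptions.

```python
import secrets
from typing import Optional, Set

DOMAIN = "example.com"
PREFIX = "support-"


def new_visitor_address() -> str:
    """Generate a fresh address to show to one visitor."""
    return f"{PREFIX}{secrets.randbelow(10**6):06d}@{DOMAIN}"


def route(recipient: str, blocked: Set[str]) -> Optional[str]:
    """Catch-all on the 'support-' prefix: return the real mailbox, or None to drop."""
    recipient = recipient.lower()
    local, _, domain = recipient.partition("@")
    if domain != DOMAIN or not local.startswith(PREFIX):
        return None
    if recipient in blocked:
        return None
    return f"support@{DOMAIN}"  # hypothetical real mailbox behind the catch-all
```

When spam shows up on one of the generated addresses, adding it to the blocked set cuts off that address without touching anyone else's.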
Obfuscating vs Training? (Score:2, Interesting)
Obfuscating email addresses on websites is one way of tackling the spam harvester problem. Training filters by becoming somewhat of a spam-magnet is another way. The only problem herein lies in the differentiation between ham and spam. Spam is here and will be here for a long time to come because people do make (a lot of) money with it. So you could say detecting it is more sensible compared to avoiding it.
I've been experimenting by adding an automatically generated code to the email addresses on my page (recipientDELIMcode@domain.ext). Spammers keep on sending me spam on these addresses, and I accept it and train my mail filter this way. The only thing I have to do is add 'contaminated' email addresses to my shitlist once I've found spam being sent to them. As you might already have guessed... the shitlist is a simple forward to sa-learn.
Adding an auto whitelister based on my own address book (LDAP is sweet) tackles the problem of address book harvesters: mail from these sources will not be fed to sa-learn, even if the email address it's received on is shitlisted.
A friend of mine, who listens to the name of 'the wanker who can't keep his antivir up to date'/Paul created the need for me to implement this feature by becoming infected by a _addressbook_leechin_virus_
To receive even more spam to feed to my hungry sa-learn there's a set of email addresses on my site (>50% of all email addresses there are in hidden fields/autogen'd pages) which are passed thru to sa-learn by default.
I've also been thinking of combining the unique id email address with a database in which I store served (generated) email addresses and giving them a grace period of N mins. If I receive an email within these N mins I assume this email was sent by a visitor on my site who clicked the mailto: link, and the message is passed to my mailbox and the unique id generated email address is flagged as non-spam source. However, if I receive mail on that email address after the N mins I assume it's a spam run and feed it to sa-learn. I'm not sure on ROI (code-time/overhead/extra dependencies serverside) with this technique because what I have now works well enough for me.
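A minimal Python sketch of the grace-period idea, with an in-memory dictionary standing in for the database. The "+" delimiter, the 15 minute value for N and all names are assumptions, and the function only returns a label instead of actually calling sa-learn.

```python
import secrets
from typing import Dict, Set

GRACE_SECONDS = 15 * 60  # "N mins"; 15 is an arbitrary assumption
DELIM = "+"              # stands in for DELIM in recipientDELIMcode@domain.ext


class GeneratedAddresses:
    """Tracks when each generated address was served and which ones proved clean."""

    def __init__(self) -> None:
        self.issued_at: Dict[str, float] = {}
        self.trusted: Set[str] = set()

    def issue(self, now: float) -> str:
        """Generate an address for one page view and remember when it was served."""
        address = f"recipient{DELIM}{secrets.token_hex(4)}@domain.ext"
        self.issued_at[address] = now
        return address

    def classify(self, address: str, now: float) -> str:
        """Return 'inbox', 'sa-learn', or 'unknown' for mail arriving at address."""
        if address not in self.issued_at:
            return "unknown"
        if address in self.trusted:
            return "inbox"
        if now - self.issued_at[address] <= GRACE_SECONDS:
            # First mail arrived inside the window: flag as a non-spam source.
            self.trusted.add(address)
            return "inbox"
        return "sa-learn"
```

One reading of "flagged as non-spam source" is that later mail to a trusted address keeps going to the inbox, which is what this sketch does. If the address leaks afterwards it would need to be un-trusted by hand.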
The downside is you can't give out your email address on things like a business card (lastname@domain.ext). A possible solution to this is replacing your email address with a URL like http://lastname.domain.ext/ [domain.ext] on which a mailto: refresh is generated with the unique id'ed email address. Or trusting the intelligence of the lean-mean-(and pretty well trained)-spamkilling-machine, which is good enough for me.
My 2ct.