Webmasters Pounce On Wiki Sandboxes
Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."
Why just wikis? (Score:5, Insightful)
Re:Why just wikis? (Score:5, Funny)
You forgot the link: Litigious Bastards [sco.com]
Re:Why just wikis? (Score:5, Funny)
That's just irresponsible. By putting that link there (the one that says Litigious Bastards [sco.com]), you're contributing to the problem.
Again, responsible people do not put "Litigious Bastards [sco.com]" links in their slashdot posts.
Think about it? How would you like a google search for Litigious Bastards [sco.com] to point to your company, leading everyone to think that you and your co-workers are nothing but a bunch of Litigious Bastards [sco.com]?
Grow up (Score:5, Funny)
"Dumb fucker", "miserable failure", etc
Re:Grow up (Score:5, Insightful)
Re:Why just wikis? (Score:5, Informative)
Then again maybe that mostly says something about their popularity.
Re:Why just wikis? (Score:3, Funny)
SCO UNIX® is a Proven, Stable and Reliable Platform
SCO UNIX® is backed by a single, experienced vendor
SCO UNIX® has a Committed, Well-Defined Roadmap
SCO UNIX® is Secure
SCO UNIX® is Legally Unencumbered
HAHAHAHAHAAHHAHAHAHAHAHAHA
That should be a top 10 list, and on letterman's show
Re:Why just wikis? (Score:3, Interesting)
posting on Wikis doesn't screw up your own blog.
posts on message boards will be deleted quickly, unless the board is expressly google bombing (as in the current Nigritude Ultramarine 1st placer [google.com]) / people are stupid
i think the idea is that wikis make it easier in general for your post to stay up and not affect your blog.
Re:Why just wikis? (Score:5, Informative)
My wiki got hit by this stupid link, but not in the sandbox. Of course, recovering the previous version of the page is easy... it's wiping out any trace of the lameness that gets trickier. I suppose the easiest way to defeat this would be to require simple registration in order to edit Wiki pages.
What else can we do? Alter the names of the submit buttons and some of the other key strings involved in Editing?
Re:Why just wikis? (Score:4, Informative)
It has probably already been done in any wiki software worth its salt. Here's what MoinMoin [wikiwikiweb.de] does for example:
* It has a regexp of HTTP_USER_AGENTS which should receive a FORBIDDEN for anything except viewing a page. The default setting includes many known bots (including Google) and utilities such as wget.
* Most pages contain the appropriate robot meta tag, with the relevant noindex and/or nofollow settings.
In addition to that, the webmaster can of course set up a robots.txt file, and actually should do so because there are tools out there which don't understand the robot meta tags (or they don't want to take a performance hit) and the user agent of which can easily be changed by the user... wget comes to mind.
Of course, it shouldn't be too hard to add regexps to prevent certain links from being done, or certain hostnames or IPs from altering the site (editing pages, reverting them, deleting them).
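For anyone who wants to see the user-agent idea spelled out, here is a minimal sketch in Python. The pattern, the action name "show" and the function name are made up for illustration; this is not MoinMoin's actual configuration or code.

```python
import re

# Hypothetical pattern listing a few crawler/utility user agents.
# This is an illustrative list, not MoinMoin's real default.
UA_SPIDERS = re.compile(r"googlebot|wget|slurp|crawler|spider", re.IGNORECASE)


def is_action_allowed(user_agent, action):
    """Return True if the request may proceed.

    Viewing ("show") is always allowed; any other action (edit, revert,
    delete, ...) is refused when the user agent matches the pattern.
    """
    if action == "show":
        return True
    return UA_SPIDERS.search(user_agent or "") is None
```

A caller would answer with FORBIDDEN whenever this returns False. As the comment says, the user agent is trivially changed, so this only stops tools that identify themselves honestly.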
Re:Why just wikis? (Score:5, Interesting)
And of course there are still sites that list EVERY referer in their logs somewhere on their site, so spammers have been adding their site URLs to their bot's user agent string. It's amazing the lengths these people will go to spam google.
Sure hope they can find a nice, elegant solution to this.
Re:Why just wikis? (Score:3, Insightful)
I'm not sure this will make you feel better, but this strategy has a limited lifetime.
The contribution of your page to another page's page rank depends on two factors: firstly the page rank of your page, and secondly the number of links coming from your page.
As more people take up this tactic, the return everyone gets from it gets smaller. E.g. when there are hundreds of links on that page, they cease to have any real value. Eventually people should give up on this one.
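To put rough numbers on that: in the simplified formula from the original PageRank paper, each outgoing link passes along the page's rank divided by the number of outgoing links, scaled by a damping factor. The sketch below assumes that simplified formula and the commonly cited damping value of 0.85; what Google actually runs is not public.

```python
def link_contribution(page_rank, outgoing_links, damping=0.85):
    """Rank passed through ONE link under the simplified PageRank formula.

    Assumption: contribution = damping * PR(page) / number_of_outgoing_links.
    """
    if outgoing_links <= 0:
        return 0.0
    return damping * page_rank / outgoing_links
```

So a sandbox with rank 1.0 and 10 links passes 0.085 through each link, while the same sandbox stuffed with 100 links passes only 0.0085 through each.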
visual security code for sign-up (Score:5, Informative)
Re:visual security code for sign-up (Score:5, Insightful)
There was a story about defeating this system on /. a while back.
Rather than using OCR or anything, people would merely harvest a load of images from a signup site, which is possible when there is only a finite number of images or when there is a consistent naming policy.
Then once the images were collected they would merely set up an online porn site, asking people to join for free and prove they were human by decoding the very images they had downloaded.
Human lust for porn meant that they could decode a large number of these images in a very short space of time, then return and mount a dictionary attack...
Quite clever really, sidestepping all the tricky obfuscation/OCR problems by tricking humans into doing their work for them ..
Re:visual security code for sign-up (Score:3, Informative)
You'd be pretty lucky to hit the exact same image twice.
Which is why I thought it was real time (Score:4, Interesting)
To avoid the timing problems with porn signons needing to happen concurrent with account signups, the account generation process was actually initiated by a porn signon. It limits your account generation ability, but only to the extent that you have porn traffic.
Did I just imagine this, or does it work that way?
Re:Which is why I thought it was real time (Score:4, Informative)
Re: (Score:3, Informative)
Re:Why just wikis? (Score:2, Informative)
Re:Why just wikis? (Score:5, Funny)
Why not normal discussion boards and blogs?
As an employee of JBOSS [jboss.org], I'm shocked and appalled at your suggestion. Fortunately, JBOSS [jboss.org] is working on a new JBOSS [jboss.org] solution to overcome this problem using JBOSS [jboss.org]. We at JBOSS [jboss.org] are passionate that our JBOSS [jboss.org] technology will prevent even non- JBOSS [jboss.org] users from taking advantage of boards this way.
Frank Lee Awnist
JBOSS [jboss.org] Employee
JBOSS [jboss.org] Inc.
JBOSS [jboss.org] JBOSS [jboss.org] JBOSS [jboss.org]
Re:Why just wikis? (Score:5, Funny)
Re:Why just wikis? (Score:2)
Cyberneighborhood Not-Watch? (Score:5, Interesting)
Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.
Re:Cyberneighborhood Not-Watch? (Score:3, Insightful)
Re:Cyberneighborhood Not-Watch? (Score:2, Informative)
Re:Cyberneighborhood Not-Watch? (Score:5, Interesting)
Re:Cyberneighborhood Not-Watch? (Score:5, Informative)
http://www.robotstxt.org/wc/meta-user.html
Re:Cyberneighborhood Not-Watch? (Score:3, Insightful)
Re:Cyberneighborhood Not-Watch? (Score:3, Insightful)
That is, even if you make your links useless (easy with a no-follow meta tag) it won't help: the majority of this spam is AUTOMATED, and will spam your wiki/blog/guestbook based on simple page cues.
Your best personal defense is to manually remove any page or HTML cues that a spammer would pick up on as being common to a certain type of postable web page or element.
Bloggers have been creating blacklists (banning both poster ips and destination urls) with some degree
Re:Cyberneighborhood Not-Watch? (Score:2)
just like spam (Score:3, Insightful)
Oh well (Score:5, Informative)
Yes... PLEASE... (Score:5, Insightful)
What happened to the nice internet we had in 1996?
Re:Yes... PLEASE... (Score:2)
Mine has always been set to not allow anon comments, but I know most people have that set as well.
I have been using MovableType and just haven't really had any problems. Been lucky I guess.
Re:Yes... PLEASE... (Score:2)
Unfortunately, my spam comments fill in the email fields, so I can't turn off anonymous comments. Is there any way for me to get the IP addresses of spam comments and forward this to the authorities?
Sure, that will work (Score:2)
Re:Yes... PLEASE... (Score:2)
What authorities would you be sending them to? It isn't really "illegal" to spam someone's comments, at least, not that I know of.
Re:Yes... PLEASE... (Score:3, Interesting)
Re:Yes... PLEASE... (Score:2)
Re:Yes... PLEASE... (Score:3, Interesting)
Re:Yes... PLEASE... (Score:5, Interesting)
naked women are trash? i'll take all you got (Score:4, Funny)
Re:Yes... PLEASE... (Score:3, Funny)
i blame blogs
Re:Yes... PLEASE... (Score:3, Insightful)
Re:Yes... PLEASE... (Score:2)
Re:Yes... PLEASE... (Score:2)
like porn (Score:5, Interesting)
Kind of playing the system with the content not being quite as desirable.
You know... (Score:3, Insightful)
Re:You know... (Score:4, Insightful)
Re:You know... (Score:4, Insightful)
Re:You know... (Score:2)
"What's to keep Google-bombers from marking down the significance of real links in order to increase the rank of their links?"
One way to mitigate it is simply to let a given IP address mark a link as good or bad only once. The bomber would have to use a multitude of IP addresses in order to make any significant counter to the huge number of legitimate users that would be marking them down. It would be too labor intensive and therefore cost prohibitive.
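A rough sketch of the one-vote-per-IP idea, in Python. The class and method names are invented for illustration, and it keeps everything in memory; a real system would need persistent storage and some answer to proxies and shared addresses.

```python
class LinkVotes:
    """Track good/bad marks on links, allowing one mark per IP per link."""

    def __init__(self):
        # url -> {ip_address: +1 or -1}
        self._votes = {}

    def vote(self, url, ip_address, good):
        """Record a mark; return False if this IP already marked this link."""
        per_link = self._votes.setdefault(url, {})
        if ip_address in per_link:
            return False
        per_link[ip_address] = 1 if good else -1
        return True

    def score(self, url):
        """Net score: number of 'good' marks minus number of 'bad' marks."""
        return sum(self._votes.get(url, {}).values())
```

The second vote from the same address is simply ignored, so a bomber needs one address per vote to move the score.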
Re:You know... (Score:3, Insightful)
It just might work! (Score:5, Funny)
Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.
Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.
It sounds crazy, I know, but I think we could do this.
< jab jab > (Score:2, Interesting)
Some people ... (Score:2, Insightful)
Yes it's a sandbox, no it's not your personal playground.
google works (Score:4, Informative)
Who's fault is that? (Score:5, Insightful)
Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two - by using outside references (counting how often the site is linked) it assumes that the site is more relevant. Because other people have put links on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.
Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.
Re:Who's fault is that? (Score:2)
Re:Who's fault is that? (Score:2)
1. The problem still exists on the side of the provider with the links. Who coordinated these million links that resulted in the "Google bomb?" Why not complain to them?
2. Is it really a problem? Google has no public responsibility to report rankings according to the demand of anyone; if they wish to block Linux altogether and replace Linus/OSS searches with Microsoft-sponsored results, they can do so. But it would hurt their business and credibility. I'm confused as to why people think th
Re:Who's fault is that? (Score:2)
Re:Who's fault is that? (Score:3, Interesting)
I'm not even convinced Google's algorithm has a problem. One thing a lot of people don't realize about the page rank algorithm is that your page rank goes down if you have lots of outgoing links that aren't reciprocated with links coming back from the site you linked to. It may be that this technique simply leads to a reduction in the page rank of the sandbox, which, after all, is approp
ROBOTS.TXT (Score:5, Insightful)
As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.
Re:ROBOTS.TXT (Score:2)
First of all, while my wiki is mostly personal junk, there's no reason it shouldn't be indexed. And many open source projects use Wikis as a primary source of documentation.
Secondly, the cat is out of the bag; I doubt these spammers are checking whether the sandboxes are indexed by Google.
I'm mostly pissed off that the edits to my sandbox have been only from nigritude ultramarine [slashdot.org] people. Frankly, I think google should stomp on that contest by not allowing the words to be sea
Re:ROBOTS.TXT (Score:2)
Bah! The source of spam is not email. The source of this problem is not the sandbox, it's the wikispammers. I watch the Sandbox page like any other. Moreover, the Sandbox's history is kept, just like any other page, so the spam is still successful in creating links even if it's removed.
Same site, a few days later: Don't do it. (Score:2, Insightful)
I decided to stop posting backlinks in Wiki sandboxes, the SEO strategy previously explained. [...] In the meantime I'm asking developers and those hosting Wikis of their own to please exclude sandboxes from search engine results (via the robots.txt file). Doing so would shield the sandbox from backlink-postings, and there is no need for it to turn up in search results in the first place.
This sure makes sense, and who knows, maybe future wiki distributions do it by defau
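For wiki admins who want to check that their robots.txt really covers the sandbox, here is a small sketch using Python's standard urllib.robotparser. The path /wiki/SandBox and the host wiki.example.org are assumptions; substitute whatever your wiki actually uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki whose sandbox lives at /wiki/SandBox.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/SandBox
"""


def can_crawl(robots_txt, url, agent="*"):
    """Return True if a crawler obeying robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Note that this only tells well-behaved crawlers to stay away. It does nothing to stop a spam bot from posting to the page.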
Complacency (Score:5, Interesting)
It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.
But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing. I'm an AltaVista alumnus and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.
Re:Complacency (Score:2)
Because, of course, if they weren't doing that, every last one of the engineers on that project would be tinkering with the search engine instead. It's not like they have separate engineering teams or people with different areas of expertise there or anything.
Re:Complacency (Score:2)
Please, if you're going to complain, give a concrete example of the search terms you're using, and what results you're expecting. I haven't had any trouble finding what I want on Google in the years I've been using it.
Well, it's about time this gets some attention (Score:5, Insightful)
From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your google rank in 30 days" are using blog spamming, and perhaps Wiki Spamming as a way to increase their clients' ratings.
It's not about meta tags, or submitting anymore... it's spamming.
Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?
IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?
If they are going to use my blog... why can't I get a cut in that business?
Re:Well, it's about time this gets some attention (Score:5, Insightful)
Re:Well, it's about time this gets some attention (Score:2)
Is it still just "annoying?"
Re:Well, it's about time this gets some attention (Score:2)
This happened to me (Score:5, Interesting)
It's a bit of an administrative burden, but stopped people messing up our Wiki with irrelevant links to some site in China.
John.
I've seen this (Score:4, Informative)
This may become a big problem for sites like this. The only solution might be one of those annoying "write down the letters in this generated gif" humanity tests.
apache + search + p2p = distributed search engine (Score:2, Insightful)
This way all the modified web servers would make a giant distributed search engine.
Some nice algorithms like koorde or kademlia could be used.
Anyone thought about starting something like this?
David
Re:apache + search + p2p = distributed search engi (Score:2)
We looked into something a lot like what you suggest [ibm.com] (and actually have it up and running inside our intranet with 2k or so users). The problem with doing this on the internet is that p2p technique
Google. (Score:4, Interesting)
When I do search in the first category, especially for things such as wallpaper, or simpsons audio clips, the sites that usually turn up are the least coherent ones with dozens of ads. I usually have to dig four or five pages to find a relevant one.
The people with these sites are playing hardball. Google wants them on their side, though, because they often display Google text ads.
Right now, my domain of choice is owned by a squatter that says "here are the results for your search" with a bunch of Google text ads. I was going to/may still put a site there that is very interesting, and the name was a key part of it.
I firmly believe that advertisements are the plague of the Internet. I would like to see sites selling their own products to fund themselves. Google doesn't really help in this regard. The text ads are less annoying than banner ads, but only slightly less annoying.
Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.
Tomorrow today yesterday (Score:5, Insightful)
The Arch Wiki [gnuarch.org] has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will require posters to be authenticated soon, which is a blow to the wiki ideal, but not such a major blow. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. external link over a single space).
So, this is not new, but this is now news.
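The "external link over a single space or dot" heuristic could look something like the Python sketch below. It assumes you are scanning the rendered HTML of an edit rather than the wiki markup, and the regular expression and function name are my own; treat it as a starting point, since a regex is not a real HTML parser.

```python
import re

# Matches <a ... href="http(s)://..." ...>text</a> and captures URL and text.
ANCHOR = re.compile(
    r'<a\s[^>]*href=["\']?(https?://[^"\'\s>]+)[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def hidden_external_links(html):
    """Return URLs of external links whose visible text is empty,
    whitespace, or a single punctuation character such as a dot."""
    suspicious = []
    for url, text in ANCHOR.findall(html):
        visible = text.replace("&nbsp;", " ").strip()
        if len(visible) <= 1 and not visible.isalnum():
            suspicious.append(url)
    return suspicious
```

An edit that produces any hits could be rejected outright or flagged on RecentChanges for a human to look at.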
Re:Tomorrow today yesterday (Score:2)
Not a big deal (Score:5, Informative)
Hmm (Score:4, Interesting)
This is a concern for the Google Gorilla? (Score:2, Interesting)
The problem with the whole Google model is that it's biased to begin with. If I'm looking for granny-smith apples, chances are an internet chimp they've bought the space with banana's to Google's goons. It becomes obvious when you see a chimp site that is near the
True (Score:4, Funny)
"Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"
I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...
Re:True (Score:2)
If Google just sits around then the competition will likely catch up.
Google may well downrate this (Score:2)
Sandbox persistence (Score:3, Insightful)
But if the problem is websites having areas where visitors (even unregistered ones) can post arbitrary text and links, then even Slashdot is potentially a target of the same abuse (maybe there should be a "Spam" mod score?), as is any site where unregistered visitors can store content in one way or another, wiki or not.
"Finally"?? (Score:5, Interesting)
Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?
I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.
The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.
Re:"Finally"?? (Score:3, Informative)
I checked, and I've got documented evidence of this. On April 25 last year, I reported that earthlink.net was showing up as the top search result [perl.org] for queries involving various religious words, including "Bear Valley Bible Institute." The Church of Scientology (which owns Earthlink) was clearly engaging in something to distort the page rank of earthlink. I had noticed this for a long time before I recorded it.
On that same day, I reported the problem to Google via their feedback mechanism. I note today
Re:"Finally"?? (Score:2)
So at any rate, to sum up, I find the whining about Google "finally" doing something about this to be very unfair, since Google actively works on this kind of problem. It is disingenuous to dismiss their hard work and suggest that they have done nothing.
Why doesn't google (Score:2)
They are clearly different kinds of searches, and I do both of them, yet I get the same results for both kinds of searches. The exception is Froogle, which is definitely a step in the right direction, but not quite there.
Although the interface has gotten a little better on altavista (remember them??), but searches like: for used condoms [altavista.com] do not make sense for retail stores at a
Easy solution (Score:3, Insightful)
That's very interesting KEYWORD (Score:2, Funny)
Sig
--
KEY PHRASE <A HREF=www.my_website.com> KEYWORD KEYWORD KEYWORD <\A>
image based spam control (Score:4, Interesting)
So, every time you edit/post comment, you would be presented with an image with a random distorted text, which you will have to type in to be able to edit/post. That should take care of automated systems.
Re:image based spam control (Score:3, Insightful)
Maybe it wasn't obvious to blog and wiki programmers that the ability to post a comment or edit a wiki page was worth money. It isn't worth a lot per post, but because these are online systems, they are very susceptible to bots that can post in huge volume. All of those posts together can alter a site's
Re:image based spam control (Score:3, Insightful)
Why not just show the picture of an object, like an apple or something, and ask the user to type in what it is? I mean, you could have a few hundred of these and it would be nearly impossible for an automated system to guess. (You have a few hundred different items, and like 5-10 images of each item.) I dunno, seems easier to me, but I don't write web software.
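A bare-bones sketch of that "name the object" test in Python. The image file names and labels are placeholders I made up, and the server would have to remember the expected label (say, in the session) between showing the image and checking the answer.

```python
import secrets

# Hypothetical pool: object label -> several image files showing that object.
IMAGE_POOL = {
    "apple": ["apple1.png", "apple2.png", "apple3.png"],
    "chair": ["chair1.png", "chair2.png"],
    "bicycle": ["bicycle1.png", "bicycle2.png"],
}


def new_challenge(pool=IMAGE_POOL):
    """Pick a random object and one of its images; return (image, label)."""
    label = secrets.choice(sorted(pool))
    image = secrets.choice(pool[label])
    return image, label


def check_answer(expected_label, typed):
    """Compare the user's answer with the label, ignoring case and spaces."""
    return typed.strip().lower() == expected_label
```

With a fixed pool like this, the harvesting trick described earlier in the thread still applies: once someone has labelled every image by hand, the test is dead, so the pool has to be large or keep changing.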
It's already been invented. (Score:4, Informative)
Here's Google's stance on the subject (boils down to you don't want it indexed, put in a damn robots.txt file) [google.com]
Hell, even Google News uses robots.txt [google.com]
Clean sandbox daily. (Score:3, Informative)
Chip H.
Another solution besides robots.txt (Score:3, Interesting)
Re:E2 (Score:2)
Re:Naughty behaviour (Score:3, Informative)
Any suggestions?
The only big one I know of right now is Nutch. It is an open source search engine that is in the later stages of development, but hasn't produced a large, usable site yet.
nutch.org [nutch.org]
Since it will be open source, you will be able to read the ranking algorithms and change/abuse them as you see fit.
This one http://search.mnogo.ru/ [mnogo.ru] is also available.