Webmasters Pounce On Wiki Sandboxes (324 comments)

Posted by simoniker on Monday June 07, 2004 @12:53PM from the fold-spindle-mutilate dept.

Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."

This discussion has been archived. No new comments can be posted.

Why just wikis? (Score:5, Insightful)

by GillBates0 ( 664202 ) writes: on Monday June 07, 2004 @12:56PM (#9357476) Journal

Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.

- Re:Why just wikis? (Score:5, Funny)
  
  by caino59 ( 313096 ) writes: on Monday June 07, 2004 @01:10PM (#9357622) Homepage
  
  We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.
  
  You forgot the link: Litigious Bastards [sco.com]
  
  - Re:Why just wikis? (Score:5, Funny)
    
    by clarkcox3 ( 194009 ) writes: <slashdot@clarkcox.com> on Monday June 07, 2004 @01:45PM (#9357948) Homepage
    
    That's just irresponsible. By putting that link there (the one that says Litigious Bastards [sco.com]), you're contributing to the problem.
    
    Again, responsible people do not put "Litigious Bastards [sco.com]" links in their slashdot posts.
    
    Think about it? How would you like a google search for Litigious Bastards [sco.com] to point to your company, leading everyone to think that you and your co-workers are nothing but a bunch of Litigious Bastards [sco.com]?
    
    - Grow up (Score:5, Funny)
      
      by scrytch ( 9198 ) writes: <chuck@myrealbox.com> on Monday June 07, 2004 @02:25PM (#9358342)
      
      You know, googlebombing might have some better effect if you did it in reverse, e.g. SCO [litigousbastards.com]. Right now the second link for "litigious bastards" after sco.com is ... a page urging people to googlebomb. Gee, how subversive, no one will figure out how that worked... Hell, every time you mention SCO [pigfuckers.com], come up with a different link for SCO [daryls-wif...harges.com] so their google results will be peppered with such commentary after... People search for "SCO", not "litigious bastards".
      
      "Dumb fucker", "miserable failure", etc ... that was funny. Once. Get over it and take some real action against these, uh, litigious bastards, or at least improve the trick a little.
      
      - Re:Grow up (Score:5, Insightful)
        
        by maxwell demon ( 590494 ) writes: on Monday June 07, 2004 @03:16PM (#9358869) Journal
        
        Well, why not link SCO [opensource.org] to something the reader gets real value from? Some page where they can learn something about SCO [iwethey.org]? After all, since those pages indeed tell something about SCO [osdl.org] and therefore contain the word SCO [gnu.org], it should even be more effective.
        
    - Re:Why just wikis? (Score:5, Informative)
      
      by Eivind ( 15695 ) writes: <eivindorama@gmail.com> on Monday June 07, 2004 @02:33PM (#9358405) Homepage
      
      It's working almost *too* well. Not only are SCO the number one hit for "litigious bastards", but they're also the number one hit for "litigious" or "bastards" alone.
      Then again maybe that mostly says something about their popularity.
      
  - Re:Why just wikis? (Score:3, Funny)
    
    by mrtroy ( 640746 ) writes:
    
    Top 5 reasons that unix > linux, according to SCO
    
    SCO UNIX® is a Proven, Stable and Reliable Platform
    SCO UNIX® is backed by a single, experienced vendor
    SCO UNIX® has a Committed, Well-Defined Roadmap
    SCO UNIX® is Secure
    SCO UNIX® is Legally Unencumbered
    
    HAHAHAHAHAAHHAHAHAHAHAHAHA
    
    That should be a top 10 list, and on letterman's show
- Re:Why just wikis? (Score:3, Interesting)
  
  by abscondment ( 672321 ) writes:
  
  posting on Wikis doesn't screw up your own blog.
  
  posts on message boards will be deleted quickly, unless the board is expressly google bombing (as in the current Nigritude Ultramarine 1st placer [google.com]) / people are stupid
  
  i think the idea is that wikis make it easier in general for your post to stay up and not affect your blog.
  - Re:Why just wikis? (Score:5, Informative)
    
    by ichimunki ( 194887 ) writes: on Monday June 07, 2004 @01:22PM (#9357735)
    
    The real problem with Wikis is that the link will remain there, even after it has been removed from the current page, because most Wikis have a revision history feature. So what's needed is careful set up in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).
    
    My wiki got hit by this stupid link, but not in the sandbox. Of course, recovering the previous version of the page is easy... it's wiping out any trace of the lameness that gets trickier. I suppose the easiest way to defeat this would be to require simple registration in order to edit Wiki pages.
    
    What else can we do? Alter the names of the submit buttons and some of the other key strings involved in Editing?
    
    - Re:Why just wikis? (Score:4, Informative)
      
      by boa13 ( 548222 ) writes: on Monday June 07, 2004 @02:10PM (#9358194) Homepage Journal
      
      So what's needed is careful set up in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).
      
      It has probably already been done in any wiki software worth its salt. Here's what MoinMoin [wikiwikiweb.de] does for example:
      
      * It has a regexp of HTTP_USER_AGENTS which should receive a FORBIDDEN for anything except viewing a page. The default setting includes many known bots (including Google) and utilities such as wget.
      * Most pages contain the appropriate robot meta tag, with the relevant noindex and/or nofollow settings.
      
      In addition to that, the webmaster can of course set up a robots.txt file, and actually should do so because there are tools out there which don't understand the robot meta tags (or they don't want to take a performance hit) and the user agent of which can easily be changed by the user... wget comes to mind.
      
      Of course, it shouldn't be too hard to add regexps to prevent certain links from being posted, or certain hostnames or IPs from altering the site (editing pages, reverting them, deleting them).
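
The user-agent check described above (bots may view a page but get FORBIDDEN for every other action) could look roughly like the sketch below. The regexp and the action name "show" are assumptions made for illustration; this is not MoinMoin's actual pattern or code.

```python
import re

# Hypothetical pattern for illustration; the real MoinMoin default list
# is longer and is not reproduced here.
BOT_UA = re.compile(r"googlebot|slurp|wget|crawler|spider", re.IGNORECASE)


def status_for(user_agent: str, action: str) -> int:
    """Return an HTTP status: known bots may only view pages.

    "show" is an assumed name for the plain page-view action.
    """
    if action != "show" and BOT_UA.search(user_agent or ""):
        return 403  # FORBIDDEN for edit, diff, history, and so on
    return 200


if __name__ == "__main__":
    for ua, action in [("Googlebot/2.1", "show"),
                       ("Googlebot/2.1", "edit"),
                       ("Mozilla/5.0", "edit")]:
        print(ua, action, status_for(ua, action))
```

As the comment itself points out, a user agent string is trivial to change, so this only keeps well-behaved crawlers out of the revision history; it does nothing against a spam script that claims to be a browser.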
      
- Re:Why just wikis? (Score:5, Interesting)
  
  by nautical9 ( 469723 ) writes: on Monday June 07, 2004 @01:12PM (#9357641) Homepage
  
  I host my own little phpBB boards for friends and family, but it is open to the world. Recently I've noticed spammers registering users for the sole purpose of being included in the "member list", with a corresponding link back to whatever site they wish to promote. They'll never actually post anything, but they've obviously automated the sign-up procedure as I get a new member every day or so, and google will eventually find the member list link.
  And of course there are still sites that list EVERY referer in their logs somewhere on their site, so spammers have been adding their site URLs to their bot's user agent string. It's amazing the lengths these people will go to spam google.
  Sure hope they can find a nice, elegant solution to this.
  
  - Re:Why just wikis? (Score:3, Insightful)
    
    by Andy Mitchell ( 780458 ) writes:
    
    I'm not sure this will make you feel better, but this strategy has a limited lifetime.
    The contribution of your page to another page's page rank depends on two factors: first, the page rank of your page, and second, the number of links coming from your page.
    As more people take up this tactic, the return everyone gets from it gets smaller. E.g., when there are hundreds of links on that page, they cease to have any real value. Eventually people should give up on this one.
  - visual security code for sign-up (Score:5, Informative)
    
    by Saeed al-Sahaf ( 665390 ) writes: on Monday June 07, 2004 @01:26PM (#9357765) Homepage
    
    Most BB boards (including phpBB, upgrade!) and blogs (including Slashdot) now feature the visual security code for sign-up. But, of course, this does not prevent hand entry of spam...
    
    - Re:visual security code for sign-up (Score:5, Insightful)
      
      by stevey ( 64018 ) writes: on Monday June 07, 2004 @01:33PM (#9357821) Homepage
      
      There was a story about defeating this system on /. a while back.
      
      Rather than using OCR or anything, people would merely harvest a load of images from a signup site - possible when there is only a finite number of images, or when there is a consistent naming policy.
      
      Then once the images were collected they would merely set up an online porn site, asking people to join for free, proving they were human by decoding the very images they had downloaded.
      
      Human lust for porn meant that they could decode a large number of these images in a very short space of time, then return and mount a dictionary attack...
      
      Quite clever really, sidestepping all the tricky obfuscation/OCR problems by tricking humans into doing their work for them ..
      
      - Re:visual security code for sign-up (Score:3, Informative)
        
        by Bitsy Boffin ( 110334 ) writes:
        
        Except that the images ("turing numbers" as they are often called) are dynamically generated from random character sequences, and probably with equally random distortions.
        
        You'd be pretty lucky to hit the exact same image twice.
        
        - Which is why I thought it was real time (Score:4, Interesting)

          by swb ( 14022 ) writes: on Monday June 07, 2004 @02:58PM (#9358693)

          I thought it was a real-time thing, where the account creation bots passed the image that loaded during the signup process to a porn site and the images were decoded by a real person, and the result passed back to the bot who then signed up for the account.

          To avoid the timing problems with porn signons needing to happen concurrent with account signups, the account generation process was actually initiated by a porn signon. It limits your account generation ability, but only to the extent that you have porn traffic.

          Did I just imagine this, or does it work that way?

          - Re:Which is why I thought it was real time (Score:4, Informative)

            by allism ( 457899 ) writes: <alice.harrison@g ... com minus distro> on Monday June 07, 2004 @03:48PM (#9359188) Journal

            You didn't imagine it, but perhaps a clearer understanding of the technique can be achieved by reviewing the previous discussions. Here's a link to the Slashdot article [slashdot.org] that discussed this last January.
  - Re:Why just wikis? (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    Just set your robots.txt to exclude the user list. Or if you don't have many friends and family, send yourself an 'approve member' email. Then start training your spam filter on fake accounts.
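
For the robots.txt suggestion above, here is a minimal sketch that checks a candidate file with Python's standard `urllib.robotparser` before deploying it. The paths `/memberlist.php` and `/profile.php` and the host `example.com` are assumptions about a typical phpBB install, so adjust them to whatever your board actually serves.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt; the two paths are assumed, not taken from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /memberlist.php
Disallow: /profile.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "Googlebot") -> bool:
    """True if a crawler that honours robots.txt may fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(may_crawl("http://example.com/memberlist.php?start=50"))
    print(may_crawl("http://example.com/viewtopic.php?t=1"))
```

This only removes the payoff: the fake accounts still get created, but compliant crawlers no longer see the member list links.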
- Re:Why just wikis? (Score:5, Funny)
  
  by Anonymous Coward writes: on Monday June 07, 2004 @01:16PM (#9357682)
  
  Why not normal discussion boards and blogs?
  
  As an employee of JBOSS [jboss.org], I'm shocked and appalled at your suggestion. Fortunately, JBOSS [jboss.org] is working on a new JBOSS [jboss.org] solution to overcome this problem using JBOSS [jboss.org]. We at JBOSS [jboss.org] are passionate that our JBOSS [jboss.org] technology will prevent even non- JBOSS [jboss.org] users from taking advantage of boards this way.
  
  Frank Lee Awnist
  JBOSS [jboss.org] Employee
  JBOSS [jboss.org] Inc.
  
  JBOSS [jboss.org] JBOSS [jboss.org] JBOSS [jboss.org]
  
  - Re:Why just wikis? (Score:5, Funny)
    
    by Pieroxy ( 222434 ) writes: on Monday June 07, 2004 @01:24PM (#9357747) Homepage
    
    You forgot the link: JBOSS [jboss.org].
    
- Re:Why just wikis? (Score:2)
  
  by athakur999 ( 44340 ) writes:
  
  Most wiki sandboxes will let you modify them without any sort of registration at all, so it's much more time effective than signing up for a bunch of discussion boards, waiting for the validation emails, etc. They also probably have a higher average page rank than most discussion boards and blogs would, so a little goes a long way.
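
Two claims in this thread (a high-ranked sandbox is worth more per link, and the value drops as more people pile links onto the same page) both fall out of the PageRank formula published by Brin and Page, PR(A) = (1 - d) + d * sum(PR(T)/C(T)), where each linking page T passes d * PR(T) / C(T) through each of its C(T) outgoing links. The sketch below only evaluates that one term. The damping factor 0.85 is the value the paper suggests, the input numbers are made up, and Google's production ranking is not public, so treat this as an illustration rather than a description of what Google does.

```python
def link_share(page_rank: float, outgoing_links: int, damping: float = 0.85) -> float:
    """Score passed through ONE outgoing link under the published formula.

    page_rank is the raw score of the linking page (not the 0-10 toolbar value).
    """
    if outgoing_links < 1:
        raise ValueError("a page must have at least one outgoing link")
    return damping * page_rank / outgoing_links


if __name__ == "__main__":
    # Same sandbox page, first with 10 links on it, then with 200.
    print(link_share(6.0, 10))
    print(link_share(6.0, 200))
```

With these made-up inputs the per-link share falls from about 0.51 to about 0.0255 once 200 links crowd the page, which is the diminishing return described earlier in the thread.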
Cyberneighborhood Not-Watch? (Score:5, Interesting)

by raehl ( 609729 ) * writes: <raehl311@y a h o o . com> on Monday June 07, 2004 @12:56PM (#9357478) Homepage

In the real world, there are neighborhood watch signs to "deter" criminals.

Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.

- Re:Cyberneighborhood Not-Watch? (Score:3, Insightful)
  
  by lunax ( 235701 ) writes:
  
  Why not put the sandbox in its own folder and add an entry to the robots.txt telling it not to browse that folder?
  - Re:Cyberneighborhood Not-Watch? (Score:2, Informative)
    
    by Random Web Developer ( 776291 ) writes:
    
    The problem with wikis is that they use 1 template for all pages, including the sandbox; everything is wiki.pl?PageName or something like that. You would have to dive into the code instead of just "using" the wiki.
    - Re:Cyberneighborhood Not-Watch? (Score:5, Interesting)
      
      by phutureboy ( 70690 ) writes: on Monday June 07, 2004 @02:05PM (#9358146)
      
      You can also list robots.txt commands as meta tags in the [head] portion of the document. So, the wiki authors could just put them in the sandbox template, and individual site owners would not even have to know about / monkey with robots.txt to be protected.
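
A minimal sketch of that idea, assuming a wiki engine that renders every page through one template function. The page names in `NO_INDEX_PAGES` and the function names are hypothetical; only the `<meta name="robots">` tag itself is the standard mechanism being discussed.

```python
from html import escape

# Hypothetical sandbox page names; a real wiki would take these from its config.
NO_INDEX_PAGES = {"SandBox", "WikiSandBox"}


def robots_meta(page_name: str) -> str:
    """Robots meta tag for one page: keep sandboxes out of the index."""
    content = "noindex,nofollow" if page_name in NO_INDEX_PAGES else "index,follow"
    return f'<meta name="robots" content="{content}">'


def render_head(page_name: str) -> str:
    """Shared template head, so site owners never have to touch robots.txt."""
    return f"<head><title>{escape(page_name)}</title>{robots_meta(page_name)}</head>"


if __name__ == "__main__":
    print(render_head("SandBox"))
    print(render_head("FrontPage"))
```

Because the decision is made per page inside the shared template, it works even when every page is served as wiki.pl?PageName and there is no separate directory for robots.txt to exclude.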
      
- Re:Cyberneighborhood Not-Watch? (Score:5, Informative)
  
  by Random Web Developer ( 776291 ) writes: on Monday June 07, 2004 @01:10PM (#9357623) Homepage
  
  There is a robots meta tag for this that you can put in your headers for a single page (robots.txt needs subdirs) but unfortunately most webmasters are too ignorant to realize the power of these:
  
  http://www.robotstxt.org/wc/meta-user.html
  
  - Re:Cyberneighborhood Not-Watch? (Score:3, Insightful)
    
    by jacoplane ( 78110 ) writes:
    
    I think the real problem is that spammers aren't likely to look at how you've configured spiders to handle your site. So even if you do this I'm sure it won't get rid of the spammers.
- Re:Cyberneighborhood Not-Watch? (Score:3, Insightful)
  
  by naoiseo ( 313146 ) writes:
  
  This fails to address the real issue.
  
  That is, even if you make your links useless (easy with a nofollow meta tag), it won't help: the majority of this spam is AUTOMATED, and will spam your wiki/blog/guestbook based on simple page cues.
  
  Your best personal defense is to manually remove any page or HTML cues that a spammer would pick up on as being common to a certain type of postable web page or element.
  
  Bloggers have been creating blacklists (banning both poster ips and destination urls) with some degree
- Re:Cyberneighborhood Not-Watch? (Score:2)
  
  by Jeff DeMaagd ( 2015 ) writes:
  
  I think one quick, easy fix is to disallow hyperlinks in the comments / guest book. If it isn't an "a href" then Google's spider won't take it.
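
One simple way to get that effect is to escape all markup in submitted comments so nothing can ever be rendered as an anchor. The sketch below uses Python's standard `html.escape`; the function name `render_comment` and the `<p>` wrapper are illustrative choices, not part of any particular guest book package.

```python
from html import escape


def render_comment(text: str) -> str:
    """Render a visitor comment as inert text: no tag survives, so no 'a href'."""
    return "<p>" + escape(text, quote=True) + "</p>"


if __name__ == "__main__":
    spam = 'Nice site! <a href="http://spam.example/">cheap pills</a>'
    print(render_comment(spam))
```

A bare URL still shows up as plain text, so readers can copy it, but there is no link element for a spider to follow.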
- just like spam (Score:3, Insightful)
  
  by SethJohnson ( 112166 ) writes:
  
  Your suggestion is well-thought-out, but is plagued by two problems.
  
  1. The bombing bots won't give a rat's ass if you add this to robots.txt. Just like spammers, there's no cost for them to hit your site anyway. Even if Google is instructed to ignore the links.
  
  2. Your site's google ranking is affected by the quality of the links you feature pointing at other sites. Your solution unbalances this whole matrix.
Oh well (Score:5, Informative)

by SpaceCadetTrav ( 641261 ) writes: on Monday June 07, 2004 @12:57PM (#9357485) Homepage

Google and others will just lower/diminish the value of links from Wiki pages, just like they did to those open "Guest Book" pages on personal sites.

Yes... PLEASE... (Score:5, Insightful)

by Paulrothrock ( 685079 ) writes: on Monday June 07, 2004 @12:57PM (#9357491) Homepage Journal

Google needs to do something about this. I had to turn off comments on my blog because all I was getting was spam. Two or three a day that I had to go in and delete. I have to now find a system that will keep the bots out.
What happened to the nice internet we had in 1996?

- Re:Yes... PLEASE... (Score:2)
  
  by ack154 ( 591432 ) * writes:
  
  I still haven't really seen a problem with this on my blog. I've had comments enabled for the past two years and have maybe gotten 3 or 4 total spam comments in that time (one today actually).
  
  Mine has always been set to not allow anon comments, but I know most people have that set as well.
  
  I have been using MovableType and just haven't really had any problems. Been lucky I guess.
  - Re:Yes... PLEASE... (Score:2)
    
    by Paulrothrock ( 685079 ) writes:
    
    I'm using Wordpress, and before that b2. It's only started in the past month, too.
    Unfortunately, my spam comments fill in the email fields, so I can't turn off anonymous comments. Is there any way for me to get the IP addresses of spam comments and forward this to the authorities?
    - Sure, that will work (Score:2)
      
      by Safety Cap ( 253500 ) writes:
      
      Because IP addresses can't be forged. Evar!
    - Re:Yes... PLEASE... (Score:2)
      
      by ack154 ( 591432 ) * writes:
      
      Well, aside from being able to forge IPs and such, my question to that would be...
      
      What authorities would you be sending them to? It isn't really "illegal" to spam someone's comments, at least, not that I know of.
- Re:Yes... PLEASE... (Score:3, Interesting)
  
  by lukewarmfusion ( 726141 ) writes:
  
  As my site grows, I'm thinking about adding a mechanism to address those issues: when the user requests a page for the first time, he'll get a session value that says he's a valid visitor to the site. When he submits a comment, he has to have that value, or comments aren't allowed. I don't know how you'd write a script to circumvent that. (If someone can tell me, I'd love to know so I try to prevent it!)
  - Re:Yes... PLEASE... (Score:2)
    
    by Nasarius ( 593729 ) writes:
    
    Well if you're setting a "session value", you're either using cookies or rewriting the links. So all that the script has to do is handle cookies properly or follow your "post a comment" links, neither of which is very hard.
  - Re:Yes... PLEASE... (Score:3, Interesting)
    
    by joggle ( 594025 ) writes:
    
    Why not generate an image containing modified text like yahoo and others? Using a little PHP magic, it shouldn't be too hard (see here [resourceindex.com] to get a start).
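
The "session value" idea from lukewarmfusion and the objection from Nasarius can both be seen in a few lines. This is a toy in-memory sketch, with hypothetical class and method names, rather than code from any blog package: the form hands out a single-use token on page view and rejects submissions without one, and the "bot" defeats it simply by viewing the page first.

```python
import secrets


class CommentForm:
    """Toy model of a comment form guarded by a per-visit token."""

    def __init__(self) -> None:
        self._issued: set[str] = set()

    def view_page(self) -> str:
        """Serving the page issues a fresh token (cookie or hidden field)."""
        token = secrets.token_hex(16)
        self._issued.add(token)
        return token

    def submit(self, token: str, text: str) -> bool:
        """Accept a comment only with an unused token issued by view_page()."""
        if token not in self._issued:
            return False
        self._issued.discard(token)  # single use
        return True


def scripted_bot(form: CommentForm) -> bool:
    """A script that loads the page before posting, exactly like a browser."""
    token = form.view_page()
    return form.submit(token, "buy cheap pills")


if __name__ == "__main__":
    form = CommentForm()
    print(form.submit("made-up-token", "blind POST"))  # rejected
    print(scripted_bot(form))                          # accepted
```

So the token stops only the laziest scripts that POST blindly; anything that fetches the form first gets through, which is why the thread moves on to distorted-image codes.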
- Re:Yes... PLEASE... (Score:5, Interesting)
  
  by n-baxley ( 103975 ) writes: <(gro.syelxab) (ta) (etan)> on Monday June 07, 2004 @01:12PM (#9357638) Homepage Journal
  
  The system was even easier to rig back then. Back in 96ish, I created a web page with the title "Not Sexy Naked Women". Then repeated that phrase several times and then gave a message telling people to click the link below for more Hot Sexy Naked Women which took them to a page that admonished them for looking for such trash. I added a banner ad to the top of both of these pages, submitted them to a search engine and made $500 in a month! Things are better today, but they're still not perfect.
  
  - naked women are trash? i'll take all you got (Score:4, Funny)
    
    by waspleg ( 316038 ) writes: on Monday June 07, 2004 @01:36PM (#9357848) Journal
    
    you know what they say about another man's garbage
    
- Re:Yes... PLEASE... (Score:3, Funny)
  
  by happyfrogcow ( 708359 ) writes:
  
  What happened to the nice internet we had in 1996?
  
  i blame blogs
  - Re:Yes... PLEASE... (Score:3, Insightful)
    
    by Paulrothrock ( 685079 ) writes:
    
    No, I blame opportunistic bastards who can't see that it's okay to not profit from something. *Thinks about his sledding hill that was destroyed by an upscale minimall.*
- Re:Yes... PLEASE... (Score:2)
  
  by Safety Cap ( 253500 ) writes:
  
  I had to turn off comments on my blog because all I was getting was spam.
  
  The simple solution [godaddy.com] is to require the poster to read a distorted graphic of a random numeric value and enter the value into a field in order to submit his message.
- - - Re:Yes... PLEASE... (Score:2)
      
      by Frizzle Fry ( 149026 ) writes:
      
      Based on your use of bold, you seem to be saying it's ironic that he couldn't spell illiterate, but equally ironic is that his screed against the "technically illeterate" is contained in an improperly closed italics tag.
like porn (Score:5, Interesting)

by millahtime ( 710421 ) writes: on Monday June 07, 2004 @12:58PM (#9357498) Homepage Journal

This seems similar to the system all those porn systems used to get such a high rank in google.

Kind of playing the system, with the content not being quite as desirable.

You know... (Score:3, Insightful)

by fizban ( 58094 ) writes: <fizban@umich.edu> on Monday June 07, 2004 @12:58PM (#9357505) Homepage

...what Google needs? A "Was this result helpful in your search?" button for each link returned, so that the search itself also influences page ranks. Maybe that will help get rid of this Google bombing mess.

- Re:You know... (Score:4, Insightful)
  
  by Anonymous Coward writes: on Monday June 07, 2004 @01:04PM (#9357565)
  
  that button will also get spammed, as bots will click 'yes' for their sites and 'no' for the competitors' sites
  
- Re:You know... (Score:4, Insightful)
  
  by goon america ( 536413 ) writes: on Monday June 07, 2004 @01:09PM (#9357612) Homepage Journal
  
  Wouldn't that be equally abused?
  
  - Re:You know... (Score:2)
    
    by gunnk ( 463227 ) writes:
    
    I'm guessing that you are asking:
    
    "What's to keep Google-bombers from marking down the significance of real links in order to increase the rank of their links?"
    
    One way to mitigate it is simply to let a given IP address mark a link as good or bad only once. The bomber would have to use a multitude of IP addresses in order to make any significant counter to the huge number of legitimate users that would be marking them down. It would be too labor intensive and therefore cost prohibitive.
    - Re:You know... (Score:3, Insightful)
      
      by Nasarius ( 593729 ) writes:
      
      Ah, but how long will it take for someone to write a worm with a Google-abusing payload? We've already got spammers using hacked PCs to send mail.
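
gunnk's one-vote-per-IP rule is easy to sketch, and the sketch also shows where Nasarius's objection bites. Everything here (class name, methods, the +1/-1 scoring) is hypothetical; nothing in the thread says Google has or plans such a feature.

```python
from collections import defaultdict


class ResultVotes:
    """Count 'was this result helpful?' votes, one per (IP, URL) pair."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()
        self._score: defaultdict[str, int] = defaultdict(int)

    def vote(self, ip: str, url: str, helpful: bool) -> bool:
        """Record the vote; return False if this IP already voted on this URL."""
        key = (ip, url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._score[url] += 1 if helpful else -1
        return True

    def score(self, url: str) -> int:
        return self._score.get(url, 0)


if __name__ == "__main__":
    votes = ResultVotes()
    votes.vote("192.0.2.1", "http://spam.example/", True)
    votes.vote("192.0.2.1", "http://spam.example/", True)  # ignored repeat
    print(votes.score("http://spam.example/"))
```

The rule caps what one address can do, but each hijacked PC in a worm's botnet has its own address, so the cost to an attacker scales with the size of the botnet rather than with manual effort.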
- It just might work! (Score:5, Funny)
  
  by mcmonkey ( 96054 ) writes: on Monday June 07, 2004 @01:24PM (#9357749) Homepage
  
  'You know what Google needs? A "Was this result helpful in your search?" button for each link returned'
  
  Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.
  
  Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.
  
  It sounds crazy, I know, but I think we could do this.
  
< jab jab > (Score:2, Interesting)

by jx100 ( 453615 ) writes:

Well, couldn't have been that successful, for he didn't win [searchguild.com].
Some people ... (Score:2, Insightful)

by TheGavster ( 774657 ) writes:

It still gets me how the people who are participating in the nigritude ultramarine thing don't see anything wrong with what they're doing. This line particularly got me:

"Without, as opposed to guestbook spamming, being evil it's a sandbox after all."

Yes, it's a sandbox; no, it's not your personal playground.
google works (Score:4, Informative)

by mwheeler01 ( 625017 ) writes: <matthew.l.wheeler@NosPAm.gmail.com> on Monday June 07, 2004 @01:00PM (#9357518)

Google does tweak their ranking system on a regular basis. When the problem becomes evident, (and it looks like it just has) they do something about it...that's why they're google.

Who's fault is that? (Score:5, Insightful)

by lukewarmfusion ( 726141 ) writes: on Monday June 07, 2004 @01:00PM (#9357523) Homepage Journal

Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."

Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two - by using outside references (counting how often the site is linked) it assumes that the site is more relevant. Because other people have put links on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.

Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.

- Re:Who's fault is that? (Score:2)
  
  by Chanc_Gorkon ( 94133 ) writes:
  
  Yes it is. When, with fewer than a million links, searches for "miserable failure" on Google lead to President Bush's biography on the White House web site, that's a problem (leave your political views out of this). Same goes for "weapons of mass destruction" and other Google bombs. Google... fix it now before it gets to be a real problem.
  - Re:Who's fault is that? (Score:2)
    
    by lukewarmfusion ( 726141 ) writes:
    
    Two issues here:
    
    1. The problem still exists on the side of the provider with the links. Who coordinated these million links that resulted in the "Google bomb?" Why not complain to them?
    
    2. Is it really a problem? Google has no public responsibility to report rankings according to the demand of anyone; if they wish to block Linux altogether and replace Linus/OSS searches with Microsoft-sponsored results, they can do so. But it would hurt their business and credibility. I'm confused as to why people think th
    - Re:Who's fault is that? (Score:2)
      
      by Chanc_Gorkon ( 94133 ) writes:
      
      No wait... the Google algorithm has a hole. Does the president's biography have the words "miserable failure" in it? Why is the link text taken as a description of what is on the site? Those webmasters who put the link all over their sites are only taking advantage of a hole in the Google algorithm. Google should simply do a text search and make sure that "miserable failure" is actually ON the web page that the text links to. Then Google bombs would have no effect.
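
The filter proposed here, counting a link's text only if the phrase also appears on the target page, can be sketched as below. This illustrates the commenter's proposal only; it is not how Google is known to work, and the function name and matching rules (case-insensitive, whole words, any punctuation or spacing between words) are assumptions.

```python
import re


def anchor_text_on_page(anchor_text: str, page_text: str) -> bool:
    """True if the words of the link text occur, in order, on the target page."""
    words = re.findall(r"\w+", anchor_text.lower())
    # Allow any run of non-word characters between the words of the phrase.
    phrase = r"\W+".join(re.escape(word) for word in words)
    return bool(words) and re.search(r"\b" + phrase + r"\b", page_text.lower()) is not None


if __name__ == "__main__":
    biography = "Biography of the President of the United States"
    print(anchor_text_on_page("miserable failure", biography))
```

The trade-off is that the same filter would also throw away honest link text that describes a page in words the page itself never uses, which is a large part of why link text is useful to a search engine in the first place.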
- Re:Who's fault is that? (Score:3, Interesting)
  
  by bcrowell ( 177657 ) writes:
  
  Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."
  I'm not even convinced Google's algorithm has a problem. One thing a lot of people don't realize about the page rank algorithm is that your page rank goes down if you have lots of outgoing links that aren't reciprocated with links coming back from the site you linked to. It may be that this technique simply leads to a reduction in the page rank of the sandbox, which, after all, is approp
ROBOTS.TXT (Score:5, Insightful)

by gtrubetskoy ( 734033 ) writes: on Monday June 07, 2004 @01:00PM (#9357524)

The burden is not on Google, but on Wiki sandbox admins, who should provide proper ROBOTS.TXT files to inform Google that this content should not be indexed.
As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.
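
A minimal sketch of what "proper ROBOTS.TXT files" could mean for a sandbox: a helper that writes the exclusion rules for a list of paths. The `/wiki/SandBox` style paths are hypothetical; wikis that serve everything as wiki.pl?PageName would need different rules or a per-page robots meta tag instead.

```python
def build_robots_txt(disallowed_paths: list[str], user_agent: str = "*") -> str:
    """Return robots.txt text telling the given crawlers to skip these paths."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Hypothetical sandbox locations; adjust to the wiki's real URL layout.
SANDBOX_PATHS = ["/wiki/SandBox", "/wiki/WikiSandBox"]

if __name__ == "__main__":
    print(build_robots_txt(SANDBOX_PATHS), end="")
```

This keeps the sandbox out of the index for crawlers that honour robots.txt, which removes the reward for posting links there; it cannot stop a script from posting them anyway.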

- Re:ROBOTS.TXT (Score:2)
  
  by sylvester ( 98418 ) writes:
  
  wtf. That's not insightful.
  
  First of all, while my wiki is mostly personal junk, there's no reason it shouldn't be indexed. And many open source projects use Wikis as a primary source of documentation.
  
  Secondly, the cat is out of the bag; I doubt these spammers are checking whether the sandboxes are indexed by Google.
  
  I'm mostly pissed off that the edits to my sandbox have been only from nigritude ultramarine [slashdot.org] people. Frankly, I think google should stomp on that contest by not allowing the words to be sea
  - - Re:ROBOTS.TXT (Score:2)
      
      by sylvester ( 98418 ) writes:
      
      The source of the problem are sandboxes. With most open source projects, spam in a wiki will be quickly spotted and gotten rid of, but in sandboxen junk can sit for months, long enough for google to make note of it.
      Bah! The source of spam is not email. The source of this problem is not the sandbox, it's the wikispammers. I watch the Sandbox page like any other. Moreover, the Sandbox's history is kept, just like any other page, so the spam is still successful in creating links even if it's removed.
      There
Same site, a few days later: Don't do it. (Score:2, Insightful)

by micha2305 ( 769447 ) writes:

Ok, but the same webmaster says [outer-court.com]:
I decided to stop posting backlinks in Wiki sandboxes, the SEO strategy previously explained. [...] In the meantime I'm asking developers and those hosting Wikis of their own to please exclude sandboxes from search engine results (via the robots.txt file). Doing so would shield the sandbox from backlink-postings, and there is no need for it to turn up in search results in the first place.
This sure makes sense, and who knows, maybe future wiki distributions do it by defau
Complacency (Score:5, Interesting)

by faust2097 ( 137829 ) writes: on Monday June 07, 2004 @01:01PM (#9357534)

Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.

But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing. I'm an AltaVista alumnus and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.

- Re:Complacency (Score:2)
  
  by koreth ( 409849 ) writes:
  
  But they're too busy making an email application with 9 frames and 200k of Javascript
  
  Because, of course, if they weren't doing that, every last one of the engineers on that project would be tinkering with the search engine instead. It's not like they have separate engineering teams or people with different areas of expertise there or anything.
- Re:Complacency (Score:2)
  
  by Carnildo ( 712617 ) writes:
  
  It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.
  
  Please, if you're going to complain, give a concrete example of the search terms you're using, and what results you're expecting. I haven't had any trouble finding what I want on Google in the years I've been using it.
Well, it's about time this gets some attention (Score:5, Insightful)

by digitalgimpus ( 468277 ) writes: on Monday June 07, 2004 @01:01PM (#9357536) Homepage

I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....

From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your google rank in 30 days" are using blog spamming, and perhaps Wiki Spamming, as a way to increase their clients' ratings.

It's not about meta tags, or submitting anymore... it's spamming.

Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?

IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?

If they are going to use my blog... why can't I get a cut in that business?

- Re:Well, it's about time this gets some attention (Score:5, Insightful)
  
  by Lurker McLurker ( 730170 ) writes: <allthecoolnameshavegone AT gmail DOT com> on Monday June 07, 2004 @01:09PM (#9357610)
  
  IMHO those services are pure evil.
  
  No, 9/11 was pure evil; some unwanted comments on a blog are an annoyance. If you have a website that allows anyone to post comments, you will get some you don't like. That's life.
  
  - Re:Well, it's about time this gets some attention (Score:2)
    
    by Henry Stern ( 30869 ) writes:
    
    I beg to differ with you on the matter of it being only "an annoyance." I've had to delete comments on my own weblog that (supposedly) link to underage pornography sites. I'm not a lawyer, but I'm fairly certain that it is illegal to link to child pornography. Assuming that this is true, those SEOs are actually causing you, the innocent weblog/wiki owner, to unwillingly and unwittingly commit a criminal act.
    
    Is it still just "annoying?"
- Re:Well, it's about time this gets some attention (Score:2)
  
  by 87C751 ( 205250 ) writes:
  
  I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....
  
  I had a spate of comment spamming too, about a month ago. In fact, that was what inspired me to move from blogware (WordPress) to a full-up CMS (PostNuke). The comment spammers' scripts don't seem to have found PostNuke yet. By the time they do, I'll have anti-bot measures in place (if I haven't simply closed comments to unregistered users).
This happened to me (Score:5, Interesting)

by JohnGrahamCumming ( 684871 ) * writes: <slashdot.jgc@org> on Monday June 07, 2004 @01:02PM (#9357539) Homepage Journal

This happened on the POPFile Wiki [sourceforge.net]. Eventually I solved it by changing the code of the Wiki itself to have an allowed list of URLs (actually a set of regexps). If someone adds a page which uses a new URL that isn't covered, it won't show up when the page is displayed, and the user has to email me to get that specific URL added.

It's a bit of an administrative burden, but stopped people messing up our Wiki with irrelevant links to some site in China.

John.
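
A minimal sketch of the allow-list idea described above, in Python. This is not the POPFile wiki's actual code: the patterns, the `example.org` entry, and the names `is_allowed` and `render_page` are all assumptions made for illustration. It only shows the general shape, where any URL that no pattern covers is dropped when the page is displayed.

```python
import re

# Hypothetical allow-list. The real patterns used on the POPFile wiki
# are not given in the post, so these are placeholders.
ALLOWED_URL_PATTERNS = [
    re.compile(r"^https?://([a-z0-9-]+\.)*sourceforge\.net(/|$)", re.IGNORECASE),
    re.compile(r"^https?://([a-z0-9-]+\.)*example\.org(/|$)", re.IGNORECASE),
]

# Rough URL matcher: stops at whitespace, quotes, angle brackets, or "]".
URL_RE = re.compile(r"https?://[^\s<>\"\]]+", re.IGNORECASE)


def is_allowed(url: str) -> bool:
    """Return True if at least one allow-list pattern covers the URL."""
    return any(pattern.search(url) for pattern in ALLOWED_URL_PATTERNS)


def render_page(text: str) -> str:
    """Return the page text with every non-allow-listed URL removed."""
    return URL_RE.sub(
        lambda m: m.group(0) if is_allowed(m.group(0)) else "",
        text,
    )
```

Filtering at display time rather than at save time means the stored page is untouched, so a URL starts showing up as soon as a matching pattern is added to the list.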

I've seen this (Score:4, Informative)

by goon america ( 536413 ) writes: on Monday June 07, 2004 @01:04PM (#9357559) Homepage Journal

I just reverted some pages on my watch list on Wikipedia that had been edited with a google spam bot to link all sorts of words back to its mother site.... lots of mistakes, looked like the script they were using hadn't been tested that well yet. (Would post an example, but wikipedia is completely fuxx0red at the moment).
This may become a big problem for sites like this. The only solution might be one of those annoying "write down the letters in this generated gif" humanity tests.

apache + search + p2p = distributed search engine (Score:2, Insightful)

by datrus ( 265707 ) writes:

Something that would make a nice opensource project would be to include p2p search functionality in apache itself.
This way all the modified web servers would make a giant distributed search engine.
Some nice algorithms like koorde or kademlia could be used.
Anyone thought about starting something like this?

David
- Re:apache + search + p2p = distributed search engi (Score:2)
  
  by Bert690 ( 540293 ) writes:
  
  Something that would make a nice opensource project would be to include p2p search functionality in apache itself. This way all the modified web servers would make a giant distributed search engine. Some nice algorithms like koorde or kademlia could be used. Anyone thought about starting something like this?
  We looked into something a lot like what you suggest [ibm.com] (and actually have it up and running inside our intranet with 2k or so users). The problem with doing this on the internet is that p2p technique
Google. (Score:4, Interesting)

by Rick and Roll ( 672077 ) writes: on Monday June 07, 2004 @01:05PM (#9357576)

When I search on Google, half the time I am looking for one of the best sites in a category, like perhaps "OpenGL programming". Other times, however, I am looking for something very specific that may only be referenced about twenty times, if at all.
When I do search in the first category, especially for things such as wallpaper, or simpsons audio clips, the sites that usually turn up are the least coherent ones with dozens of ads. I usually have to dig four or five pages to find a relevant one.
The people with these sites are playing hardball. Google wants them on their side, though, because they often display Google text ads.
Right now, my domain of choice is owned by a squatter that says "here are the results for your search" with a bunch of Google text ads. I was going to/may still put a site there that is very interesting, and the name was a key part of it.
I firmly believe that advertisements are the plague of the Internet. I would like to see sites selling their own products to fund themselves. Google doesn't really help in this regard. The text ads are less annoying than banner ads, but only slightly less annoying.
Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.

Tomorrow today yesterday (Score:5, Insightful)

by boa13 ( 548222 ) writes: on Monday June 07, 2004 @01:05PM (#9357579) Homepage Journal

But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.

The Arch Wiki [gnuarch.org] has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will require posters to be authenticated soon, which is a blow to the wiki ideal, but not such a major blow. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. external link over a single space).

So, this is not new, but this is now news.
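
One such heuristic could look roughly like the Python sketch below. It assumes the wiki page has already been rendered to HTML with plain `<a href="...">` anchors, and it flags external links whose visible text is nothing but whitespace or punctuation. The function name `suspicious_links` and the set of characters treated as "invisible" are assumptions, not anything an existing wiki engine is known to do.

```python
import re
from typing import List

# Matches <a ... href="http(s)://...">link text</a> in rendered HTML.
ANCHOR_RE = re.compile(
    r'<a\s[^>]*href="(https?://[^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

# Characters a casual reader would not notice as a link.
INVISIBLE_CHARS = " \t\r\n.,;:"


def suspicious_links(html: str) -> List[str]:
    """Return the targets of external links hidden behind spaces or dots."""
    hits = []
    for match in ANCHOR_RE.finditer(html):
        url, text = match.group(1), match.group(2)
        # Drop nested tags and non-breaking spaces before inspecting the text.
        visible = re.sub(r"<[^>]+>", "", text).replace("&nbsp;", " ")
        if not visible.strip(INVISIBLE_CHARS):
            hits.append(url)
    return hits
```

A wiki could run this over each saved revision and either reject the edit or list it for review on RecentChanges.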

- Re:Tomorrow today yesterday (Score:2)
  
  by Neophytus ( 642863 ) * writes:
  
  One to look out for is <div style="display:none;"> if html can be posted. It makes the span invisible to any human reader but I doubt that any current search engine can identify the purpose of such a tag.
Not a big deal (Score:5, Informative)

by arvindn ( 542080 ) writes: on Monday June 07, 2004 @01:06PM (#9357589) Homepage Journal

Recently the Chinese Wikipedia suffered a spam attack, with a distributed network of bots editing articles to add links to some Chinese internet marketing site. In response, the latest version of MediaWiki (the software that runs the wikipedias and sister projects) has a feature to block edits matching a regex (so you can prevent links to a specific domain). Wikis generally have more protection against spamming than weblogs. So I wouldn't worry.
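
The general idea of rejecting an edit when it matches a spam regex can be sketched in a few lines of Python. This is not MediaWiki's code or configuration; the blocked domains and the name `check_edit` are made up for the example.

```python
import re

# Hypothetical blocked domains; a real wiki would maintain its own list.
SPAM_REGEX = re.compile(
    r"https?://([a-z0-9-]+\.)*(spam-marketing\.example|cheap-pills\.example)\b",
    re.IGNORECASE,
)


def check_edit(new_text: str) -> bool:
    """Return True if the edit may be saved, False if it links to a blocked domain."""
    return SPAM_REGEX.search(new_text) is None
```

Because the check runs before the edit is saved, the spam link never reaches the page history either.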

Hmm (Score:4, Interesting)

by Julian Morrison ( 5575 ) writes: on Monday June 07, 2004 @01:11PM (#9357628)

Leave the links, edit the text to read something like "worthless scumbag, scamming git, googlebomb, please die, low quality, boring" - and lock the page.

This is a concern for the Google Gorilla? (Score:2, Interesting)

by Mr.Fork ( 633378 ) writes:

Wait a minute - a way to spoof Google to get your page ranked better through WiKi? OMFG! Call the internet police, call Dr. Eric E. Schmidt, call out the Google Gorilla goons! I'm sure the good Dr. has a fix like the ones he used at Novell...

The problem with the whole Google model is that it's biased to begin with. If I'm looking for granny-smith apples, chances are an internet chimp they've bought the space with banana's to Google's goons. It becomes obvious when you see a chimp site that is near the
True (Score:4, Funny)

by Pan T. Hose ( 707794 ) writes: on Monday June 07, 2004 @01:13PM (#9357644) Homepage Journal

"Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"

I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...

- Re:True (Score:2)
  
  by Jeff DeMaagd ( 2015 ) writes:
  
  I think it stands to reason that Google shouldn't give ANY opening to competition. If there is a major complaint about how the system works, fix it.
  
  If Google just sits around then the competition will likely catch up.
Google may well downrate this (Score:2)

by Animats ( 122034 ) writes:

I expect that Google will in time give drastically lower weight to easily-modified pages like "blogs" and "wikis". They're not that hard to recognize.
Sandbox persistence (Score:3, Insightful)

by gmuslera ( 3436 ) writes: on Monday June 07, 2004 @01:29PM (#9357789) Homepage Journal

If it's a test area, does it need to be stored at all? Wikis could keep it alive only for the user's current session or test, and when the user logs out or finishes editing, simply delete it or restore it to a default introductory text. It doesn't need to be some kind of collaborative blackboard or graffiti wall, or at least, if it must be, that should be the webmaster's choice (TikiWiki [tikiwiki.org], at least, lets me disable the sandbox if I want).
But if the problem is that websites have areas where visitors (even unregistered ones) can post arbitrary text and links, then even Slashdot is a potential target (maybe there should be a "Spam" mod score?), as is any site where unregistered visitors can store content in one way or another, wiki or not.

"Finally"?? (Score:5, Interesting)

by jdavidb ( 449077 ) writes: on Monday June 07, 2004 @01:30PM (#9357802) Homepage Journal

Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.

The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.

- Re:"Finally"?? (Score:3, Informative)
  
  by jdavidb ( 449077 ) writes:
  
  I checked, and I've got documented evidence of this. On April 25 last year, I reported that earthlink.net was showing up as the top search result [perl.org] for queries involving various religious words, including "Bear Valley Bible Institute." The Church of Scientology (which owns Earthlink) was clearly engaging in something to distort the page rank of earthlink. I had noticed this for a long time before I recorded it.
  On that same day, I reported the problem to Google via their feedback mechanism. I note today
  - Re:"Finally"?? (Score:2)
    
    by jdavidb ( 449077 ) writes:
    
    So at any rate, to sum up, I find the whining about Google "finally" doing something about this to be very unfair, since Google actively works on this kind of problem. It is disingenuous to dismiss their hard work and suggest that they have done nothing.
Why doesn't google (Score:2)

by hackstraw ( 262471 ) * writes:

simply make a distinction between "I am looking to buy something" searches vs "I am looking for information about something".

They are clearly different kinds of searches, and I do both of them, yet I get the same results for both. The exception is Froogle, which is definitely a step in the right direction, but not quite there.

Although the interface has gotten a little better on altavista (remember them??), but searches like: for used condoms [altavista.com] do not make sense for retail stores at a
Easy solution (Score:3, Insightful)

by lightspawn ( 155347 ) writes: on Monday June 07, 2004 @01:34PM (#9357833) Homepage

Edit robots.txt to let search engines know they should ignore sandbox pages.
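
For illustration, here is what such a rule might look like, together with a quick check using Python's standard `urllib.robotparser`. The path `/wiki/SandBox` and the host `wiki.example.org` are assumptions; every wiki engine names and locates its sandbox page differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to skip the sandbox page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/SandBox
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules above allow the given crawler to fetch the URL."""
    return parser.can_fetch(agent, url)
```

Note that robots.txt is only a request: well-behaved crawlers honor it, but it does not stop anyone from posting to the sandbox.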

That's very interesting KEYWORD (Score:2, Funny)

by Doesn't_Comment_Code ( 692510 ) writes:

That's a very interesting article.

Sig
--
KEY PHRASE <A HREF=www.my_website.com> KEYWORD KEYWORD KEYWORD <\A>
image based spam control (Score:4, Interesting)

by MaximusTheGreat ( 248770 ) writes: on Monday June 07, 2004 @01:38PM (#9357870) Homepage

What about using random image based spam control like the one Yahoo uses on its new mail signup?
So, every time you edit/post comment, you would be presented with an image with a random distorted text, which you will have to type in to be able to edit/post. That should take care of automated systems.

- Re:image based spam control (Score:3, Insightful)
  
  by JamieF ( 16832 ) writes:
  
  Hear, hear. Systems (software or otherwise) that offer something of monetary value for free, and provide no mechanism whatsoever to prevent people from exploiting them, are going to get exploited. Shocking!
  
  Maybe it wasn't obvious to blog and wiki programmers that the ability to post a comment or edit a wiki page was worth money. It isn't worth a lot per post, but because these are online systems, they are very susceptible to bots that can post in huge volume. All of those posts together can alter a site's
- Re:image based spam control (Score:3, Insightful)
  
  by Blakey Rat ( 99501 ) writes:
  
  I've always wondered why the image is always distorted text that is hard to read on a speckled background.
  
  Why not just show the picture of an object, like an apple or something, and ask the user to type in what it is? I mean, you could have a few hundred of these and it would be nearly impossible for an automated system to guess. (You have a few hundred different items, and like 5-10 images of each item.) I dunno, seems easier to me, but I don't write web software.
It's already been invented. (Score:4, Informative)

by herrvinny ( 698679 ) writes: on Monday June 07, 2004 @02:12PM (#9358208)

The Robots Exclusion Protocol (i.e. robots.txt [robotstxt.org]).
Here's Google's stance on the subject (boils down to you don't want it indexed, put in a damn robots.txt file) [google.com]
Hell, even Google News uses robots.txt [google.com]

Clean sandbox daily. (Score:3, Informative)

by chiph ( 523845 ) writes: on Monday June 07, 2004 @02:24PM (#9358333)

As any cat owner will tell you, you need to clean the sandbox out periodically. In the case of a Wiki, overnight would probably be a good idea.

Chip H.
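
A rough Python sketch of such a nightly cleanup, assuming a wiki that stores each page as a flat text file. The file layout, the default text, and the name `reset_sandbox` are all hypothetical, and a wiki that keeps page history would also need its own way to purge old sandbox revisions.

```python
from pathlib import Path

# Hypothetical default text shown in a freshly cleaned sandbox.
DEFAULT_SANDBOX_TEXT = "This is the sandbox. Feel free to experiment with wiki syntax here.\n"


def reset_sandbox(page_path: Path, default_text: str = DEFAULT_SANDBOX_TEXT) -> bool:
    """Overwrite the sandbox page with the default text.

    Returns True if the page was changed, False if it was already clean.
    """
    current = page_path.read_text(encoding="utf-8") if page_path.exists() else None
    if current == default_text:
        return False
    page_path.write_text(default_text, encoding="utf-8")
    return True
```

Run from a nightly scheduled job, this would limit how long any spam link stays on the live page.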

Another solution besides robots.txt (Score:3, Interesting)

by wamatt ( 782485 ) writes: on Monday June 07, 2004 @02:30PM (#9358376)

Spammers are going there because you have a high PR. So cut the PR supply and you're in business: http://www.site.com/~url=http://www.link.com and voila - URL rewriting. No more PR for Mr. Spammer.
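
A minimal Python sketch of that rewriting step, assuming pages are rendered to HTML with `href="..."` attributes. The `~url=` prefix comes from the post; the function name, the percent-encoding of the target, and the choice to leave same-site links alone are assumptions added here. Whether a search engine actually withholds rank from redirected links depends on the engine and on how the redirect is served.

```python
import re
from urllib.parse import quote, urlsplit

OWN_HOST = "www.site.com"                       # the wiki's own host, as in the post
REDIRECT_PREFIX = "http://www.site.com/~url="   # redirect endpoint from the post

HREF_RE = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def rewrite_external_links(html: str) -> str:
    """Route every external link through the site's own redirect URL."""
    def repl(match):
        url = match.group(1)
        if urlsplit(url).hostname == OWN_HOST:
            return match.group(0)  # internal link: leave untouched
        # Percent-encode the target so it survives as a single URL component.
        return 'href="%s%s"' % (REDIRECT_PREFIX, quote(url, safe=""))
    return HREF_RE.sub(repl, html)
```

The redirect endpoint itself still has to be implemented on the server, and it should only redirect, never render user-supplied content.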

- Re:E2 (Score:2)
  
  by proj_2501 ( 78149 ) writes:
  
  E2 doesn't have external links except those posted by gods, and they also have a vicious team of editors just waiting to pounce on things like this.
- Re:Naughty behaviour (Score:3, Informative)
  
  by Doesn't_Comment_Code ( 692510 ) writes:
  
  I'm looking for a clean, fast, non-buggy alternative to the google giant. Preferably open source.
  
  Any suggestions?
  
  The only big one I know of right now is Nutch. It is an open source search engine that is in the later stages of development, but hasn't produced a large, usable site yet.
  
  nutch.org [nutch.org]
  
  Since it will be open source, you will be able to read the ranking algorithms and change/abuse them as you see fit.
  
  This one http://search.mnogo.ru/ [mnogo.ru] is also available.
