Publishers Seek Change in Search Result Content

explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"
  • by explosivejared ( 1186049 ) <hagan@jared.gmail@com> on Sunday December 02, 2007 @04:37PM (#21554033)
    When I submitted this I added that a lot of times the more I see in a search result, the more likely I am to hit that website. I know going in that the search engine isn't going to have the full story. It's a summary. That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that. Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.
    • Re: (Score:3, Interesting)

      by Anonymous Coward
      It may be the case that they have ads on the page for which they get paid by the page view, and by allowing search engines to show a summary, you may be saved from going to the page, depriving them of revenue.

      However, I tend to agree with you, and when I don't see a relevant summary, I'm simply less likely to click through to the page, so this may well backfire on them. Either they're not understanding search users' usage patterns, or else they believe that this is so prevalent that nothing will have summa
      • Re: (Score:3, Funny)

        by Tanktalus ( 794810 )

        As a slashdot user, I *only* look at the summaries. I don't click to read the actual article, but learn everything I need to know about a subject simply by the summary available on google.

        It works fine here, so why not on google?

    • Re: (Score:3, Insightful)

      by iminplaya ( 723125 )
      Not only that, but if the big search engines start restricting search results, we might see many more "home grown" search engines fill the net with spiders that won't respect robots.txt, and start clogging the tubes which are already getting clogged with advertisement. As it is, I don't care if the publishers rot.
      • Re: (Score:3, Interesting)

        by ShieldW0lf ( 601553 )
        As it is, I don't care if the publishers rot.

        I do. Every time I hear about something like this, the site goes on my CustomizeGoogle blacklist, never to be seen again. It was the slashdot policy of posting "registration required" links to the New York Times that got me started on this path, and honestly, I'm better informed for it. All these big "news" publishers deliver is sanitized, oversimplified, dumbed down, biased and superficial stories blended with propaganda and outright lies concocted by priva
        • Re: (Score:3, Insightful)

          Wow. Don't you think you're overreacting, just a little?

          Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.

          • by ShieldW0lf ( 601553 ) on Sunday December 02, 2007 @07:34PM (#21555199) Journal
            Wow. Don't you think you're overreacting, just a little?

            Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.


            I am.

            I'll be launching a service in the new year to help actively creating artists make a profit off selling original works, leveraging the copyleft and mashup cultures to generate a fanbase and simultaneously devalue the global copyright pool.

            For the right types of creators, the strategy of increasing the amount of budgets available for custom work by annihilating the cost of existing bodies of work is a valid one, and I intend to make it very easy for those types of people to do so as a side effect of their making money off the things that you cannot copy.

            You'll excuse me if I wait till the new year to slashdot myself, but I assure you, I have sunk hundreds and hundreds of man hours and a lot of my own dough into putting my money where my mouth is, and when I'm ready, you will know all about it whether you like it or not, because it will be some noteworthy stuff.

            So no. I don't think I'm overreacting at all. I like to think when it all pans out in the end I'm going to play some small but important personal role in bringing the old things crashing down as a matter of fact. And have the people doing the real work be richer for it.

            • Well, for what it's worth, I admire your dedication and willingness to give it a shot. I can't help but suspect it will only be a drop in the ocean, but best of luck to you. If you make it and in five years we're all having a very different discussion, I will be more than happy to concede that I was wrong today.

            • by TheLink ( 130905 )
              Sounds cool.

              Is there a way to pay/tip OSS coders directly? I suppose that might not be such a great thing, as it becomes a popularity contest - and some code, though vital, might not attract as much attention from the masses.
      • What you're suggesting is that you don't care about the livelihood of the people who supply you with the information you read every day on the web. Sure, they could stop publishing tomorrow, but then we'd all have to go back to hobbies that don't involve reading on the computer.
        • Re: (Score:2, Insightful)

          by iminplaya ( 723125 )
          You got it wrong. I don't care about the livelihood of the people who try to restrict and monopolize the information I read every day on the web. Big difference there.
    • Re: (Score:3, Interesting)

      Tom Curley, the AP's chief executive, said the news cooperative spends hundreds of millions of dollars annually covering the world, and that its employees often risk their lives doing so. Technologies such as ACAP, he said, are important to protect AP's original news reports from sites that distribute them without permission.

      Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing softwa

      • Re: (Score:2, Insightful)

        That's the problem.

        They want you on their site, but they want the power to summarise and manage their search engine face to maximise foot traffic whilst not giving the whole story away.
    • by fm6 ( 162816 ) on Sunday December 02, 2007 @05:41PM (#21554541) Homepage Journal
      Even without your comments, your submission is way too long. You quoted nearly one third of the article! Next time, take the time to summarize the article in a few sentences. Not only will that make room for your opinions, it will make for a more readable submission that's more likely to be accepted.
    • by pla ( 258480 ) on Sunday December 02, 2007 @06:36PM (#21554871) Journal
      Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

      Simple - They want to have their cake and eat it too.

      They already have the absolute power to block Google. Further than that, Google (and every major search engine out there) honors the robots file, so they don't even need to go so far as actually "blocking" Google, they can politely tell it to go away.

      However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.

      Can you hear the violins?
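
For readers who have not seen one, the "politely tell it to go away" option described above is only a couple of lines of robots.txt. The following is a minimal sketch using Python's standard `urllib.robotparser`; the host `news.example.com` and the crawler name `SomeOtherBot` are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return parser.can_fetch(agent, url)


# Shut out every well-behaved crawler.
BLOCK_EVERYONE = """\
User-agent: *
Disallow: /
"""

# Shut out one crawler by name and leave the site open to the rest.
# An empty Disallow value means "nothing is disallowed".
BLOCK_ONE_CRAWLER = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

# Hypothetical article URL, used only for the demonstration.
url = "http://news.example.com/2007/12/02/story.html"

print(allowed(BLOCK_EVERYONE, "Googlebot", url))
print(allowed(BLOCK_ONE_CRAWLER, "Googlebot", url))
print(allowed(BLOCK_ONE_CRAWLER, "SomeOtherBot", url))
```

Either file is purely advisory: it works only because the crawler chooses to read it and comply.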
      • Ok, regardless of the legal fairness, I'd think removing those previews would actually reduce the likelihood of me visiting such a site.

        Almost never do I see a Google result and say "Ok, I know all I needed to, not going to click." More often, I see one and say "Gee, looks like that site won't be very helpful, let's move on to the next one." I can only imagine my response would be like that, only more so, towards anyone who could allow Google to index them without allowing Google to summarize them.
    • A prediction (Score:5, Insightful)

      by Anonymous Brave Guy ( 457657 ) on Sunday December 02, 2007 @06:58PM (#21554995)

      That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that.

      This being Slashdot, I predict that huge numbers of people will now arrive in this thread and say that you're absolutely right, the search engines are providing a great service, and the publishers should just suck it up because they'd die without them.

      The thing is, they're completely wrong. It's actually the other way around, for the simple reason that news aggregators produce no useful content of their own.

      For you or me, as someone who wants to know what's happening today, we can do one of two obvious things using a web browser. We can visit a specific news site we already know about (or at least guess at a URL), or we can start with an aggregator like Google News. Either way, many people will only read the headlines and summary for most stories. Either way, someone had to go out and get the information to write that story. But in one case, the people who brought the knowledge to the public get the page hit, while in the other, the search engine gets the hit in exchange for ripping much of the value of the other sites' content and the people who actually provided the content get nada.

      It's common at this point for someone to pipe up with a fair use argument, but again, they are wrong, for the simple reason that while the headlines and summaries on news aggregators may only be small excerpts from the entire article, they represent a very significant chunk of the value. You can easily determine this by observing the proportion of users who look something up on an aggregator and never follow through to read any article in more detail; I don't know exactly what the answer is, but I'll wager it's a substantial proportion, perhaps even the majority.

      Another common argument is that the news sites would die without input from search engines, but again I can't believe this is really true. When I reach lunchtime at work, I do not visit Google to find the BBC News web site, I just type in news.bbc.co.uk. (Actually, I visit the bookmark, but the first time that's what I typed.) Google, or any other news aggregator, is wholly unnecessary to my finding the main news site. Even without that, I could easily have guessed that the BBC News web site could be reached at www.bbc.co.uk/news or news.bbc.co.uk, either of which would have got me there immediately. The site is advertised via the BBC's other media as well. A significant proportion of the links I e-mail to and receive from friends and family are direct links to stories on the site.

      Basically, if every search engine on the planet disappeared tomorrow, I rather doubt the big news services would care. As with everything else to do with search engines, they are just a middleman service, and one that is entirely expendable. If they weren't around, the Internet community would just develop an alternative or five, probably rather quickly, just as it always does.

      On the other hand, if the big news services stopped providing news tomorrow, aggregator services like Google News would be completely dead, because they provide absolutely no value in themselves. They simply scrounge content from one source and visitors from another, and insert themselves as a middle man to cream off some of the profits.

      The very fact that one service could survive quite happily without the other, while the other would die immediately without the first, tells us everything we need to know about the merits and public service benefits of each. That being the case, I find it hard to argue with the publishers' position that the news aggregators are basically ripping them off, and I don't really have much sympathy with the two most common counter-arguments people seem to be making in this Slashdot discussion.

      • Re: (Score:3, Interesting)

        If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them. Google doesn't supply any content, but it does supply a service: It's the first place people go to find out information. If they need more than a summary, they can click on links from the summary page to get details. People aren't going to go to ten websites to
        • Re: (Score:3, Insightful)

          If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them.

          I'm not at all convinced that's really true. To borrow a related copyright-area theme, it's like the RIAA saying that they have to use DRM, because otherwise no-one will buy legal copies of their stuff. It's just an assumption, which they aren't yet willing to risk violating in case it goes wrong. That doesn't necessarily mean that if they had no choice but to work on a different basis, they'd lose out.

          But you contradict yourself as well. You say that if the search engines disappeared, the internet would just create more, but then you say that if the big news services stopped providing news, the search engines would die. No they wouldn't. The internet would create more, filling the need.

          Actually, I said the community would create alternatives. I have been rather sceptical about the real

          • Re: (Score:3, Insightful)

            by Gallowglass ( 22346 )
            You wrote: "I'm also not convinced they're particularly useful anyway these days..."

            And yet, every time I Google, I find what I'm looking for. To my mind, that's useful.
      • I have a pile of sand and some water. I didn't make them. Maybe God did, or the sun, but it was not me.

        Now I build a sand castle by gathering sand and water together.

        This is what news aggregations do for me. Aggregators are in the service of providing "emergent content."

        Aggregators amass information relevant to some topic so that the big picture can be seen. This picture emerges after the aggregation of information (the sand) is structured so one can see the whole picture (the sand castle).

      • Re: (Score:2, Interesting)

        by zantolak ( 701554 )
        The organized synthesis and presentation of this content is, in itself, useful content. The number of people using news aggregators should have clued you in on this.
        • I wouldn't know. I'm not sure I've ever met someone who actually uses these services. Pretty much everyone I see using the web just goes straight to their favourite source(s).

  • by Anonymous Coward on Sunday December 02, 2007 @04:41PM (#21554065)
    Hmm, I wonder how long before someone opens a search engine that indexes only what is "hidden" (yeah, really...) by the ACAP settings.

    Just don't do it in the US or someone will tell the judge: "The defendant knowingly circumvented the DRM - which is called ACAP - of our online newspaper".

    ACAP - Anonymous Coward Anonymously Posting
    • Specifically, this seems geared towards sites like Google News that aggregate stories and then publish snippets of them on their home page.

      Personally, I don't really see the problem. You either want your site spidered or you don't. You don't get to control the presentation of the data that is spidered, only the search engines get to do that.

      So the thing here is that Google takes its ordinary web spider, applies a little magic to it, and then displays the results as a news page. Big deal.

      You either want
    • I think it would have been better named Content Retrieval Access Protocol.
  • by doas777 ( 1138627 ) on Sunday December 02, 2007 @04:42PM (#21554073)
    From TFA: "The free riding deprives AP of economic returns on its investments," he said.

    same old rule applies; never trust anyone who uses business terms like ROI, for he cares not for you or society, but only for what he can remove from your wallet, without getting arrested over it.
    • Re: (Score:2, Insightful)

      by Seumas ( 6865 )
      I don't see how difficult this is. If you do not want something on the internet, don't put it on the internet. It's not like Google is going out there and signing into a paying account and indexing paid-for content. In fact, how many times have you found something on google, clicked it -- only to find that all you can read is one paragraph before the site (NYT, etc) throws you a sales pitch to pay $5 or $20 if you want to read the rest?

      My opinion? Good riddance to the lot of them. Please take all the "yahoo
    • We should reverse the argument by requiring news reporting agencies to pay for facts. After all, one might argue, what they do is just sending people around to look and tell what they've seen. They don't "produce" anything! Is it fair that all the people who produced facts to be seen and told not be paid for them? And how about those photos of buildings? And of people walking on the street? All of them should be paid too! It's time we stop news reporting agencies from leeching the hard work of fact producers
  • My reaction... (Score:5, Insightful)

    by Z80xxc! ( 1111479 ) on Sunday December 02, 2007 @04:44PM (#21554087)
    Personally, I think that it's useful for Google and other search engines to show what's truly relevant when you're searching for a page. The fact is, I'm more likely to click on a search result if I can see some of the actual content, and more specifically, the actual text or images that I was looking for. If they don't show me what I want to see, I won't see the rest of it. If it only shows some text that they decide I should see, then it makes it much harder to determine what I'm actually looking at. Even as it is now, when results come up that are ambiguous, I find myself less likely to click on them. I readily admit that robots.txt is getting old and isn't really enough any more, but I'm not sure if what they're proposing is the right answer. Additionally, if Google were to implement a new method of searching using ACAP, then what would happen to the sites using the old methods? Would they not be indexed? What if I want all my material to be shown and I don't feel like going through and choosing every little detail about what to allow and not to allow? It's an idea worth looking at, but it's not anywhere near a finished, usable idea.
  • Terms & Condition (Score:5, Insightful)

    by SaidinUnleashed ( 797936 ) on Sunday December 02, 2007 @04:45PM (#21554093)
    I really wish that the AP and other similar entities would realize that no matter the legal backing of their terms and conditions of redistribution, very few people actually care, and people care less every day. At Burger King, they provide a copy of the newspaper. Does the AP get money for every reader? I think not. This is just as ridiculous as it would be if they tried to make Burger King pay for every person who reads the newspaper while in the restaurant.
    • Actually, yes they do. Burger King buys a dozen newspapers and leaves them out for their customers. At $.50 each it is a cheap way to get people to come in the door in the morning.

      but your analogy isn't correct. It's more like a library charging people to look through the catalog to see if the books they want are present.
      • For Burger King, yes.

        But how many people read these dozen newspapers? I would guess a lot more than twelve people.
    • by shawb ( 16347 )
      Yes, there may be a gratis copy of the newspaper at Burger King. And the AP DOES receive financial benefit from that newspaper sitting there... because Burger King leaves the ads in. If Burger King were to put copies of all the newspaper stories out with advertisements stripped out, then they would probably get a cease and desist fairly quickly. That could be construed as similar to a search engine displaying the bulk of what is displayed in a news story. I don't think the AP is so much looking to stop
  • by Wesley Felter ( 138342 ) <wesley@felter.org> on Sunday December 02, 2007 @04:46PM (#21554107) Homepage
    http://www.the-acap.org/project-documents.php [the-acap.org]

    At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
    This article will disappear from our site in N days, so it better disappear from search engines at the same time
    Don't frame this article
    Don't extract images or thumbnails from this article
    If you show a cached copy of this article, it better include the original ads
    etc.
    • I think there is a place for some of these extensions to robots.txt. However, the first reference I ran across for this group was just after one of the major news organizations got major egg on its face for a news article that had blatantly false and biased information in it. When someone publicized this glaring bias/falsehood, the original news organization quickly changed their web site and denied that the article ever said that. The problem was that the original web article was already cached by Google
  • Seriously (Score:5, Insightful)

    by Anonymous Coward on Sunday December 02, 2007 @04:47PM (#21554113)
    If you don't want anything to be indexed or archived, it needs to be behind a secure connection or NOT POSTED AT ALL.
  • Here's a tip... (Score:5, Insightful)

    by Digital Vomit ( 891734 ) on Sunday December 02, 2007 @04:48PM (#21554123) Homepage Journal

    Here's a tip:

    If you don't want something to become public knowledge -- accessible by anyone -- then don't put it on the internet.

    • Re: (Score:3, Interesting)

      by hedwards ( 940851 )
      That was largely my thought. It makes very little sense as to why anybody would blind click on a link in this day and age. I personally depend upon the summaries to decide whether or not to click. If I don't get a summary I don't click.

      It would make far more sense for these institutions to just take their sites completely off of the search engines via robots.txt and save up those slots in the search results for sites that want traffic. Or perhaps limit it to just the front page, but I think that one can sti
    • by garcia ( 6573 )
      If you don't want something to become public knowledge -- accessible by anyone -- then don't put it on the internet.

      Or use the sitemaps protocol to let them spider what semi-private information you want to offer and let users of your site decide whether or not it's worth their time to login (or whatever authentication method you choose) to read what they deem acceptable.

      If you put shit up on the web for everyone to read, that will include spiders, and then stop whining when public information is read by, *g
  • My one lasting legacy on the web ...

    Back in 1993, when I was teaching myself Perl in my spare time (while working for a -- cough -- UNIX company called The Santa Cruz Operation -- no relation to the current Utah asshats of that name), I was practicing by working on a spider. Now, back then SCO's Watford engineering centre was connected to the internet by a humongous 64kbps leased line. And I was working with a variety of sources on robots, and it just so happened that because I was doing a deterministic depth-first traversal of the web (hey, back then you could subscribe to the NCSA "what's new on the web" bulletin and visit all the interesting new websites every day before your coffee cooled), I kept hitting on Martijn Koster's website. And Martijn's then employers (who were doing something esoteric and X.509 oriented, IIRC) only had a 14.4kbps leased line. (Yes, you read that right: a couple of years later we all had faster modems, but this was the stone age.)

    Eventually Martijn figured out that I was the bozo who kept leeching all his bandwidth, and contacted me. Throttling and QoS stuff was all in the future back then, so he went for a simpler solution: "Look for a text file called /robots.txt. It has a list of stuff you are not to pull in. Obey it, or I yell at your sysadmins." And so, I guess, my first attempt at a spider was also the first spider to obey the embryonic robot exclusion protocol. Which Martijn subsequently generalized and which got turned into a standard.

    So if you're wondering why robots.txt is rather simplistic and brain-dead, it's because it was written to keep this rather simplistic and brain-dead perl n00b from pillaging Martijn's bandwidth.

    Ah, the good old days when you could accidentally make someone invent a new protocol before breakfast ...
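
The arrangement in that story (find /robots.txt at the root of the host, then skip whatever it lists) is small enough to sketch. This is a toy illustration in Python rather than the original Perl, and it never touches the network: the `PAGES` and `ROBOTS` dictionaries, the `a.example` host, and the `toy-spider` agent name are all invented stand-ins for a real web.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """robots.txt always lives at the root of the page's scheme and host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical in-memory "web": each page maps to the links found on it.
PAGES = {
    "http://a.example/": ["http://a.example/docs", "http://a.example/private/log"],
    "http://a.example/docs": ["http://a.example/"],
    "http://a.example/private/log": [],
}

# Hypothetical robots.txt contents, keyed by their URL.
ROBOTS = {"http://a.example/robots.txt": "User-agent: *\nDisallow: /private/\n"}


def crawl(start: str, agent: str = "toy-spider") -> list:
    """Depth-first traversal that skips anything robots.txt asks us to skip."""
    seen, order, stack = set(), [], [start]
    parsers = {}  # one parsed robots.txt per host, so each is read only once
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        rurl = robots_url(url)
        if rurl not in parsers:
            parser = RobotFileParser()
            # A missing robots.txt is treated as "no restrictions" here.
            parser.parse(ROBOTS.get(rurl, "").splitlines())
            parsers[rurl] = parser
        if not parsers[rurl].can_fetch(agent, url):
            continue  # disallowed: do not fetch it, do not follow its links
        order.append(url)
        # Reverse so the first link on the page is visited first.
        stack.extend(reversed(PAGES.get(url, [])))
    return order


print(crawl("http://a.example/"))
```

Nothing in the protocol enforces the `continue`; a spider that leaves that check out sees the whole site, which is why the original threat was a call to the sysadmins.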
  • by Bill Dimm ( 463823 ) on Sunday December 02, 2007 @04:48PM (#21554131) Homepage
    You would think an article about ACAP would provide a link to it [the-acap.org].
    • by Jah-Wren Ryel ( 80510 ) on Sunday December 02, 2007 @04:56PM (#21554191)

      You would think an article about ACAP would provide a link to it.
      Sorry, their new exclusionary rules prevent any linking to their content.
    • I found out about this story yesterday on (ta-da!) Google News. A little more searching (again via Google) led me to their web site. Interestingly, I could not find any information there about who constitutes the ACAP membership. The ACAP site lacks a search tool ... (surprise, surprise) so back to ... Google for more searching which eventually leads to this page [wikipedia.org]. No doubt Yahoo or MSN search would have led to the same findings. The Wikipedia article has a short list of the main suspects doubtless ther
  • by Entropius ( 188861 ) on Sunday December 02, 2007 @04:51PM (#21554149)
    As I understand it the main purpose of robots.txt is to prevent crawlers from consuming excessive amounts of network resources, not to "protect content". It's not a contract; it's not legally-binding; it's a request that automated web agents choose to follow if they want to be polite, or rather a description of how to be polite in the context of a certain site. (Nobody wants crawlers to be indexing dynamically-generated pages, for instance.) As an example, the physics preprint archive arXiv.org has a rather sternly-worded warning: "Follow our robots.txt file or you'll wander off into terabytes of dynamically-generated files, chewing up lots of our bandwidth, and we'll have to ban you to protect our bandwidth bill." That's what it's for, not "protecting content".

    Banning Google from visiting a page and then summarizing its result on a search page is pretty much equivalent to Slashdot banning me from saying "There's this article at goatpron.slashdot.org/whatever that has a description of goat bestiality that I think you might find interesting".

    As long as the summaries are sufficiently short so that they fall under the fair use exception (which Google search results surely do), Google can keep on doing what they're doing.
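
To make the "description of how to be polite" point concrete, here is a rough sketch of what a well-behaved crawler might do with such a file, again using Python's standard `urllib.robotparser`. The robots.txt contents, the `archive.example.org` host, and the `ExampleBot` name are invented for illustration, and `Crawl-delay` is a widely used but non-standard extension that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose dynamic pages are expensive to serve.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /search
"""


def plan_crawl(robots_txt, agent, urls):
    """Return (seconds to wait between requests, URLs the site is happy to serve)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when the file gives no delay; default to 1 second.
    delay = parser.crawl_delay(agent) or 1
    polite = [u for u in urls if parser.can_fetch(agent, u)]
    return delay, polite


candidates = [
    "http://archive.example.org/abs/0712.0001",
    "http://archive.example.org/cgi-bin/export?id=1",
    "http://archive.example.org/search?q=qcd",
]

delay, polite = plan_crawl(ROBOTS_TXT, "ExampleBot", candidates)
print(delay, polite)
```

The file contains nothing about copyright or permitted uses of the content, only which paths to leave alone and how fast to go.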
    • by Jeffrey Baker ( 6191 ) on Sunday December 02, 2007 @04:57PM (#21554199)
      You might find it odd, but there's a lot of lawyers out there (almost all of them, in my experience) who seriously claim that the Terms of Service linked at the bottom of every commercial website is a binding contract. They say it's binding even if you've never read it, and even if it changes and you haven't read the changes. It's binding even if it's not linked from anywhere obvious.

      Now, I realize that these people are idiots, and that probably their future involves a wall, their backs, and a revolution, but at present their counsel is widely respected among the holders of wealth and power. So when you say that robots.txt is "not a contract" you should talk to a lawyer about that. You'd be amazed at the things they say.
      • by piojo ( 995934 ) on Sunday December 02, 2007 @06:01PM (#21554661)

        You might find it odd, but there's a lot of lawyers out there (almost all of them, in my experience) who seriously claim that the Terms of Service linked at the bottom of every commercial website is a binding contract. They say it's binding even if you've never read it, and even if it changes and you haven't read the changes. It's binding even if it's not linked from anywhere obvious.
        That's true, but I'm interested in whether a computer program has to obey contracts. If I write a program and it breaks contracts, am I immediately responsible, or must someone tell me that the program is breaking contracts? If the program is viewed as a tool or an extension of myself, it's probably the former. But programs are frequently not extensions of myself. For instance, if I downloaded the program rather than wrote it, there would be no way I could know it was violating contracts.
        • That's why the whole idea is absurd. Some software acting on my behalf sends a request to your software. Your software can choose to answer the request and how, or to ignore it. As a citizen who has not had the opportunity to undergo the brain-erasure procedure they use at law schools, I fail to see where the contract attaches.
      • by sjames ( 1099 )

        I'm sure they will have all sorts of things to say, particularly if they can get you to pay them to say it. None of that changes the fact that if you print something and put it on public display, don't be surprised if people look at it. If it's at all interesting, some of them may talk about it. If that's not desired, then take it down.

        There are sites that do go way too far trying to make the content look like their own, but that isn't going to be fixed by a more complicated robots.txt

    • by Bodrius ( 191265 )
      I do not think your comparison is valid.

      This is more like banning you from copy/paste and re-posting 1/2 or the full article to karma-whore for a +5 Informative, "before the site is down".

      Google does not have an AI that passes the Turing-test yet. They don't summarize, paraphrase, or otherwise reinterpret content.

      They just extract and render pieces as-is - it's a direct quote.
      It falls into fair use only as long as Google doesn't karma-whore too obviously.

      The discussion of 'rights' is silly anyway - they hav
    • by Aladrin ( 926209 )
      I suspect it -is- legally binding since Google has said that they will honor the robots.txt. If they suddenly stop, they could find themselves in for a lawsuit. (Whether that lawsuit has merit would be determined in a court of law.)

      ANAL.
      • Google needs the content producers. If Google ever pisses off the bulk of the mainstream sites to the extent that nobody will allow them to index, they'll find themselves in a bad place. They need quality content to index. Hitting hundreds of blogs and pseudo news sites won't lead to a mad rush to place ads on Google's search results pages.
    • by Dr. Tom ( 23206 )
      robots.txt does more than limit bandwidth, it actually *helps* the crawlers by letting them know which parts of a site it may be a bad idea to crawl. Once, some "internet archive" bozo making a "full snapshot" of the web got into our scheduling calendar, which of course is an infinite virtual space, and had downloaded pages going up to about 2030 before I noticed it and shut him down. They said on their website that they were specifically ignoring robots.txt so their snapshot would be more complete. If I ha
  • by xigxag ( 167441 ) on Sunday December 02, 2007 @05:00PM (#21554219)
    I understand completely. I too would like to stop my nosy neighbors from peering at me out of their window when I leave my house in the morning. My plan is to implement "pay per stare" at some point in the future but they aren't gonna pay if they can get their jollies for free. I blame the "Sun" and "street lamps" and "glass" and other devices that interfere with my ability to effect sole distribution over the intellectual property that is my personal image. Well, at the very least, I should be able to sue torch/flashlight manufacturers into oblivion and then use my deserved winnings to tackle the big boys 150 gigameters away.
    • I blame the "Sun" and...
      Yup, I blame Sun for a lot of my problems, too - mainly Java.
    • I too would like to stop my nosy neighbors from peering at me out of their window

      I wish I had thought of this. We used to have these crazy neighbors across the street who would come out on their porch whenever my wife and I would go out our front door. They would just watch. Fucking neighbor TV. When I was bored, I'd walk in and out 10-15 times a day. At least we were both getting our exercise.
      • by Dunbal ( 464142 )
        Today if you act suspicious, your neighbor will call the FBI and report you as a possible terrorist.
  • by m94mni ( 541438 ) on Sunday December 02, 2007 @05:07PM (#21554263)
    Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:

    http://www.w3.org/2001/tag/group/track/issues/36 [w3.org]

    See Tim B-L's original mail here:

    http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093 [w3.org]

    One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).
  • These days I find most things on the web by searching, not by following links. If these people want to cut themselves off from the world by refusing to allow search engines to catalog them, why not? People whose work is inaccessible to most because their publishers refuse to let it be on search engines will soon decide that they no longer need a publisher.
  • by hal9000(jr) ( 316943 ) on Sunday December 02, 2007 @05:19PM (#21554359)
    I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed. It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page. Egads, what if you read the summary and decided not to visit the site after all?

    Fine. But as we all know, we probably have a few sites that we bookmark and visit often. We probably get a lot of news from RSS. But a lot of people are directed to sites via search engines. So if a content producer, say a newspaper, doesn't want its content indexed, then fine. It will only result in a LOSS of traffic to their site.

    Look, content producers have to make money. They have people to pay, stuff to print, etc. They have expenses. It is truly sad that rather than trying to figure out how to make content relevant and useful, some content producers simply want to continue analog methods in a digital world.

    Gee, just a thought, but what about a way to display a summary and an ad chosen by the content producer along with the summary? Advertisers would spend lots for that kind of exposure.
    • Re: (Score:3, Insightful)

      by grcumb ( 781340 )

      I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed.

      I'm going to assume that you actually mean "...is being indexed."

      It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page. Egads, what if you read the summary and decided not

      • by ScrewMaster ( 602015 ) on Sunday December 02, 2007 @06:30PM (#21554833)
        The best we can do is use moral suasion [google.vu] to request that people respect our wishes with regards to particular content.

        Ha, yeah. I recently purchased a DVD where the opening scene showed a kid snatching a woman's purse and running off, with the voice of doom saying "You wouldn't steal a purse, would you?" in a ridiculous, nay, pathetic attempt at "moral suasion". I was then subjected to several more unskippable minutes of this asinine lecturing, various legal threats, plus a couple of movie previews and advertisements that I couldn't skip past either. What the hell? So by the time I reached the main feature, I was so irritated (seeing that I'd just paid sixteen bucks for the damn disc) that I pulled the disc from the player and fired up DVD-Shrink. Half an hour later I had a re-authored copy without all the crap, and that's what I watched.

        Idiots.
        • reminds me of a boondocks ep (the one where they sneak food into a theater) - instead of the normal FBI lecture, it showed some punk beating the everloving shit out of a grandma and taking her purse, then said "stealing a movie is exactly like beating up an old woman", while riley tapes the whole thing.
      • It's an unfortunate fact of life that these people need to have a smart, communicative geek (like, say Larry Lessig) sit down with them and explain that a fundamental aspect of digital information is that it can be replicated with virtually no effort and next to no cost.


        And yet millions or billions of people worldwide seem to want the man to hunt down fraudsters who pretend to be one of the aforementioned billions.
    • I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed. It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page.

      Yes, there is something wrong with it. They published it in a public medium. The snippet that search engines used to
      • Yes, there is something wrong with it. They published it in a public medium.

        Not really. They put it on their website with the understanding that the majority of people would be using a traditional web browser to access the content. It's not like they printed it on millions of flyers and carpet bombed every city and town.

        If all the content you have to offer can be summed up in a 20-word blurb, your content sucks and your site deserves to die.

        Search engines do one of two things 1) they print the first couple

        • Not really. They put it on their website with the understanding that the majority of people would be using a traditional web browser to access the content. It's not like they printed it on millions of flyers and carpet bombed every city and town.

          You're right. There's no way they could have reached as many people via carpet-bombing as they do via the web. Their website is a medium for communication, and it is open to the public. It is a public medium.

          The whole point of the first paragraph of the story is
    • there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed.

      True, nothing wrong with wanting. But as Jayne says, "If wishes were horses, we'd all be eat'n steak!"

      (I know that's not the original quote, but I like Jayne's version.)

      Gee, just a thought, but what about a way to display a summary and an ad chosen by the content producer along with the summary? Advertisers would spend lots for that kind of exposure.

      Yes,

    • by Viceice ( 462967 )
      Because I, along with a lot of people on the internet, will refuse to click on a link when I don't know where it'll lead. Call it years of avoiding dodgy links that lead to nothing but spam and advertisements.

      So say I Google a piece of news, and I find a link with a summary and a thumbnail, versus a plain link with no description of what's to come, I'm going to click on the one with the summary simply because I know I'm going to get what I want and as we know from Google's success, giving people what the
  • Pointless (Score:4, Insightful)

    by HalAtWork ( 926717 ) on Sunday December 02, 2007 @05:22PM (#21554389)
    If you put it on the internet, and users are meant to access it, why should search engines differentiate any content based on probably arbitrary criteria? If pay sites restrict content and give out special logins for paying users, search engines cannot index it and the content is kept 'private'. If a site has non-restricted content (not restricted by a special login), then why shouldn't it be indexed? Leaving it out would be a disadvantage to the end user, because they cannot find the content as easily (especially if the web site's search engine sucks), and it would be a disservice to the content provider, since their site would be less likely to show up in search results. What is the point? Is this the same thing as people disabling right-click on certain web sites to try and prevent you from 'stealing' content, the same content that is available in your cache, and that would be illegal to use if the content is copyrighted anyway? Is this the same thing as people embedding pictures in flash for the same reason?

    If all of this results in less usable, less indexable content and more annoyances, just to restrict the way content is accessed and viewed, then that's not the web anymore, that's not really in the spirit of the internet... why not just stick to print or something? And then have it in a special store where you can only buy it with some currency you made up, with an exchange rate you control? Oh, and have a special door for the store that can only be opened with a special device you have to order! Er, anyway... I hope you can understand my point.
  • Once Google admitted it can and will/does filter search results, it opened the floodgates for stuff like this.

    Don't say I didn't tell you so....
  • oblig... (Score:3, Funny)

    by doyoulikeworms ( 1094003 ) on Sunday December 02, 2007 @05:33PM (#21554489)
    Bustin' ACAP in Google's ass.
  • What a joke (Score:3, Informative)

    by WindBourne ( 631190 ) on Sunday December 02, 2007 @05:33PM (#21554495) Journal
    If these publishers want to own the search engines, then they should build their own! These engines do them a favor. This is no different than the music publishers trying to control the bands and how they get paid.
    • If these publishers want to own the search engines, then they should build their own!
      - yeah, we know, with blackjack and hookers, in fact forget the search engines and the blackjack.
  • by jhRisk ( 1055806 ) on Sunday December 02, 2007 @05:37PM (#21554517)
    I think the mistake we're making here is that we're assuming most folks consume their news like we do. Sorry to generalize, but I believe most of us seek to become informed and thoroughly review and critique what we read. However, most people are satisfied with tidbits and in fact want nothing more. For example, the macabre are satisfied with a headline like "Multiple Car Accident Kills 50" and a thumbnail of the pile-up... the nosy with "Brad Wears Ugly Glasses For the First Time" and a thumbnail... etc. Yes, those are terrible headlines and hyperbole to make my point. Imagine a search engine unlike Google which provides summaries of multiple sources offering these tidbits in a single page without the source's ads? Oh wait, http://www.ask.com/ [ask.com] and perhaps others, although I'm stating solely that they have such a type of offering and not that they do so violating any rules.

    I'm against most tactics that appear to be an organization seeking to squash an alternative or new and unknown element they think is encroaching on their bottom line and this move smells of it but feel it's a rare case of smoke without an actual fire. Just wanted to throw that out there while I seek more info on this tidbit.
  • Seems to be a lot of people slightly upset over this. But I think this is a good thing. They already have the ability to stop search engines from indexing at all. Now they have much more fine-grained control. They can also make their results more useful by setting expiry dates. Presumably they'll also be able to be more specific about what the summary says, and it might actually be more useful.

    Now some sites will probably want to over control, but they'll lose out.
  • Just so I'm clear (Score:5, Insightful)

    by Jay L ( 74152 ) <jay+slash @ j ay.fm> on Sunday December 02, 2007 @06:13PM (#21554729) Homepage
    A bunch of publishing organizations have gathered together and are attempting to create an Internet standard for restricting searchable content.

    They haven't involved Google, Yahoo, or Microsoft in the process. In fact, the only search company they mention in their FAQ is Exalead, which I didn't even think I'd heard of (though now I think I may have once downloaded their desktop trial product).

    This is going to be implemented how?

    In related news, I have issued a new policy for how I (and anyone who joins my club) am to be treated in airport security lines. I will be publishing this policy on my home page, and I am certain it will win widespread adoption among travelers.

    Q:Have you discussed this with security administrators?

    A:In addition to the many travelers who have co-signed the new policy, we have an agreement-in-principle from Madge, the security and commissary chief at the fourth-largest regional airport in greater Bozeman.
  • > The desire for greater control over how search engines index and display Web sites

    Then design your sites better. Seriously. When I was on the team that launched http://jacksonville.com/ [jacksonville.com], we spent a decent amount of time thinking about how to optimize our site for search engines, and that was 10 years ago. Too much showing? Not enough showing? Spend more time developing and designing your site ... instead of trying to emulate your print product (ahem ... *cough http://nytimes.com/ [nytimes.com] cough*)
  • ...there's nothing quite like watching the traditional, embattled news sources "innovate" themselves right out of existence. They were slow to respond to the web and didn't understand it when they first did (they've gotten better), and now they're going to ACAP themselves into obscurity. Way to go guys! You're the bleeding edge of reporting!
  • Makes no sense (Score:3, Insightful)

    by mattwarden ( 699984 ) on Sunday December 02, 2007 @07:50PM (#21555307)
    If search engine caching of their content is hurting these publishers, then they would use currently-supported methods to keep crawlers out:

    User-agent: *
    Disallow: /

    Oh, but that's right, they do want to be indexed in search engines because it increases their revenue.

    So, what's the problem, again?
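
For readers who want to see what those two lines do from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the `example.com` URLs, and the `/archive/` variant are made up for illustration; only the `User-agent: *` / `Disallow: /` pair comes from the comment above. Compliance is voluntary, so this only models a crawler that chooses to honor the file.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted above: every crawler, every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant that hides only one directory (the path is made up).
BLOCK_ARCHIVE = """\
User-agent: *
Disallow: /archive/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = ("http://example.com/", "http://example.com/archive/2007.html")
    for label, rules in (("block all", BLOCK_ALL), ("block /archive/", BLOCK_ARCHIVE)):
        for url in urls:
            print(label, url, allowed(rules, "ExampleBot", url))
```

The existing format is all-or-nothing per path: a polite crawler either fetches a URL or it doesn't. It has no way to say "index this, but show only so much of it", which appears to be the gap the publishers want to fill.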
  • If these guys want anybody to pay attention, they should submit their protocol as an RFC. Their "standards document" is badly written. It has statements like "Features that are ready for implementation now, but only for use in crawler communication by prior arrangement, are labelled with an amber spot. These represent a minority of extensions for which there are possible security vulnerability or other issues in their implementation on the web crawler side, such as creating possible new opportunities for

  • If they don't like it, remove them from the index. Watch how fast they shut their pie-holes then.
  • In my opinion, google would be insane to agree to any restriction other than telling the sites "if they don't want to be in google, we let any site opt out already". Google has all of the power - if a site doesn't exist in google, it does not exist.
  • huh? (Score:2, Insightful)

    by wap911 ( 637820 )
    What do they not understand about *DO NOT CRAWL*? Robots.TXT is just fine. If it ain't broke, don't try to fix it. So now I have to have a .robotaccess to go along with .htaccess?
  • If a site complains or uses ACAP - Google should just drop them.

    The Google "site death penalty" - you become (rightfully) irrelevant.

    I wish I could set in my Google preferences to exclude sites that use "noarchive" or "nosnippet".

    Like those journals that feed Google the whole content but just give surfers a subscription page. Such as Blackwell-Synergy - I keep submitting them to Google's spam page since they do that - in direct violation of Google rules.
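
The "noarchive" and "nosnippet" values mentioned above are per-page directives carried in a `<meta name="robots">` tag, not in robots.txt. As a rough sketch of how a client-side filter could spot them, the following uses Python's standard `html.parser`; the class name, helper function, and sample page are hypothetical, and a real filter would also need to handle bot-specific meta names and HTTP headers, which this ignores.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Made-up page for illustration only.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noarchive, nosnippet">'
    "</head><body>Article text</body></html>"
)

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print(sorted(found))
    print("would be filtered:", bool(found & {"noarchive", "nosnippet"}))
```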

