Publishers Seek Change in Search Result Content
explosivejared writes "The Washington Post is running a story on the fight between publishers and search engines over just what exactly is allowed to be shown by search results. From the article: 'The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access. Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site ... [new] proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.'"
The Text I Actually Submitted (Score:5, Interesting)
Re: (Score:3, Interesting)
However, I tend to agree with you, and when I don't see a relevant summary, I'm simply less likely to click through to the page, so this may well backfire on them. Either they're not understanding search users' usage patterns, or else they believe that this is so prevalent that nothing will have summa
Re: (Score:3, Funny)
As a slashdot user, I *only* look at the summaries. I don't click to read the actual article, but learn everything I need to know about a subject simply by the summary available on google.
It works fine here, so why not on google?
Re: (Score:3, Insightful)
Re: (Score:3, Interesting)
I do. Every time I hear about something like this, the site goes on my CustomizeGoogle blacklist, never to be seen again. It was the slashdot policy of posting "registration required" links to the New York Times that got me started on this path, and honestly, I'm better informed for it. All these big "news" publishers deliver is sanitized, oversimplified, dumbed down, biased and superficial stories blended with propaganda and outright lies concocted by priva
Re: (Score:3, Insightful)
Wow. Don't you think you're overreacting, just a little?
Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.
Re:The Text I Actually Submitted (Score:5, Interesting)
Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.
I am.
I'll be launching a service in the new year to help actively creating artists make a profit off selling original works, leveraging the copyleft and mashup cultures to generate a fanbase and simultaneously devalue the global copyright pool.
For the right types of creators, the strategy of increasing the amount of budgets available for custom work by annihilating the cost of existing bodies of work is a valid one, and I intend to make it very easy for those types of people to do so as a side effect of their making money off the things that you cannot copy.
You'll excuse me if I wait till the new year to slashdot myself, but I assure you, I have sunk hundreds and hundreds of man hours and a lot of my own dough into putting my money where my mouth is, and when I'm ready, you will know all about it whether you like it or not, because it will be some noteworthy stuff.
So no. I don't think I'm overreacting at all. I like to think when it all pans out in the end I'm going to play some small but important personal role in bringing the old things crashing down as a matter of fact. And have the people doing the real work be richer for it.
Re: (Score:2)
Well, for what it's worth, I admire your dedication and willingness to give it a shot. I can't help but suspect it will only be a drop in the ocean, but best of luck to you. If you make it and in five years we're all having a very different discussion, I will be more than happy to concede that I was wrong today.
Re: (Score:2)
Is there a way to pay/tip OSS coders directly? I suppose that might not be such a great thing, as it becomes a popularity contest, and some code, though vital, might not attract as much attention from the masses.
Re: (Score:2)
Re: (Score:2, Insightful)
Re: (Score:3, Interesting)
Re: (Score:2, Insightful)
They want you on their site, but they want the power to summarise and manage their search engine face to maximise foot traffic whilst not giving the whole story away.
Re:The Text I Actually Submitted (Score:4, Informative)
Re:The Text I Actually Submitted (Score:5, Informative)
Simple - They want to have their cake and eat it too.
They already have the absolute power to block Google. Further than that, Google (and every major search engine out there) honors the robots file, so they don't even need to go so far as actually "blocking" Google, they can politely tell it to go away.
However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.
Can you hear the violins?
So that's what they're bitching about? (Score:2)
Almost never do I see a Google result and say "Ok, I know all I needed to, not going to click." More often, I see one and say "Gee, looks like that site won't be very helpful, let's move on to the next one." I can only imagine my response would be like that, only more so, towards anyone who could allow Google to index them without allowing Google to summarize them.
Re:The Text I Actually Submitted (Score:4, Informative)
1) "Every other media is opt in" -- not true. Fair use applies for most media, allowing summaries and brief quotes without permission, which is what this is about. E.g.: Watch your TV news and you'll often see video taken from other TV news shows, clearly often without explicit permission -- does any US station pay Al Jazeera when they use their video?
2) The web has always been "opt-out". Thus if you change this assumption, the vast majority of web pages, with no expressed policy, would be excluded from search engines.
A prediction (Score:5, Insightful)
That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that.
This being Slashdot, I predict that huge numbers of people will now arrive in this thread and say that you're absolutely right, the search engines are providing a great service, and the publishers should just suck it up because they'd die without them.
The thing is, they're completely wrong. It's actually the other way around, for the simple reason that news aggregators produce no useful content of their own.
For you or me, as someone who wants to know what's happening today, we can do one of two obvious things using a web browser. We can visit a specific news site we already know about (or at least guess at a URL), or we can start with an aggregator like Google News. Either way, many people will only read the headlines and summary for most stories. Either way, someone had to go out and get the information to write that story. But in one case, the people who brought the knowledge to the public get the page hit, while in the other, the search engine gets the hit in exchange for ripping much of the value of the other sites' content and the people who actually provided the content get nada.
It's common at this point for someone to pipe up with a fair use argument, but again, they are wrong, for the simple reason that while the headlines and summaries on news aggregators may only be small excerpts from the entire article, they represent a very significant chunk of the value. You can easily determine this by observing the proportion of users who look something up on an aggregator and never follow through to read any article in more detail; I don't know exactly what the answer is, but I'll wager it's a substantial proportion, perhaps even the majority.
Another common argument is that the news sites would die without input from search engines, but again I can't believe this is really true. When I reach lunchtime at work, I do not visit Google to find the BBC News web site, I just type in news.bbc.co.uk. (Actually, I visit the bookmark, but the first time that's what I typed.) Google, or any other news aggregator, is wholly unnecessary to my finding the main news site. Even without that, I could easily have guessed that the BBC News web site could be reached at www.bbc.co.uk/news or news.bbc.co.uk, either of which would have got me there immediately. The site is advertised via the BBC's other media as well. A significant proportion of the links I e-mail to and receive from friends and family are direct links to stories on the site.
Basically, if every search engine on the planet disappeared tomorrow, I rather doubt the big news services would care. As with everything else to do with search engines, they are just a middleman service, and one that is entirely expendable. If they weren't around, the Internet community would just develop an alternative or five, probably rather quickly, just as it always does.
On the other hand, if the big news services stopped providing news tomorrow, aggregator services like Google News would be completely dead, because they provide absolutely no value in themselves. They simply scrounge content from one source and visitors from another, and insert themselves as a middle man to cream off some of the profits.
The very fact that one service could survive quite happily without the other, while the other would die immediately without the first, tells us everything we need to know about the merits and public service benefits of each. That being the case, I find it hard to argue with the publishers' position that the news aggregators are basically ripping them off, and I don't really have much sympathy with the two most common counter-arguments people seem to be making in this Slashdot discussion.
Re: (Score:3, Interesting)
Re: (Score:3, Insightful)
If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them.
I'm not at all convinced that's really true. To borrow a related copyright-area theme, it's like the RIAA saying that they have to use DRM, because otherwise no-one will buy legal copies of their stuff. It's just an assumption, which they aren't yet willing to risk violating in case it goes wrong. That doesn't necessarily mean that if they had no choice but to work on a different basis, they'd lose out.
But you contradict yourself as well. You say that if the search engines disappeared, the internet would just create more, but then you say that if the big news services stopped providing news, the search engines would die. No they wouldn't. The internet would create more, filling the need.
Actually, I said the community would create alternatives. I have been rather sceptical about the real
Re: (Score:3, Insightful)
And yet, every time I Google, I find what I'm looking for. To my mind, that's useful.
Re: (Score:2)
Now I build a sand castle by gathering sand and water together.
This is what news aggregations do for me. Aggregators are in the service of providing "emergent content."
Aggregators amass information relevant to some topic so that the big picture can be seen. This picture emerges after the aggregation of information (the sand) is structured so one can see the whole picture (the sand castle).
Re: (Score:2, Interesting)
Re: (Score:2)
I wouldn't know. I'm not sure I've ever met someone who actually uses these services. Pretty much everyone I see using the web just goes straight to their favourite source(s).
Re: (Score:3, Insightful)
You're imposing blanket assumptions on a specialist niche, which is never a smart thing to do. TFA is talking about news sources, and so is everyone else in this discussion. What anyone else on the web does is pretty much irrelevant here.
And in any case, you're wrong about the value. In the US, which has one of the most liberal fair use regimes in any jurisdiction today, whether the copy being made affects the value for the original is a major question when deciding whether the copy constitutes fair use.
So they tell you what they don't want you to see? (Score:5, Interesting)
Just don't do it in the US or someone will tell the judge: "The defendant knowingly circumvented the DRM - which is called ACAP - of our online newspaper".
ACAP - Anonymous Coward Anonymously Posting
Re:So they tell you what they don't want you to se (Score:3, Interesting)
Personally, I don't really see the problem. You either want your site spidered or you don't. You don't get to control the presentation of the data that is spidered, only the search engines get to do that.
So the thing here is that Google takes its ordinary web spider, applies a little magic to it, and then displays the results as a news page. Big deal.
You either want
Re:So they tell you what they don't want you to se (Score:3, Funny)
This says it all... (Score:4, Insightful)
Same old rule applies: never trust anyone who uses business terms like ROI, for he cares not for you or society, but only for what he can remove from your wallet without getting arrested over it.
Re: (Score:2, Insightful)
My opinion? Good riddance to the lot of them. Please take all the "yahoo
Re: (Score:2)
My reaction... (Score:5, Insightful)
Terms & Condition (Score:5, Insightful)
Re: (Score:2)
but your analogy isn't correct. It's more like a library charging people to look through the catalog to see if the books they want are present.
Re: (Score:2)
But how many people read these dozen newspapers? I would guess a lot more than twelve people.
Re: (Score:2)
Here's the documentation (Score:5, Informative)
At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
This article will disappear from our site in N days, so it better disappear from search engines at the same time
Don't frame this article
Don't extract images or thumbnails from this article
If you show a cached copy of this article, it better include the original ads
etc.
Re: (Score:2)
Re: (Score:2)
The question I have is how much are they willing to pay the search engines for this functionality?
I think you have it backwards. The real question is how much the search engines are willing to pay the news sources they rely on for continued permission to reproduce their work in any form at all.
And for those about to comment about how for some magical reason copyright doesn't apply here, please note the details in TFA about the settlements between Google and a couple of major sources that have already taken place. Someone who's checked with the lawyers doesn't think it's as cut and dried as that.
Re: (Score:2)
That would be... nothing. Cipher. Zero. Zip. Null. As has been pointed out ad nauseam, if the news sources don't want Google indexing their site, they can use robots.txt. Of course, they don't really want that; they want to be indexed, but they want to be indexed their way, and not the indexer's way. Toug
Re: (Score:2)
Seriously (Score:5, Insightful)
Here's a tip... (Score:5, Insightful)
Here's a tip:
If you don't want something to become public knowledge -- accessible by anyone -- then don't put it on the internet.
Re: (Score:3, Interesting)
It would make far more sense for these institutions to just take their sites completely off of the search engines via robots.txt and save up those slots in the search results for sites that want traffic. Or perhaps limit it to just the front page, but I think that one can sti
Re: (Score:2)
Or use the sitemaps protocol to let them spider what semi-private information you want to offer and let users of your site decide whether or not it's worth their time to login (or whatever authentication method you choose) to read what they deem acceptable.
If you put shit up on the web for everyone to read, that will include spiders, and then stop whining when public information is read by, *g
Re: (Score:2)
You must be new here. I thought the internet was designed to route around breakage.
Historical footnote: where robots.txt came from (Score:5, Interesting)
Back in 1993, when I was teaching myself Perl in my spare time (while working for a -- cough -- UNIX company called The Santa Cruz Operation -- no relation to the current Utah asshats of that name), I was practicing by working on a spider. Now, back then SCO's Watford engineering centre was connected to the internet by a humongous 64kbps leased line. And I was working with a variety of sources on robots, and it just so happened that because I was doing a deterministic depth-first traversal of the web (hey, back then you could subscribe to the NCSA "what's new on the web" bulletin and visit all the interesting new websites every day before your coffee cooled), I kept hitting on Martijn Koster's website. And Martijn's then employers (who were doing something esoteric and X.509 oriented, IIRC) only had a 14.4kbps leased line. (Yes, you read that right: a couple of years later we all had faster modems, but this was the stone age.)
Eventually Martijn figured out that I was the bozo who kept leeching all his bandwidth, and contacted me. Throttling and QoS stuff was all in the future back then, so he went for a simpler solution: "Look for a text file called
So if you're wondering why robots.txt is rather simplistic and brain-dead, it's because it was written to keep this rather simplistic and brain-dead perl n00b from pillaging Martijn's bandwidth.
Ah, the good old days when you could accidentally make someone invent a new protocol before breakfast
Re: (Score:2)
And I hear you aren't a bad writer either. I turned someone on to Peter Watts the other day at lunch (software developers for large grocery chain) and he turned me on to your writing. Now of course you are popping up everywhere, damn synchronicity.
Re:Historical footnote: where robots.txt came from (Score:4, Interesting)
I'm fascinated at the beginnings of the web and the people who drove it.
If you know any place where I can hear more of these please let me know. (reading your blog right now)
Re:Historical footnote: where robots.txt came from (Score:5, Funny)
And the link to ACAP... (Score:4, Informative)
Re:And the link to ACAP... (Score:5, Funny)
Give 'em what they want... (Score:2)
What right do they have to limit crawlers? (Score:5, Insightful)
Banning Google from visiting a page and then summarizing its result on a search page is pretty much equivalent to Slashdot banning me from saying "There's this article at goatpron.slashdot.org/whatever that has a description of goat bestiality that I think you might find interesting".
As long as the summaries are sufficiently short so that they fall under the fair use exception (which Google search results surely do), Google can keep on doing what they're doing.
Re:What right do they have to limit crawlers? (Score:5, Insightful)
Now, I realize that these people are idiots, and that probably their future involves a wall, their backs, and a revolution, but at present their counsel is widely respected among the holders of wealth and power. So when you say that robots.txt is "not a contract" you should talk to a lawyer about that. You'd be amazed at the things they say.
Re:What right do they have to limit crawlers? (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2)
I'm sure they will have all sorts of things to say, particularly if they can get you to pay them to say it. None of that changes the fact that if you print something and put it on public display, don't be surprised if people look at it. If it's at all interesting, some of them may talk about it. If that's not desired, then take it down.
There are sites that do go way too far trying to make the content look like their own, but that isn't going to be fixed by a more complicated robots.txt
Re: (Score:2)
This is more like banning you from copy/paste and re-posting 1/2 or the full article to karma-whore for a +5 Informative, "before the site is down".
Google does not have an AI that passes the Turing-test yet. They don't summarize, paraphrase, or otherwise reinterpret content.
They just extract and render pieces as-is - it's a direct quote.
It falls into fair use only as long as Google doesn't karma-whore too obviously.
The discussion of 'rights' is silly anyway - they hav
Re: (Score:2)
ANAL.
Re: (Score:2)
Re: (Score:2)
Go for it, publishers! (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2)
I wish I had thought of this. We used to have these crazy neighbors across the street who would come out on their porch whenever my wife and I would go out our front door. They would just watch. Fucking neighbor TV. When I was bored, I'd walk in and out 10-15 times a day. At least we were both getting our exercise.
Re: (Score:2)
robots.txt a W3C issue (Score:5, Informative)
http://www.w3.org/2001/tag/group/track/issues/36 [w3.org]
See Tim B-L's original mail here:
http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093 [w3.org]
One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).
Good (Score:2)
Hoisted by their own petard (Score:5, Insightful)
Fine. But as we all know, we probably have a few sites that we bookmark and visit often. We probably get a lot of news from RSS. But a lot of people are directed to sites via search engines. So if a content producer, say a newspaper, doesn't want its content indexed, then fine. It will only result in a LOSS of traffic to their site.
Look, content producers have to make money. They have people to pay, stuff to print, etc. They have expenses. It is truly sad that rather than trying to figure out how to make content relevant and useful, some content producers simply want to continue analog methods in a digital world.
Gee, just a thought, but what about a way to display a summary and an ad chosen by the content producer along with the summary? Advertisers would spend lots for that kind of exposure.
Re: (Score:3, Insightful)
I'm going to assume that you actually mean "...is being indexed."
Re:Hoisted by their own petard (Score:4, Insightful)
Ha, yeah. I recently purchased a DVD where the opening scene showed a kid snatching a woman's purse and running off, with the voice of doom saying "You wouldn't steal a purse, would you?" in a ridiculous, nay, pathetic attempt at "moral suasion". I was then subjected to several more unskippable minutes of this asinine lecturing, various legal threats, plus a couple of movie previews and advertisements that I couldn't skip past either. What the hell? So by the time I reached the main feature, I was so irritated (seeing that I'd just paid sixteen bucks for the damn disc) that I pulled the disc from the player and fired up DVD-Shrink. Half an hour later I had a re-authored copy without all the crap, and that's what I watched.
Idiots.
Re: (Score:2)
Re: (Score:2)
Well, let's face it
Re: (Score:2)
And yet millions or billions of people worldwide seem to want the man to hunt down fraudsters who pretend to be one of the aforementioned billions.
Re: (Score:2)
Yes, there is something wrong with it. They published it in a public medium. The snippet that search engines used to
Re: (Score:2)
Not really. They put it on their website with the understanding that the majority of people would be using a traditional web browser to access the content. It's not like they printed it on millions of flyers and carpet bombed every city and town.
Search engines do one of two things 1) they print the first couple
Re: (Score:2)
You're right. There's no way they could have reached as many people via carpet-bombing as they do via the web. Their website is a medium for communication, and it is open to the public. It is a public medium.
The whole point of the first paragraph of the story is
Re: (Score:2)
True, nothing wrong with wanting. But as Jayne says, "If wishes were horses, we'd all be eat'n steak!"
(I know that's not the original quote, but I like Jayne's version.)
Yes,
Re: (Score:2)
So say I Google a piece of news, and I find a link with a summary and a thumbnail, versus a plain link with no description of what's to come. I'm going to click on the one with the summary simply because I know I'm going to get what I want, and as we know from Google's success, giving people what the
Re: (Score:2)
That is one model, but for consumer-ish stuff like news, folks have grown used to getting it free from TV, radio, and even on-line. Peeps just aren't that interested in paying for it, so the OTHER model is to make money from advertising. So the theory is if a user reads a
Pointless (Score:4, Insightful)
Then that's not the web anymore, that's not really in the spirit of the internet... why not just stick to print or something? And then have it in a special store where you can only buy it with some currency you made up, with an exchange rate you control? Oh, and have a special door for the store that can only be opened with a special device you have to order! Er, anyway... I hope you can understand my point.
The Genie.. (Score:2)
Don't say I didn't tell you so....
oblig... (Score:3, Funny)
What a joke (Score:3, Informative)
Re: (Score:2)
Average people and news consumption (Score:3, Insightful)
I'm against most tactics that look like an organization trying to squash an alternative, or a new and unknown element it thinks is encroaching on its bottom line, and this move smells of it. Still, I feel it's a rare case of smoke without an actual fire. Just wanted to throw that out there while I seek more info on this tidbit.
It's not a bad thing. (Score:2, Interesting)
Now some sites will probably want to over control, but they'll lose out.
Just so I'm clear (Score:5, Insightful)
They haven't involved Google, Yahoo, or Microsoft in the process. In fact, the only search company they mention in their FAQ is Exalead, whom I don't think I've even heard of (though now I think I may have once downloaded their desktop trial product).
This is going to be implemented how?
In related news, I have issued a new policy for how I (and anyone who joins my club) am to be treated in airport security lines. I will be publishing this policy on my home page, and I am certain it will win widespread adoption among travelers.
Q:Have you discussed this with security administrators?
A:In addition to the many travelers who have co-signed the new policy, we have an agreement-in-principle from Madge, the security and commissary chief at the fourth-largest regional airport in greater Bozeman.
*doh* (Score:2)
Then design your sites better. Seriously. When I was on the team that launched http://jacksonville.com/ [jacksonville.com], we spent a decent amount of time thinking about how to optimize our site for search engines, and that was 10 years ago. Too much showing? Not enough showing? Spend more time developing and designing your site
Hrmmm... (Score:2)
Makes no sense (Score:3, Insightful)
User-agent: *
Disallow: /
Oh, but that's right, they do want to be indexed in search engines because it increases their revenue.
So, what's the problem, again?
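For anyone who wants to see what those two lines mean to a well-behaved crawler, here is a minimal sketch using Python's standard urllib.robotparser. The bot name ExampleBot, the host example.com, and the helper is_allowed are placeholders made up for illustration; nothing here comes from TFA or from any real search engine's code.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted above: every crawler, every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot and example.com are hypothetical names.
    for path in ("/", "/news/story.html"):
        url = "http://example.com" + path
        print(path, is_allowed(BLOCK_ALL, "ExampleBot", url))
```

Note that this only models a crawler that chooses to obey the file. As the summary says, compliance is voluntary.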
So where's the RFC? (Score:2)
If these guys want anybody to pay attention, they should submit their protocol as an RFC. Their "standards document" is badly written. It has statements like "Features that are ready for implementation now, but only for use in crawler communication by prior arrangement, are labelled with an amber spot. These represent a minority of extensions for which there are possible security vulnerability or other issues in their implementation on the web crawler side, such as creating possible new opportunities for
I have a better idea... (Score:2)
Remove them from the index (Score:2)
what does google get out of this (Score:2)
huh? (Score:2, Insightful)
Complain/lock down with ACAP - Google should drop (Score:2)
The Google "site death penalty" - you become (rightfully) irrelevant.
I wish I could set in my Google preferences to exclude sites that use "noarchive" or "nosnippet".
Like those journals that feed Google the whole content but just give surfers a subscription page. Such as Blackwell-Synergy - I keep submitting them to Google's spam page since they do that - in direct violation of Google rules.
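Since Google has no such preference, here is a rough sketch of how a client-side filter could spot pages that opt out of caching or snippets via the robots meta tag. It uses only Python's standard html.parser; the class RobotsMetaParser and the helpers robots_directives and hides_summary are names invented for this example, and the sample markup is an assumption about how such a page might declare the directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the lower-cased robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def hides_summary(html: str) -> bool:
    """True if the page asks engines not to cache it or show a snippet."""
    return bool(robots_directives(html) & {"noarchive", "nosnippet"})


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive, nosnippet"></head></html>'
    print(sorted(robots_directives(page)), hides_summary(page))
```

This only reads what the page itself declares. It cannot detect a site serving one thing to the crawler and another to visitors.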
Re: (Score:2)
If search engines follow this ACAP standard and no longer index more than a tiny snippet of the content, then nobody will be able to avoid adverts [adblockplus.org] or avoid registration [bugmenot.com] ever again.
Re: (Score:2)
Robots.txt already allows sites to tell search engines what to index, and what not to.
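A small sketch of that selective control, again with Python's standard urllib.robotparser. The paths /archive/ and /subscribers/, the bot name ExampleBot, the host example.com, and the helper indexable_paths are all hypothetical, picked only to show a publisher keeping some sections out while leaving the rest open.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical publisher policy: keep two sections out, leave the rest open.
SELECTIVE = """\
User-agent: *
Disallow: /archive/
Disallow: /subscribers/
"""


def indexable_paths(robots_txt, user_agent, base_url, paths):
    """Return the subset of paths that robots_txt lets user_agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(user_agent, base_url + p)]


if __name__ == "__main__":
    candidates = [
        "/",
        "/news/today.html",
        "/archive/2007/story.html",
        "/subscribers/login",
    ]
    print(indexable_paths(SELECTIVE, "ExampleBot", "http://example.com", candidates))
```

What plain robots.txt cannot express is how an allowed page may be presented (snippet length, thumbnails, expiry), which is the gap the ACAP proposal claims to fill.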
Re: (Score:2)