
Google and Yahoo! Working Together On Better Web Indexing 94

Posted by CmdrTaco
from the my-robots.txt-thanks-you dept.
Karzz1 writes "In an exclusive video interview with WebProNews, Yahoo and Google announced a collaborative site called sitemaps.org. Yahoo!'s Tim Mayer states in the video, 'This is something we are announcing tonight at around 9 PM tonight (Las Vegas) Google and Yahoo have gotten together to provide webmasters and publishers a unified way to send their content... let our search engines know about new and existing content.'"
  • by eldavojohn (898314) * <eldavojohnNO@SPAMgmail.com> on Thursday November 16, 2006 @01:11PM (#16871392) Journal
    Well, I went to the sitemaps.org site & looked around for the people owning/running/maintaining the page. In the TOS, I found it to start with:
    Terms of service

    This is a contract between you and each of the sponsors of Sitemaps.org: Google, Inc., Yahoo, Inc., and Microsoft Corporation (referred to collectively in this agreement as the "Sponsors," "we," or "us"). By using the Sitemaps.org website (the "Website") you agree to be bound by the following terms and conditions (the "Terms of Service").

    Scope of Terms of Services; License

    These Terms of Service govern your use of the Website. The Sponsors' copyrights in the sitemaps protocol specification, as published on the Website (the "Specification"), are licensed to you under the Creative Commons Attribution-ShareAlike License (version 2.5). Other than the Sponsors' copyrights in this Specification, no intellectual property rights of any kind are granted or may arise under these Terms of Service, whether express, implied or otherwise.
    So as you can see, Microsoft is also involved in a project released under the Creative Commons Attribution-ShareAlike License (version 2.5), which is in and of itself newsworthy in my opinion, since they're so often cast as the bad guy on Slashdot. Frankly, the article states:
    "The quality of your index is predicated by the quality of your sources and Windows Live Search is happy to be working with Google and Yahoo! on Sitemaps to not only help webmasters, but also help consumers by delivering more relevant search results so they can find what they're looking for faster," said Ken Moss, General Manager of Windows Live Search at Microsoft.
    So why is Microsoft omitted from the summary & title of this news? Surely their Windows Live Search is contributing just as much as Yahoo!'s search or Google's search engine.

    I'm confused--when Microsoft does something good, do we just ignore it? You know, I'm all for criticizing their evil plans for world domination in the software market but shouldn't news be subjective not objective even if it is only for nerds?

    Side note, I'll bet this post hits rock bottom like any other post that says something positive about Microsoft [slashdot.org].
    • This is obviously a tech-community spin to avoid tainting the news from the start.

      Like if you were hosting a conference on global peace you might keep quiet about Dubya being a keynote speaker.

    • by Kozz (7764) on Thursday November 16, 2006 @01:30PM (#16871758)
      I'm confused--when Microsoft does something good, do we just ignore it?

      You must be new here.
    • by Salvance (1014001) * on Thursday November 16, 2006 @01:32PM (#16871798) Homepage Journal
      What I find interesting is that this article was submitted multiple times last night with Microsoft's name actually included in the title (the Firehose is a pretty cool perk of being a subscriber, BTW, since you can see all submissions, not just accepted ones). Either the wording wasn't as concise or clear (I don't remember), or there was a little bias exhibited by the editors.
    • Re: (Score:3, Informative)

      by mpcooke3 (306161)
      To be fair, Google and Yahoo are the big search engine players: MSN Search has under 15% of the market, compared to roughly 45% for Google and 30% for Yahoo.

      Source: http://searchenginewatch.com/showPage.html?page=2156431 [searchenginewatch.com]
    • by Karzz1 (306015) on Thursday November 16, 2006 @01:38PM (#16871904) Homepage
      I was the one who submitted this article, and the reason MS was not mentioned is that they were not involved in the video interview that was released. I could have been a bit more specific with regard to the description, though, so as not to ignore MS's involvement in the project.

    • Re: (Score:3, Insightful)

      I'm confused--when Microsoft does something good, do we just ignore it? You know, I'm all for criticizing their evil plans for world domination in the software market but shouldn't news be subjective not objective even if it is only for nerds?

      You got those backwards. Objective means without bias while the news you are complaining about is subjective, it is biased towards downplaying the good things Microsoft does.

      Semantics, they'll get you every time.
    • Re: (Score:3, Interesting)

      by _Sprocket_ (42527)

      Side note, I'll bet this post hits rock bottom like any other post that says something positive about Microsoft

      Sometimes I get this petty little feeling that there should be a "-1, Martyr Complex" mod option. But of course, this only feeds said complex. And that's the problem with a lot of moderations: sometimes it's more effective to respond with why an opinion might be missing something.

      Having said that - responding is also only so effective. The linked example works well to demonstrate it. In the

    • > shouldn't news be subjective not objective even if it is only for nerds?
      Umm, I think we all got your point, but subjective would make the omission fine and expected. It's objective we want...
  • from the my-robots.txt-thanks-you dept.

    As we learned a short while ago [slashdot.org], this initiative will make it that much easier for bots to detect what content a site has to offer. Is this good or bad for the end users of the internet--will it just increase the incentive for spiders and bots to crawl sites? What is the real purpose of this collaboration? To me it looks like an attempt for the search engines to get content providers to make the search engine's job that much easier.

    • Re: (Score:3, Insightful)

      by Rob T Firefly (844560)

      What is the real purpose of this collaboration? To me it looks like an attempt for the search engines to get content providers to make the search engine's job that much easier.

      That makes sense, though. The whole reason for the web is the content provided by content providers, and they need the search engines to know what they have to offer just as badly as the search engines need the content to search for. It's all symbiotic, and it is just logical that one side is willing to help the other do something

    • by kfg (145172)
      To me it looks like an attempt for the search engines to get content providers to make the search engine's job that much easier.

      Yeeeeeeeeeeeah, and. . .?

      KFG
    • Re: (Score:3, Informative)

      by garcia (6573)
      Is this good or bad for the end users of the internet--will it just increase the incentive for spiders and bots to crawl sites?

      I've been using Google's Sitemaps program for quite some time. I don't want the spiders crawling old and pointless content when there is new and more relevant stuff available for them to display to end users. Why would it increase spidering when they are being specifically told what and how important something is to spider?

      I have noticed a significant decrease in the overall spide
    • by juiceCake (772608)

      To me it looks like an attempt for the search engines to get content providers to make the search engine's job that much easier.

      Which is wonderful. Web developers can use a standardized file to help optimize search engine support, making their job that much easier, rather than developing these types of guidance or sitemap files separately for each search engine. After all, most of my clients, and I'm hazarding a guess here I know, but most clients want their site to be able to be found via a search engine

  • by forrestf (1028150)
    I can see it now: GooYahoo
  • by hey (83763) on Thursday November 16, 2006 @01:33PM (#16871822) Journal
    Why not just have a link from your main page to an HTML sitemap that links to all the pages on your site?
    Nice and easy. And usable by people and crawlers.
  • Let's say I have robots.txt set to deny everything, but I submit some pages to this thing for indexing. Does the spider obey robots.txt or what was submitted? Actually, I'd find it handy to keep the spiders the hell off my site and just submit a couple of pages, but I don't see how that system could be trustworthy at all. Is it just me, or is this just another form of meta tags?
    • When robots.txt and the sitemap conflict, robots.txt takes precedence. This is because robots.txt is a hard restriction and sitemaps are just a hint.
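A minimal sketch of that precedence rule, assuming a crawler that checks every sitemap URL against robots.txt before fetching it. It uses Python's standard `urllib.robotparser`; the robots.txt contents, the crawler name `ExampleBot`, the URLs and the `crawlable` helper are all hypothetical, not any engine's actual code:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything except /public/
ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /
"""


def crawlable(sitemap_urls, robots_txt, agent="ExampleBot"):
    """Return only the sitemap URLs that robots.txt permits.

    robots.txt is treated as the hard restriction; the sitemap is only a hint,
    so a submitted URL that robots.txt disallows is simply dropped.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    submitted = [
        "http://www.example.com/public/news.html",
        "http://www.example.com/private/admin.html",
    ]
    print(crawlable(submitted, ROBOTS_TXT))
    # -> ['http://www.example.com/public/news.html']
```

Note that Python's parser applies the first matching rule in file order, which is why the `Allow` line sits above the blanket `Disallow`; other crawlers may resolve overlapping rules differently.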
  • Subjective... (Score:2, Interesting)

    by scombs (1012701)
    Example Code from: http://www.sitemaps.org/protocol.html/ [sitemaps.org]

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
    </urlset>

    Is it just me, or does the priority tag seem really subjective and arbitrary? One webmaster's 0.5 could be another's 0.8...

    • Re:Subjective... (Score:5, Informative)

      by LanMan04 (790429) on Thursday November 16, 2006 @01:43PM (#16872004)
      That's exactly right. The priority tag only applies to pages on your site, and is a relative measure. So (I would assume) that assigning a priority range of 0 to 0.5 would be the same to the search engine as a range of 0.5 to 1.

      In other words, assigning a priority of 1 to all your pages will not affect their ranking vs. *other* sites that appear in the search results, only vs. other pages on your site. And if they're all 1, then you're telling the crawler that they're all equally important, just as if you had assigned them all a value of .5 (or anything else).
      • Re: (Score:3, Funny)

        by Pollardito (781263)
        You see, most blokes, you know, will be marking all their pages at ten. You're on ten here, all the way up, all the way up, all the way up, you're on ten on your priority. Where can you go from there? Where?

        I don't know.

        Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?

        Put it up to eleven.

        Eleven. Exactly. One higher.
      • Re: (Score:3, Interesting)

        by Jeff Molby (906283)

        In other words, assigning a priority of 1 to all your pages will not affect their ranking vs. *other* sites that appear in the search results

        Are you sure?

        If two pages from different sites are determined to be of approximately equal relevance to the search, couldn't a search engine pick a favorite by using the internal priority ranking?

        Wouldn't a page on widgets be more relevant coming from a widget-maker (who would give it a higher internal priority than his gadget pages) than a similar page coming from a gad

        • I misread your post; I'm off-topic. Still, I can't imagine why a content provider would use different priorities. It could only hurt the rankings of some of the low-priority pages. Unless, of course, the search engines gave your high-priority pages an equivalent boost.
    • It is only a priority relative to your own pages.
      So I guess you could say it is very arbitrary but it is only used as a hint to show how the site owner would prefer his content to be spidered. If you had 10 million pages on your website but there were a few hundred you really wanted the spiders to be interested in then you would assign them a higher priority in your sitemap. It is relative to your own site only so it's ok that it works that way I think.
    • How could this be XML when they left out the closing tag?
    • by scombs (1012701)
      Wow, I'm awesome. The broken link is my own 1337 skills. Let's try this again: http://www.sitemaps.org/protocol.html [sitemaps.org] Mmm, closing tags.
    • Actually, I believe that's priority within the site. So, for example, your homepage might have "0.8", your "Contact Us" page "0.5", your "News" section a straight "1.0", and your privacy policy "0.2".

    • As an XML specification that is likely to be used by people who aren't experts, don't you think it would have been a good idea to use *valid* XML in the example usage?
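To make the "relative to your own site" reading concrete, here is a sketch that assumes an engine uses `<priority>` only to order one site's own URLs for crawling; how any real engine weighs the value is not stated in the thread. Under that assumption, squeezing all values into 0 to 0.5 or 0.5 to 1, as in the reply above, leaves the order unchanged, and an all-1.0 sitemap is indistinguishable from an all-0.5 one. The `crawl_order` helper is hypothetical:

```python
def crawl_order(entries):
    """Return one site's URLs ordered by sitemap <priority>, highest first.

    Assumption (not stated by any engine): priority is compared only among
    URLs of the same site, so only the relative order of values matters.
    """
    # sorted() is stable even with reverse=True, so equal priorities keep
    # their sitemap order; an all-1.0 sitemap behaves like an all-0.5 one.
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [url for url, _priority in ranked]


if __name__ == "__main__":
    # Values from the homepage / Contact Us / News / privacy example above
    site = [("/", 0.8), ("/contact", 0.5), ("/news", 1.0), ("/privacy", 0.2)]
    squeezed_low = [(url, p / 2) for url, p in site]         # within 0 to 0.5
    squeezed_high = [(url, p / 2 + 0.5) for url, p in site]  # within 0.5 to 1
    print(crawl_order(site))  # -> ['/news', '/', '/contact', '/privacy']
    print(crawl_order(squeezed_low) == crawl_order(squeezed_high) == crawl_order(site))  # -> True
```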
  • For once, I'm interested in a joint project between search engines. If Google and Yahoo! can play nice, and Microsoft is mentioned as being part of this Web 2.0 ménage à trois, perhaps something interesting will come of this. But right now it does just look like they want to make it easier to index pages. I've been attempting to submit my sitemaps to Google for ages and have yet to see my sites listed when searching for my keywords, but perhaps that'll change in the future if this works out. Guess I'll keep
  • Text Browsers (Score:3, Interesting)

    by poindextrose (640377) <sliderule@FREEBSDgmail.com minus bsd> on Thursday November 16, 2006 @01:40PM (#16871950) Homepage
    It's too bad that the specification only covers information relevant to search engines.

    How about a <description> tag? I would take great interest in a sitemap specification that gives me enough information to navigate major parts of a site with a viewer plugin (of some sort) in a web browser.

    There's nothing worse than fumbling around navigating page after page when the web server is slow, the pages are image- or ad-heavy, or the navigation on the page just plain sucks.
    • by dsaraujo (798502)
      You just wrote the principles of meta tags in web pages. They didn't work.
      • The difference here is that this would be the equivalent of consolidating that information into one file for all the major sections of your site, permitting easy navigation with a small download.
  • So how can I submit my sitemap to Yahoo! and Microsoft/search.live.com? The FAQ says something about sending an HTTP request to <searchengine_URL>/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.xml, but it doesn't say which search-engine-specific URLs to use.
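The engine-specific base URL is the missing piece, but the encoding step in that FAQ format is mechanical. A minimal sketch using Python's `urllib.parse.quote`; `http://search.example.com` is a placeholder rather than any real engine's endpoint, and `ping_url` is a hypothetical helper:

```python
from urllib.parse import quote


def ping_url(engine_base, sitemap_url):
    """Build <searchengine_URL>/ping?sitemap=<URL-encoded sitemap location>.

    engine_base is a placeholder: each engine documents its own base URL.
    """
    # safe="" makes quote() percent-encode ':' and '/' too, matching the
    # http%3A%2F%2F... form in the sitemaps.org FAQ example
    encoded = quote(sitemap_url, safe="")
    return f"{engine_base.rstrip('/')}/ping?sitemap={encoded}"


if __name__ == "__main__":
    print(ping_url("http://search.example.com", "http://www.yoursite.com/sitemap.xml"))
    # -> http://search.example.com/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.xml
```

Once an engine's real base URL is known, the resulting string could be fetched with an ordinary HTTP GET (e.g. `urllib.request.urlopen`); that step is deliberately left out here.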

    Lukasz
    Hikipedia - free database of hiking trails [hikipedia.com]
  • Google is going to take over the world... and if their company record is any indication of their rule, I for one welcome our new advanced search indexing overlords.
  • From the site:

    Sample XML Sitemap

    The following example shows a Sitemap that contains just one URL and uses all optional tags. The optional tags are in italics.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
    </urlset>

    Are they missing a </url>?
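Whether a sample like the one quoted above is well-formed can be checked mechanically. A sketch using Python's standard `xml.etree.ElementTree`; the stray space in the namespace URL has been dropped, and `is_well_formed` is a hypothetical helper. It checks well-formedness only, not validity against the sitemap schema, which would need an XSD validator outside the standard library:

```python
import xml.etree.ElementTree as ET

# The sample as quoted in the thread: <url> is opened but never closed
QUOTED_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</urlset>"""

# Same sample with the missing </url> added before </urlset>
FIXED_SAMPLE = QUOTED_SAMPLE.replace("</urlset>", "</url>\n</urlset>")


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        # Parse bytes so the encoding="UTF-8" declaration is honored as-is
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(QUOTED_SAMPLE), is_well_formed(FIXED_SAMPLE))  # -> False True
```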

  • Many here ask why this is more than robots.txt. For one, it lets you add URLs that are driven by databases and parameters, things the SEs do not index too well. It also adds last-updated stamps and a priority for re-visits.

    Why is that important? If I have one page where I always post the latest news, I can have the spider revisit it every hour, so it gets indexed ASAP, while the spider goes easy on the rest of my site. I also can train that spider for a burst, if I have for example an ongoin
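For the database-driven case described above, a generator only has to walk the URLs and emit `<lastmod>`, `<changefreq>` and `<priority>` per entry. A minimal sketch using Python's standard `xml.etree.ElementTree`; the URLs, dates and the `build_sitemap` helper are hypothetical. Whether an engine actually revisits an "hourly" page that often is up to the engine, since `<changefreq>` is only a hint:

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialize (loc, lastmod, changefreq, priority) tuples as a sitemap.

    ElementTree escapes '&' as '&amp;', which the protocol requires for
    parameterized URLs such as /item.php?id=7&view=full.
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")  # closed automatically on output
        for tag, value in (("loc", loc), ("lastmod", lastmod),
                           ("changefreq", changefreq), ("priority", priority)):
            if value is not None:  # every tag except <loc> is optional
                ET.SubElement(url, tag).text = str(value)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        # Hypothetical news page that should be re-crawled often
        ("http://www.example.com/news.html", "2006-11-16", "hourly", 1.0),
        # Hypothetical database-driven page with query parameters
        ("http://www.example.com/item.php?id=7&view=full", "2006-10-01", "monthly", 0.5),
    ]))
```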

  • The Django web framework added support for 'Google sitemaps' over a month ago. Google announced the details of sitemaps over 3 months ago. Django Sitemaps: http://www.djangoproject.com/documentation/sitemaps/ [djangoproject.com]
  • Call me an XML nazi but the example usage has unclosed tags:

    http://www.sitemaps.org/protocol.html [sitemaps.org]
    • Really? It looks like it's all there (and in all the other examples they have posted):

      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
  • Google has had this for a while now [google.com]. I had noticed that development has been healthy recently: whereas before it was a relatively unnecessary tool, now it's actually useful.

    If it's as useful as Google Sitemaps, then I'm happy with today's news. The protocol does look pretty similar (and by pretty similar, I mean the XML structure is virtually identical). I'm guessing porting from Google Sitemaps to this new one will be painless.

  • This fall, I released free source code for a PHP class that generates sitemaps for Google, and it seems like the standards group adopted Google's format. The code is perfect for dynamic, database-driven sites that can't readily use the Perl scripts that sometimes perform this task. http://www.idealog.us/2006/09/google_sitemap_.html [idealog.us]
  • I just mused [pannonrex.com] about the search-unfriendliness of AJAX apps yesterday and how that could be solved, and today the big three are banging on (almost) the same door. How do you think we could go about solving the issue?

