
Journal Cy Guy's Journal: Conflation Deflation? What is the WH Hiding from Google? 6

Apparently the WH doesn't want you to find something (or at least not find it very easily) that has to do with Iraq. As a result they have structured their robots.txt file to exclude all directories named "iraq" from being searched by Google et al.

It looks like they generated most of it with a script to deliberately hide any directory with text (understandable if they want search engines to find the HTML version of pages and not the plain text ones) but also any directory about Iraq, including many, many subdirectories on Iraq that don't exist in the first place, such as "\Easter\iraq\".
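For anyone who wants to see how rules like these play out for a crawler, here is a minimal sketch using Python's standard urllib.robotparser. The Disallow lines are hypothetical, modelled on the pattern described above, and are not copied from the actual WH file. The paths passed to the allowed() helper (itself just an illustrative name) are made up as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modelled on the pattern described in the post.
# These are NOT the real contents of the WH robots.txt.
RULES = """\
User-agent: *
Disallow: /911/iraq
Disallow: /911/text
Disallow: /easter/iraq
Disallow: /easter/text
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if a crawler honoring robots.txt may fetch this path."""
    return parser.can_fetch(agent, "http://www.whitehouse.gov" + path)


if __name__ == "__main__":
    # Illustrative paths only; they may not exist on the real site.
    for path in ("/911/iraq/index.html",
                 "/easter/iraq/",
                 "/infocus/iraq/index.html"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Disallow rules are matched as path prefixes, so a directory only needs to be listed once to cover everything beneath it, and a rule for a directory that doesn't exist is harmless but also pointless. Note that robots.txt is only a request to well-behaved crawlers; it does not restrict what a person or the site's own search engine can reach.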

While it's tempting to think the obvious reason would be to obscure any files tying 911 to Iraq, and many directories that could potentially hold these are listed, it may just be that they catch your eye because 911 is listed first alphabetically in the file.

And it should also be stated that the intent can't be that nefarious or they wouldn't allow you to use their internal search engine to access the same pages blocked in robots.txt. But is there really a need for the WH to be preventing Google et al from fully archiving the entire site for posterity?

Any Google-sleuths out there tempted to try and determine the specific dirt that has been swept under the metaphorical carpet - please feel free to point it out here in the thread. Here is a list of pages that look like they should be blocked that Google still lists in its search. Note none of them have been cached though. Pages that aren't blocked by the WH - those in /infocus/iraq/ - aren't listed here because of the way I phrased the search. Most of those are cached, so maybe Google ignores robots.txt except for purposes of caching?

UPDATE 11:18 PM - MORE CONFLATION DEFLATION

Rumsfeld has doubts about Al Qaeda - Saddam connection

Apparently this is news to some people in the Administration.

This discussion has been archived. No new comments can be posted.

  • Supposedly there are two copies of each of those pages (two separate sections use the same content, I suppose) so they have it index only one to avoid duplicates.

    This came up as an issue about a year ago, IIRC.
    • But why only the Iraq pages - and why use a scattershot approach where non-existent directories are set off limits?

  • Thanks for the heads up.

    I always just directly peruse the site to find what I am looking for, anyway. It's actually fairly well organized considering the POTUS's hostility to transparency.
  • Google works just fine indexing the whitehouse.gov. Look here [google.com]. (Yeah, I know... old joke, but still funny)
