The Internet

Learning to Love the Panopticon 142

mitd writes: "Cory Doctrow has written an insightful article about Google, search engines and how he stopped worrying."
  • He spelt Eschelon wrong.

    Anyway, I use Copernic 2001 Pro. Haven't touched a web search engine for ages :D Best search client ever.
    • Re:Eschelon? (Score:1, Redundant)

      by JediTrainer ( 314273 )
      He spelt Eschelon wrong.

      Does anyone else find it amusing that we're obsessing over the name and spelling of a system that we don't even know (for a fact, with proof) exists?

      If it does exist, how do we know its name?

      If it doesn't exist... well... who cares what it's called?
      • It was based on the most common spelling used in the EU reports and in other web reports.

      • The notion that Echelon hasn't been proved to exist is both uninformed and naive.

        Cryptome.org has a definitive collection of documents concerning Echelon in an archive [cryptome.org]. Those desiring to test directly for themselves the existence of Echelon might consider sending some email using phrases from the Echelon trigger words list. This list, by the way, was circulated last year on newspaper wire services and isn't exactly top secret.
    • He spelt Eschelon wrong.

      That's OK -- /. spelt his name wrong. :-)

      It's Cory Doctorow, not Doctrow.

  • The Switch (Score:2, Informative)

    by Covant ( 103882 )
    I remember a few years ago, when I made the switch to Google. I was impressed from the get-go, and have never looked back. Everyone I talked to, everybody who was using some other search engine -- I turned them on to google. (It wasn't hard.)

    And now, in some places, rather than saying "do a search for [something]" people say "google-search it" (even if they don't use google).

    You know something's great when people make a verb out of its name.


    • You know something's great when people make a verb out of its name.

      Or very bad (tm).
      "Slashdotted" and going "postal" spring to mind :-)

    • Re:The Switch (Score:2, Interesting)

      by cyborch ( 524661 )

      I don't know why, possibly it's the lack of web portal-ness of google, but very few non-geeks I know use google. They all stick to the local Yahoo clone [jubii.dk].

      I may be missing something, but I really can't see the reason why... Could anyone enlighten me: is there something geekish about google? Or is it just me thinking that non-geeks want to use more bloated and less efficient solutions than geeks do?

      • Re:The Switch (Score:2, Insightful)

        by Covant ( 103882 )
        I'm pretty sure it's similar to "I use Windows because it's on my machine when I buy it."

        "I'll use this search engine because that's what appears when I install AOL, @home, Sympatico, etc."

        That, and most people can't be bothered remembering more than two web addresses: www.hotmail.com (being replaced by simply using MSN Messenger) and www.my-favourite-porn-site.com.

      • Perhaps Google doesn't do such a good job when searching on non-geek topics. I use the Web mostly for computer stuff and random urban legend / Kevin Bacon searches, so I wouldn't know. But maybe if you want to book a holiday, a semi-automated index like Yahoo's does a better job than just counting links (after all, who links to a competitor's site?).
      • Actually quite a lot of my friends use google, or at least know that it exists. I think it won't be long before google really takes off with the general public ;-)
      • Google is like the command line. It's fast, it's efficient, and it doesn't waste resources or cram trademarks down your throat. That's what geeks look for. Non-geeks very seldom look past what they already have and thus might never find google.
    • Re:The Switch (Score:2, Interesting)

      I've been using the expression "just google for X" when referring friends to some site about X that I know shows up high on the hit list. It's funny, google is starting to replace DNS for me -- instead of remembering or bookmarking URLs, I just remember the keyword to google for. For example, the URL http://www.uk.research.att.com/vnc/ [att.com] is harder to remember than just typing "vnc" and hitting "I'm feeling lucky."

      I often wonder how much less productive I would be if google went away tomorrow :) If anyone from google.com is reading this, thank you!

      • Hehe! Yes, very true. I used to obsessively bookmark anything I might possibly be interested in revisiting in the future. Now I rarely bookmark anything -- only things I know I will be visiting at least once a week. And even then, it's usually after a few weeks of doing it that I finally bookmark it. Now bookmarks are just handy click savers, rather than a memory of useful sites. I don't need a memory of useful sites, I can always find them again on Google...
  • Nice description (Score:3, Informative)

    by torqer ( 538711 ) on Sunday March 10, 2002 @09:14AM (#3137598)
    In case you were like me and really had no idea what the submitter was talking about in his description...

    The link is to an article that gives some insight into how google searches through the hordes and hordes of webpages. And bashes other search engines.

    Note to submitter: while brevity may be the soul of wit, try to remember that we haven't read the article yet and need just a little more information.

    • Actually Google isn't the only search engine using these techniques (that is, ranking sites by how many pages link to them). I have been to some lectures by FAST [alltheweb.com], which Lycos and others run their searches on. Perhaps Google was first though, I don't know.

      The main benefit Google has these days is that they have ~8000 clustered PCs to run searches on, while FAST (as an example) has only 600. Google can therefore afford searches that cost more processing power, while others have to think of smart techniques to maintain good results without using the same power.

      One example is phrase searching, i.e. several words in a given order ("to be or not to be"). While Google uses its raw search power to find all those words, FAST stores every run of three consecutive words ("to be or", "be or not", ...). That costs roughly three times the disk space, but disk is cheap; in exchange they get fast phrase lookups and save plenty of time.
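      A minimal sketch of that trigram trick in Python -- my own toy reconstruction from the description above, not FAST's actual code:

        # Map every run of three consecutive words to the docs containing it.
        from collections import defaultdict

        def build_trigram_index(docs):
            index = defaultdict(set)
            for doc_id, text in docs.items():
                words = text.lower().split()
                for i in range(len(words) - 2):
                    index[tuple(words[i:i + 3])].add(doc_id)
            return index

        def phrase_search(index, phrase):
            # A doc must contain every trigram of the phrase. That is necessary
            # but not sufficient for an exact match; a real engine would still
            # verify word positions afterwards.
            words = phrase.lower().split()
            trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
            if not trigrams:
                return set()
            result = set(index.get(trigrams[0], set()))
            for t in trigrams[1:]:
                result &= index.get(t, set())
            return result

        docs = {1: "to be or not to be that is the question",
                2: "not to be outdone"}
        print(phrase_search(build_trigram_index(docs), "to be or not to be"))  # {1}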

    • The author, unfortunately, seems woefully ignorant of the advances made in the field of natural language processing. There are already a number of products available which analyze wire-service reports semantically to filter out relevant news items. So the fact that e-mail and IM don't have links doesn't mean they can't be processed by computers.

      It is true that someone trying to fool the NLP programs can easily do so (by using coded phrases, etc.), but human analysts are not likely to fare much better in those cases either.

      This is why privacy legislation similar to what has been instituted in Europe is still necessary in the US of A.
  • by Tim Ward ( 514198 ) on Sunday March 10, 2002 @09:14AM (#3137599) Homepage
    Now that Google will find anything you want so easily, isn't there a danger that people will stop putting links to useful and interesting sites on their pages?

    I don't need to tell people, via a link, about some wonderful site I've found if they can find it for themselves quicker and easier using Google. So I might not bother to maintain my collections of useful links, and Google will lose its information source. A victim of its own success.

    What happens then?
    • Why would it be quicker to load up google, do a search, and then click on the link (assuming it's the correct one), as opposed to already being on your website and just clicking the link you've provided, or just being handed a link directly?
    • Policemen wouldn't have a job if there were no criminals, and the job of a policeman -- arresting criminals -- works exactly toward that goal. Yet police departments still seem pretty busy to me.
    • by GigsVT ( 208848 ) on Sunday March 10, 2002 @10:11AM (#3137701) Journal
      I've thought of this myself. I know I don't do nearly as much "surfing" between related sites now that Google is here and works. I usually hit Google up, then if that site isn't what I want, I don't bother clicking their links section, I just go straight back to Google.

      The one thing that may save us though is AOLers. Bear with me here. :) I think that maybe we have found the most efficient way to get the information we want, largely because the novelty of the Internet has mostly worn off for us. We no longer spend hours bouncing from site to site, just reading random stuff. We use the Internet as a tool to expand our effective knowledge and intelligence.

      This is obvious with the various Googlebots that have sprung up in lots of IRC chat rooms. It happens a lot in help rooms: if no one knows the answer, or no one wants to take the time to explain it fully, they just !google and the bot returns the first link from the search.

      So while people like us would cause Google to fail if we were the only people on the net, as long as there are still "surfers" out there Google should remain meaningful.

      Just my two cents.
    • Then, it becomes harder to find things using google, and people start giving each other links again, and google gets better again.... see the cycle? I expect that there would be some sort of damping effect on this oscillation, so it would all even out in the end, with google being just short of good enough to warrant using it instead of passing links manually.
    • As I mentioned in a previous post, Google ranks pages in a recursive way: an important page is one that is pointed to by a lot of other important pages. So the flaws in your argument are:
      1. Your page is probably not all that important, unless it actually has important information on it, which most web pages don't.
      2. If you don't put any links on it, it becomes even less important, because now it is not even a hub.
      And, to a lesser extent:
      3. What you are suggesting is against the nature of the web in general. Web page authors don't supply links only because it is hard to find things otherwise; they supply links as part of the text (for example, when quoting), or because they think those specific links are important and not others.

      In short, the scenario you describe is both unlikely and not as catastrophic as you think it is.
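      The recursive scheme is easy to sketch, by the way -- it is essentially the published PageRank idea. A toy power-iteration version in Python (a simplification for illustration, not Google's production code):

        # Repeatedly redistribute each page's rank across its outgoing links.
        def pagerank(links, damping=0.85, iters=50):
            # links: dict mapping each page to the list of pages it links to
            pages = list(links)
            n = len(pages)
            rank = {p: 1.0 / n for p in pages}
            for _ in range(iters):
                new = {p: (1.0 - damping) / n for p in pages}
                for p, outs in links.items():
                    targets = outs or pages  # dangling page: spread evenly
                    for q in targets:
                        new[q] += damping * rank[p] / len(targets)
                rank = new
            return rank

        web = {"hub": ["a", "b"], "a": ["b"], "b": ["a"], "lonely": ["a"]}
        print(sorted(pagerank(web).items(), key=lambda kv: -kv[1]))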

      limbo.
      • Just a couple of points:

        (1) My page is important to a lot of people as it does have important information which is not available anywhere else.

        (3) People are supplying fewer links already in email. How often do you email someone a long complicated URL these days, and how often do you now email them "Google for xxx yyy / I'm Feeling Lucky" - quicker and easier to type and read? I haven't seen many Google search strings replacing links directly on web pages yet, but who knows?
    • In addition, what about all these /. links to google searches? Does google have a check in its programming to find links to itself? If not, as more and more people link to google searches, google could convince itself that it is the most authoritative site for any and every subject. I dunno about you, but I would find this very entertaining...

    • A change has just happened at Google: they are now tracking all off-site links (they used to track only off-site links to advertisers). Where you used to get a link like this:

      http://www.some.site.com/foo/bar

      You now get a link like this:

      http://www.google.com/url?sa=U&start=3&q=http://www.some.site.com/foo/bar&e=code

      Now they *could* be (and, knowing google, probably are) using this to improve the quality of searches: by watching how many links a person follows on a specific query, assuming that when they stop they found what they were looking for, and rating the followed links higher the next time a similar search occurs.

      But perhaps something more sinister is at work. This information could be of great value to direct marketers and police agencies. Google now knows not only your IP address and your browser type, but also where you are going.

      Arguably, Google has the highest quality search results, and they have operated for at least 2 years without advertisements, solely on venture capital (and the hardware, and all those PhDs, must have cost a fortune). Now that they have us all hooked, they begin tracking our movements.

      Makes you wonder where all that startup funding came from and what revenue sources will contribute to the payback...
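      For what it's worth, unwrapping such a redirect is trivial -- the destination rides along in plain sight. A quick Python sketch (the URL format is just the example above):

        from urllib.parse import urlparse, parse_qs

        def unwrap_redirect(url):
            # Pull the target out of a google.com/url?...&q=<target> link.
            parts = urlparse(url)
            if parts.netloc.endswith("google.com") and parts.path == "/url":
                target = parse_qs(parts.query).get("q")
                if target:
                    return target[0]
            return url  # not a wrapped link; return unchanged

        wrapped = ("http://www.google.com/url?sa=U&start=3"
                   "&q=http://www.some.site.com/foo/bar&e=code")
        print(unwrap_redirect(wrapped))  # http://www.some.site.com/foo/bar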


      • Huh? Now links to offsite locations are normal again. Looks like something they were playing with (seen at Mar 11 2002, 00:58 UTC; restored by 01:00 UTC).

      • Google has had ads for years. They have been off to the side in little pastel-tinted boxes. The great thing is you never even notice them until one of them is useful -- unlike all the damn popups everywhere, they stay out of the way until you need them. As such, I will willingly click on a Google ad if it relates to what I want (and it usually does!), while popups are killed via Mozilla or, failing that, immediately destroyed as soon as they come up. Interesting to note that while everyone else seems to operate on the idea that "ads should obscure content", Google has taken the rational and sane approach of "ads should be relevant content".
    • As long as there are blogs [eatonweb.com], I think Google will have plenty of links to work from.

      They may also start factoring in "The number of times people used a link on google" into the equation to make up for fewer links to work from.
    • Google has also started something new to help with this. They now have a toolbar that you can download, allowing you to google without actually going to their site. The hook is, when you download the toolbar, you have the option of having every page request you make sent to Google and archived for page ranking. They say there is no personally identifiable info; the URL just gets copied to google....

  • How to abuse Google (Score:5, Informative)

    by AftanGustur ( 7715 ) on Sunday March 10, 2002 @09:14AM (#3137600) Homepage
    • Well, this has been known for a long time. But really, it's not as big a deal as one might think. "Scientology" as a search term pulls up an entire page of Scientologist sites, except for #4, which is xenu.net. However, the first page for "Scientology secrets" is full of sites that debunk Scientology. So yes, the Church of Scientology has a virtual monopoly on the search "Scientology" but is far, far from controlling other search terms. It all works out in the end.


      :Peter

    • I'd not call it "abuse". It's simply that more pages (by real and virtual people) link to "real" scientology pages. After all, the COS is the source of information about scientology, don't you think? Reporting that is google's only job.

      The same way, when you search for microsoft, you don't expect linux.org to come out at the top, and vice versa. In the COS case the picture has more shades, obviously, but any serious research should look beyond the first link anyway. And after all, you can help the opponents by making the word Scientology [xenu.net] link to xenu.net this way on all the pages you maintain.

    • A search for Christianity [google.com] gives zero hits for anti-Christian sites. Is that rigged too?

      Same for Islam [google.com].

      • I doubt that's the result of an intentional block. It's probably because the search engine assumes that you won't want anything opposed to what you're looking for.
    • Slightly OT, but the Register has an article [theregister.co.uk] about using Google as an "attack engine".
    • The basic design of the Google cluster unfortunately lends itself to this kind of exclusion in the linking more so than other search engines or entities containing linking mechanisms, but this is not necessarily a bad thing.

      The cluster receives the client request and reverse-NATs a reply based on an advanced TLU setting, which weighs variables against cached requests linked to the hashed lists of previous search requests items and returns. The problem comes in when each node of the cluster contests the cache servers for permission to send info back to the python code in the back-end web server.

      Often, permission is given to two nodes on the server or more, and this causes a problem in that the same info is sent over and over, causing linking problems after the python code is processed and spits out the HTML to the front end web server. This was the only way to do it and still keep Google's unique search features.

      • The cluster receives the client request and reverse-NATs a reply based on an advanced TLU setting, which weighs variables against cached requests linked to the hashed lists of previous search requests items and returns. The problem comes in when each node of the cluster contests the cache servers for permission to send info back to the python code in the back-end web server.

        This is slightly inaccurate and misleading. The truth is that the 4-way database clusters an array of search requests based on a dynamic SQL query.

        Just a heads-up.

    • What does this article show, if anything? That a simple search for "scientology" [google.com] is dominated by official CoS sites? So what? That would be true for any organization, never mind one that owns a zillion different domain names. You still break out if you add even a single keyword [google.com]. Or if you skip the first few pages of results [google.com].

      It seems to me that those 50 or so "official" hits are not a result of a deliberate attempt to dominate Google results. They're just a symptom of the way Scientologists -- like any other religious zealots -- love to blather about themselves.

    • Do a google search on 'crucial facts,' skip the first ~5 results, and most of the results beyond that point are just search-spam.

      Some dork has registered a bunch of domains and created pages titled "crucial facts about [keywords]" with meta refresh tags to transport you to his/her/its web-based storefront for unrelated trinkets (or just-barely kinda-vaguely-sounds-related trinkets).

      I stumbled on this while searching for motorcycle clothing, but judging by the "crucial facts" result set, there are hundreds of these little spammer droppings in the google database, just from this spammer alone.

    • Whois information for the vast majority of these indicates identical registrations, such as this one for exactscientology.net.

      This suggests a rather obvious patch for Google's algorithm, no?

  • Won't all these "good" things about the net just get screwed up with the .Net age almost upon us? Seems like embrace-and-extend will cause a LOT of PROBLEMS for a simple yet elegant solution to searching for what you really want.
    • Personally I think I will cope fine without .net. I won't use it, I won't develop for it, I won't encourage it. There's simply no need for it IMHO. Sounds like a lot of nice interesting little ideas (well ok maybe not so little) which are 'useful' in theory but they won't change the world. Remember Microsoft Bob? Weren't we all supposed to be surfing the net in virtual reality by now?

      Hopefully time will prove me right on this one, but I doubt google will take .net to heart.

      Come on, the net has coped fine with the current system for the last 15 - 30 years, I really don't think .net is that important.
    • That is a very interesting point. If you check out the Semantic Web [w3.org] activity, there is a move to semantic definitions. DAML+OIL and several other efforts are all looking at defining spoken/written language for computers.

      I wonder if the number 1 ranked page will always end up being a single document -- the ontology [w3.org]?

  • Where's the magic? (Score:2, Insightful)

    by guerby ( 49204 )

    In the age of the DMCA, the SSSCA, and angelic companies running after all those evil pirates in order to protect their beloved authors who deserve protection, how come no one has yet sued the biggest copyright infringer of all time ... the Google cache?

    So where's the magic?

    --
    Laurent Guerby <guerby@acm.org>

    • The short version: The DMCA makes provisions for certain caches used in the transmission of information, such as your ISP may use. There are certain defined procedures that the ISP must implement to allow people to get their content out of that cache.

      Google implements those procedures [google.com], and claims protection under the DMCA for their cache. (Note the hoops you must jump through to get them to remove stuff are the legally mandated hoops under the DMCA; they are not trying to be nasty.) Now, a careful reading of the DMCA will show that Google probably doesn't meet the qualifications of this cache exception; but nobody has cared enough to fight it yet. The few who care just jump through the hoops and forget about it.

      The long version is: Read the DMCA and compare against Google's DMCA page [google.com] and decide for yourself.
  • No human decisions ? (Score:2, Interesting)

    by EpsCylonB ( 307640 )
    Do you think that google search could be improved by more human decisions?

    An example might be that goat.ce page (or whatever the url is), which gets linked to a lot as an example of bad taste (I've seen a few pages that link to it and describe it, urging people not to visit). Which is fine, except that this web site is now getting linked to (or voted for, which is how the google algorithm treats a link), yet it isn't a particularly good or informative website.

    Even if someone was searching for something on bad taste, that page is not really an authoritative page about bad taste, just an example of it.
      • (* Even if someone was searching for something on bad taste, that page [goat.ce] is not really an authoritative page about bad taste, just an example of it. *)

      Sometimes a picture is, unfortunately, worth a million authoritative words.
    • But what human? A human decision based model would only make abuse easier and cheaper. Look at the criticisms of the Open Directory Project linked to at this post [slashdot.org]. The Church of Scientology easily abuses that human based system, while abuse of Google is more difficult and especially costly. Check the other posts on that thread too.

      Not to mention the added cost of hiring Google editors.
    • The moral of the story here is that whenever you come across a truly repugnant site [whatever.net], the last thing you should do is link to it. [whatever.net] I mean come on... "hey, this site [whatever.net] really sucks" will only increase traffic to that site.

      So if you really think a site sucks, don't link to it. [whatever.net]

  • advanced search (Score:1, Interesting)

    by Anonymous Coward
    "when searching was a spew of boolean mumbo-jumbo"

    I still use AND, OR, and NOT ("-" in google)
  • More Google Links (Score:5, Informative)

    by Schwarzchild ( 225794 ) on Sunday March 10, 2002 @09:23AM (#3137615)
    How Google Works [lycos.com]

    Undocumented Google Commands [researchbuzz.com]

    Google Time Bombs [corante.com]

    Google Science-Fiction [ftrain.com]

  • I do find Google good, but I don't like people telling me all the time that it is the ultimate search engine. People used to say that about Yahoo/Altavista before Google came along. And now look where they are.
    One thing that really jars me is that when I search for my name on Google, I find more links to amazon than to my own home page.

  • Google is brilliant if you know what you are looking for. It finds the best pages straight away.
    However, when I'm idly surfing (tm) I use something else.... I want to wander around the 'net, not be taken straight to my destination.

    A bit like driving somewhere along the back roads. You never know what you might find.

  • In a world of degradable storage, replicating copies is the surest way to guarantee longevity. Whether your data is in atoms or bits, the more copies you make of it and the more widely you disperse it, the greater the likelihood that your data will persist forever. (That's why Jaron Lanier jokingly proposed encoding printed matter into the DNA of the notoriously prolific cockroach [nytimes.com], as a means of ensuring archives through a nuclear war and beyond.)

    I can see some future biologist doing the heavy work on decoding this now. And the arguments. Of course, if it contained something like the Linux kernel, figuring it out could take a while.

    Heck I am still waiting for folks to find a licensing and copyright statement in the human genome.

    ;-)
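    The replication argument is easy to make quantitative, too. A back-of-envelope Python sketch, naively assuming copies fail independently:

      # If each copy fails with probability p over some period, the data
      # survives unless ALL copies fail: P(survival) = 1 - p**n.
      def survival_probability(p_fail, n_copies):
          return 1 - p_fail ** n_copies

      for n in (1, 3, 10, 30):
          print(n, "copies:", survival_probability(0.5, n))
      # 1 copies: 0.5 ... 3 copies: 0.875 ... 10 copies: ~0.999 ...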

    • Heck I am still waiting for folks to find a licensing and copyright statement in the human genome.

      Is anyone looking?

      Seriously: we can get the raw data, right? Has there been any concerted effort to find any meaning in DNA at all other than the blueprint for life? We've known about mother nature's most reliable data store for decades, now. Are we sure, yet, that the complete works of the great society of 10^n years ago are not just waiting to be found?

  • by limbop ( 201955 ) on Sunday March 10, 2002 @09:36AM (#3137642)
    Google works on the recursive principle that an important document is one linked to by a lot of important documents. Search for "child pornography" and (I'm generalizing here) you're likely to find two kinds of sites: sites offering child pornography and sites opposing it. Those will probably form two separate cliques (if you look at the web as a graph), or clusters. It would be quite easy to offer them as two separate lists, both satisfying the search query. I believe Northern Light (http://www.northernlight.com/) does exactly this.

    Now how about a similar principle for people? A suspicious person is one who communicates with suspicious people. If you have access to email messages sent on the internet, this is quite easy to achieve. Filter the messages down to those mentioning "child pornography" and then do the same analysis google does. Voila! You are left with lists of child pornographers and of internet vigilantes. Easy. Automatic. You can start worrying again.

    BTW, if you are looking for an interesting technical description of the best search engine around, the original google paper (http://citeseer.nj.nec.com/brin98anatomy.html) by Brin and Page does the job a lot better than Doctorow's article.
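    To make the clique-splitting concrete, here's a toy Python sketch (hypothetical data, and real traffic would need proper community detection, since the camps do talk to each other):

      # Treat each flagged message as an edge, then pull out the
      # connected components; each component is one "camp".
      from collections import defaultdict

      def components(edges):
          graph = defaultdict(set)
          for a, b in edges:
              graph[a].add(b)
              graph[b].add(a)
          seen, comps = set(), []
          for start in graph:
              if start in seen:
                  continue
              comp, stack = set(), [start]
              while stack:
                  node = stack.pop()
                  if node not in comp:
                      comp.add(node)
                      stack.extend(graph[node] - comp)
              seen |= comp
              comps.append(comp)
          return comps

      flagged = [("alice", "bob"), ("bob", "carol"),   # one cluster
                 ("dave", "erin"), ("erin", "frank")]  # another cluster
      print(components(flagged))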
    • It is a little too simplistic, but it is totally feasible.

      What about when a vigilante emails a bunch of sites, flaming them and telling them to take their stuff down?

      This happens a lot in the spam/antispam world; antispammers probably trade more email with spammers than with other antispammers.
      • I was using an incredibly simplistic example on purpose. Of course this process cannot be totally automated (if it could be, we'd have intelligent agents retrieving information for us), but it can be brought to a point where the amount of information can be handled by a human with high precision, without wasting too much time.
  • nothing new... (Score:2, Informative)

    by illaqueate ( 416118 )
    Vannevar Bush, As We May Think [theatlantic.com] (July 1945)

    Ben Shneiderman, Codex, Memex, Genex [umd.edu] (December 1997)

    Henry Jenkins, Information Cosmos [technologyreview.com] (April 2001)
  • by XDG ( 39932 ) on Sunday March 10, 2002 @09:37AM (#3137647) Journal
    The article boils down more or less to the following:

    1. "Old" search technologies (Altavista, Yahoo) failed because they used approaches that found words but not content (Altavista) or relied on non-scalable human editorial judgement (Yahoo).

    2. Google works (and is cool) because it uses available information about the number of links to determine (a) valuable content and (b) smart judges of other valuable content

    3. The government efforts at creating the Panopticon will fail because they'll be stuck using "old" keyword approaches that can't pick out real content.

    This argument is flawed in two key ways:

    1. The author confuses the nature of the "search". Web searching is about finding *content* and the challenge is differentiating "good" content from "bad" content. Governmental "security" searching is more akin to traffic analysis and the goal is identifying dangerous *individuals* based on the content and pattern of their traffic. The challenge there is differentiating "good" (safe) speakers from "bad" (dangerous) speakers.

    2. The author assumes (based apparently simply on opinion and what is popularly reported in the press) that the government will blindly apply "alta-vista style" techniques. His lack of fear of the Panopticon is based on an assumption of incompetence in the application of surveillance methods. Given the motivation and resources (both of which the government now has in spades), there is no reason to believe that more sophisticated and effective techniques will not be developed and pursued. Assuming Echelon has really been in operation, it's hard to imagine that, in the closed halls of the NSA, researchers aren't well aware of the limitations of keyword search and are far along applying cryptanalytical techniques to the real problem identified above.

    It would seem that the author is trying to take advantage of hype and concern about government surveillance not to make a serious comment about it, or about whether one should truly be concerned, but rather to get an audience for his opinion that Google is really cool, which most of us already knew anyway.

    -XDG
    • the challenge is differentiating "good" content from "bad" content. ... The challenge there is differentiating "good" (safe) speakers from "bad" (dangerous) speakers.

      I agree with all else you say -- including that the government has the resources to come up with new approaches to the problem -- but I don't think that this challenge is really different from distinguishing between good and bad content. Insofar as the government is trying to do what it shouldn't even remotely be doing, using this technology to identify subversives, you are right. However, insofar as Carnivore might *actually* be used to intercept a criminal communique, I think the challenge is very similar to what google faces.

      Suppose that Inoccuous260@hotmail.com only ever sends one message, from some terminal in a public library, and it is the delivery schedule for a nuclear weapon. The best, most morally (if not legally) defensible use of Carnivore would be to intercept this message and hand it over to the Feds. If the Feds can do this, even once, Carnivore will be with us forever, however else it may be abused, b/c you will never rally the public will to end use of such a tool. The problem of identifying that message (and I don't want to brainstorm ideas here, but I'm sure we could come up with several) is very similar to the problem of picking out a biographical sketch of Alan Turing among all the sci-fi and hoopla, which Google can do using characterisation by links, and which the government would be hard-pressed to do without that human resource.

      So, the author raises a fair point about the limitations on the "legitimate", let us say intended, use of carnivore. However, the unintended/illegitimate use, simple identification of dissidents, could indeed be carried out by a clever 10 year old, and is plenty worrisome even if Carnivore never does what it was supposedly intended to do.
  • Wrong about email (Score:5, Informative)

    by Karellen ( 104380 ) on Sunday March 10, 2002 @09:48AM (#3137667) Homepage
    He's wrong about one thing. Email does have links. It has links indicating who it came from and who it went to. Even without the content, that sort of information, about who is talking to whom, and in what patterns, can be really informative to those who know what they're looking for.

    If you include the content, it's a goldmine.

    URLs embedded in email would make it better still.

    Aside from that though, great article.
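    To see how much the headers alone give away, here's a minimal Python sketch (toy data, standard library only) that builds the who-talks-to-whom graph without ever reading a body:

      from collections import Counter
      from email import message_from_string

      raw_messages = [
          "From: alice@example.org\nTo: bob@example.org\n\nhi",
          "From: alice@example.org\nTo: bob@example.org\n\nlunch?",
          "From: bob@example.org\nTo: alice@example.org\n\nsure",
      ]

      # Count directed edges sender -> recipient, ignoring the content.
      traffic = Counter()
      for raw in raw_messages:
          msg = message_from_string(raw)
          traffic[(msg["From"], msg["To"])] += 1

      for (src, dst), n in traffic.most_common():
          print(src, "->", dst, ":", n, "message(s)")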

    • Privacy concerns aside, if the google technique were applied to emails in the same manner, spam and pornography would be more prominent than any relevant info on many search pages. The sheer volume would tip google-style search results. I'm sure the spammers would love this, sending extra, no-cost (to them) copies of spam to everyone at the NSA :)
      • I'm sure that UBE would be easily identifiable by a google type of database as practically no mail will exist that goes _back_ to the source.

        Filters based on that (to either look for UCE, or to discard it) would probably be trivial based on ratios of sent/received messages to/from a particular envelope.
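        Sketching that ratio heuristic in Python, on top of a (sender, recipient) -> count table like the one a few comments up (the thresholds here are made up for illustration):

          from collections import Counter

          def looks_like_bulk(traffic, sender, min_sent=100, max_reply_ratio=0.01):
              # traffic: Counter mapping (sender, recipient) -> message count
              sent = sum(n for (src, _), n in traffic.items() if src == sender)
              received = sum(n for (_, dst), n in traffic.items() if dst == sender)
              return sent >= min_sent and received / max(sent, 1) <= max_reply_ratio

          # 500 messages out, nothing back: flagged as bulk.
          spammy = Counter({("bulk@spam.example", "user%d@example.org" % i): 1
                            for i in range(500)})
          print(looks_like_bulk(spammy, "bulk@spam.example"))  # True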

  • Most of the article is just stating the obvious.

    That last bit about our shadowy overlords, though, that's interesting, and probably the only insightful bit. Although I wouldn't mind a better explanation of why they must use an alta-vista-ish approach.
  • Wrong panopticon (Score:5, Insightful)

    by dallen ( 11400 ) on Sunday March 10, 2002 @10:18AM (#3137713) Homepage Journal
    Doctorow's point, I believe, is that we have a luxury of choices for searching information, but those who want to wiretap us do not have the luxury of infinite time and infinitely improved ways to find the information they want.

    If they could only track us via the public internet, I would probably agree.

    I would say we don't know what sort of technology they ultimately have for searching our data; until we know that, we should not assume, as he has, that they're not able to keep up with the flood of data.

    Remember that they're not only recording elements of email, phone, and other communications; they are also tracking who is sending and receiving them; and those who are under "wiretap" are nearly perfectly trackable as long as they can associate an identity to an IP to a person. That is the Panopticon, the prison with ideal surveillance: mapping a person to their communication and selectively watching those who bear suspicion.

  • Media type (disk, tape, etc.) and size of data are the least of the issues with storing data for long periods of time.

    Think of the Library of Congress, which wants to be able to store data forever. Let's think just 50 years from now. Even if they had the appropriate hardware, do you think they would have a copy of Microsoft Word 2000 handy? It sure as hell won't be for sale and won't be supported. Would it run on any of the hardware available in 2052?

    "Oh yeah. There was this guy called Shakespeare who was supposed to be pretty good, but we just can't get to any of his works anymore".

    And ASCII?? That's (largely) fine for English/European languages, but there are other languages out there that can't be represented in ASCII at all.

    • Think of the Library of Congress who want to be able to store data forever. Let's think just 50 years from now. Even if they had the appropriate hardware, do you think they would have a copy of Microsoft Word 2000 handy? MS sure as hell won't be for sale and won't be supported. Would it run on any of the hardware available in 2052?

      Exactly...that's why we need open data formats [osopinion.com] for everyone.

      - adam

  • by AdamBa ( 64128 ) on Sunday March 10, 2002 @11:01AM (#3137801) Homepage
    1) Google sucks. All search engines suck right now. Altavista may suck 99% and Google may only suck 97%, but they are all terrible, and will remain so until they can actually start to understand what a page is about. The author may bag on AI, and it is bad now, but it's the only hope for workable search engines in the future.

    2) What is this absolute crapola about how bytes are more reliable than allegedly "fragile" books? Does this tubesteak realize that there are 500 year old books that are completely legible, while 15-year-old electronic data is unreadable? Yeesh. The only bright spot is that this guy's ravings are in electronic form, so future generations won't have to worry about them.

    - adam

    • I disagree; google does not suck 97%, it definitely finds what I want very often. I suggest you give google another chance.
    • The author may bag on AI, and it it bad now, but it's the only hope for workable search engines in the future.

      He doesn't exactly bag on AI, he just says we should let computers do what they're good at (Repetitive counting and sifting through masses of information) and let humans do what they're good at (Making judgments on how good or useful a web site is).

      2) What is this absolute crapola about how bytes are more reliable than allegedly "fragile" books? Does this tubesteak realize that there are 500 year old books that are completely legible, while 15-year-old electronic data is unreadable? Yeesh. The only bright spot is that this guy's ravings are in electronic form, so future generations won't have to worry about them.

      Yeah, 500 year old books are readable, if they're kept in vacuum-sealed boxes and not touched by human hands. I have copies of books that are falling apart after a couple of years. And if you had read the whole article, you would have seen him say "CDs, magnetic tape, flash, and platters all fall apart pretty quickly -- but that's OK, because bytes are not only comparatively tiny ... but they get tinier every year." Yeah, CDs only last about 15 years, but in 10 years you'll be able to fit your 1000 CD library on 1 SuperduperCD. You can easily make exact copies of bytes; I'd like to see you copy the 1,000,000 books those 1000 CDs can hold.

      • Maybe 500 was an exaggeration (given that the printing press was about that old)...but there are certainly 300 year-old books that are fine (not having been vacuum-sealed) and 100 year-old books are not even that unusual.

        The article (or that part of it) reminds me of the people who claimed that newspapers were going to fall apart and all needed to be microfilmed and stored that way... Now the newspapers that were dumped are in such great shape that The Sharper Image is selling them for $30 a pop, and the microfilms are deteriorating -- that is, the ones that were legible to begin with.

        Copying bytes may be easy, but every time I switch computers I have to worry about moving stuff and where it is stored, and then there is 20-year-old stuff on 5 1/4" floppies... Meanwhile my books from childhood are all doing great. Even the cheap-o dot-matrix printouts from my BBS days in 1983 are perfectly preserved, which is more than I can say for any data I had from back then.

        - adam

        • Maybe 500 was an exaggeration (given that the printing press was about that old)...
          Actually, there are books that pre-date the printing press. The oldest printed book still around is the Diamond Sutra, at the British Library [www.bl.uk]. It dates from 868 AD.

          It may also be the oldest existing Open Source document:

          The colophon, at the inner end, reads: `Reverently [caused to be] made for universal free distribution by Wang Jie...
          :-)
  • > Then they must use some hybrid approach: human editors and AI

    Well, there's the implied assumption here that the people running this surveillance operate with standard hardware, where standard means something google, altavista, lycos, etc. can get their hands on. Sketchy information [echelonwatch.org] suggests that they do not; specialised hardware seems to be the order of the day.

    Besides, there's a lot of research going on in terms of context recognition, here [xerox.com] to name one place.

  • Teoma often provides better results than google.
  • by Bobzibub ( 20561 ) on Sunday March 10, 2002 @12:06PM (#3138019)
    This article is insightful? It is misleading. I read something interesting about the "Panopticon" not long ago...
    "The agency which Poindexter will run is called the Information Awareness Office. You want to know what that is? Think, Big Brother is Watching You. IAO will supply federal officials with 'instant' analysis on what is being written on email and said on phones all over the US. Domestic espionage."

    --John Sutherland of UK's Guardian.

    Remember John Poindexter? Mr. Iran-Contra? He lied to Congress and kept Ronald out of the loop. He was also responsible for shredding lots of docs on the subject. Now he'll be spying on US domestic electronic transmissions.

    There is some irony in his having destroyed thousands of emails to cover his ass then, and now being in charge of watching everyone else's emails.

    I'm also sure that the billions of dollars for his new office will be able to overcome the shortcomings of certain search engines. Nobody's going to have to type all those boolean operators.

    The quote above is from the UK's Guardian... Check out what you might have been missing [guardian.co.uk]

    An interesting story, curiously not in CNN.. [looksmart.com]

    Nor MSNBC... [msn.com]

    Couldn't find it in Washington Post..

    Article in LA times on his appointment does not describe what he is to do in his new job except to blather about Sputnik and stealth aircraft. [latimes.com]

    Not in CBC.ca : (

    Cheers to all the spooks! I think it is a job well done! -b.
  • I like Google; it weeds out most of the spam -- unlike AltaVista. It isn't perfect, though. I once searched for prostate milking [google.com], after reading this [memepool.com]. The search results were quite interesting: they brought up hundreds of apparently fake headlines ("Located here! Prostate Milking") and domain names ("childhood-disease.accurate-health.com/prostate-milking.html"); in fact they still do, even though a month has passed since. Many of the links don't work, but some [health-1nf0-care.com] redirect you to other sites [healthandage.com] (this one amazingly owned by Novartis [novartis.com], a supposedly "respectable" biotechnology company). Question: How do they do this?
  • by Anonymous Coward
    "I hate it how everything is being cached and observed and indexed, but I love it cuz its cool!!!"
  • The folks that run Google(tm) have decided to censor advertising of perfectly legal articles, and to enforce a ban on advertising by anyone who sells knives, firearms, or related items, whether or not those items are featured in the actual advertisement. See this link [bowmansbrigade.com] for details.

    • So Google is even better than I thought it was!
    • The Google search engine company refuses to advertise businesses related to the gun or knife industry

      Why is this news? Because gun and knife owners are being discriminated against. Just imagine a storeowner who posts a sign on his door saying "No Firearms Allowed". While still open to the public, anybody can walk in and shop for products. However, the store owner is saying to you, the gun owner, that while your business and your money is welcome here, your firearms are not.


      Oh I love it.

      [BEGIN BROADSWEEPING GENERALIZATION]
      BTW, in the real world, this is what most people would EXPECT.

      Perhaps the real world doesn't quite extend to the US, I don't know.

      BUT in the RoTW a shop owner can easily decide to put up a sign which says "No Firearms Allowed" and expect it to be respected... But what's more, they don't have to... BECAUSE THAT'S THE DEFAULT!

      So, I have no problem with the shop owner deciding to ban firearms... It's his shop; he can do whatever he wants. If you don't like it, don't give him your money. Your loss.

      Meanwhile, I'm quite pleased that Google refuses to accept money to host gun and gun part ads.

      Go Google :)

      Gun owners can be a funny bunch... this one uses some very nasty, angry, retaliatory and confrontational language...

      Funny Americans.
      [END BROADSWEEPING GENERALIZATION]
  • Haven't seen a link to this yet. The CIA is funding new search technologies via In-Q-Tel [in-q-tel.com]. From their page:

    In-Q-Tel is an independent, private, non-profit company funded by the U.S. government with one objective: to identify and deliver next generation information technologies to support CIA's critical intelligence missions.

    I wonder if they like soda? (Hi Cory!)

  • I have a couple of interesting comments regarding searching and XP:

    1) TweakUI, part of the XP PowerToys (released, then later unreleased), has a parser for IE. It enables me to search from the Address bar using only a single letter to designate where I want to search. Thus, when I want to search google I type: "g [insert search terms]". Here are some of the URLs (these should NOT be hyperlinks; a quick sketch of the %s substitution follows at the end of this comment):
    d - http://www.dictionary.com/cgi-bin/dict.pl?term=%s
    g - http://www.google.com/search?q=%s
    t - http://www.thesaurus.com/cgi-bin/search?config=roget&words=%s
    y - http://search.yahoo.com/bin/search?p=%s

    2) Whenever I screw up typing in the address bar (i.e., whenever I forget to type the 'g' or 'd'), an MSN search page gets pulled up. Of course you can disable this searching from the address bar in the options menu. But if you screw up typing again, the option automatically turns back on and pushes you further into M$-land. IE 6.0 sp?
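    And here's that %s-substitution sketch, in Python -- my own toy version of what those registry entries do, with made-up names:

      from urllib.parse import quote_plus

      SHORTCUTS = {
          "d": "http://www.dictionary.com/cgi-bin/dict.pl?term=%s",
          "g": "http://www.google.com/search?q=%s",
          "y": "http://search.yahoo.com/bin/search?p=%s",
      }

      def expand(command):
          # "g foo bar" -> the google search URL for "foo bar"
          prefix, _, terms = command.partition(" ")
          template = SHORTCUTS.get(prefix)
          return template.replace("%s", quote_plus(terms)) if template else None

      print(expand("g insert search terms"))
      # http://www.google.com/search?q=insert+search+terms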
  • Is link/reference indexing an exclusive patent of Google? It is a bit hard to believe that they are the first to do this. Books have had references in them for millennia. Nobody EVER bothered to index those until Google? (And write about it.) I find that a little hard to believe.

    The last thing we need is an e-nopoly on search engines.

  • For a second, I thought it was an article about loving me. I know I can be difficult and high-maintenance, but it can be done, I swear.

  • The grim era before Google, when searching was a spew of boolean mumbo-jumbo, NEAR this, NOT that, AND the other?

    I kind of liked the "NEAR" operator - wish google had it!

  • Paul Ford wrote a hilarious piece [ftrain.com] on what life might be like if google tried to index the world.

    Me, I think the reason the Harry Potter film ended up looking uncannily like what was in everybody's head is that Google can index the brain [canncentral.org].

    Just a theory.

  • by fwc ( 168330 ) on Monday March 11, 2002 @12:17AM (#3141167)
    I was talking to a friend about "mystery email attachments", and wanted to find this User Friendly strip [userfriendly.org].

    So, without thinking I fire up google and type the search:

    "user friendly the comic strip" email attachment

    and then clicked on search. The first hit is the cartoon I wanted, so I click on it. When I pull up the page, I realize that the text words "email attachment" don't appear anywhere on the screen other than the graphic text in the comic itself, so google shouldn't have found the page - at least according to how I thought google worked. So I pulled up the source to see if there was a meta tag there which would explain this. Nope.

    One thing I can think of is that google OCRs the pictures (seems scary, and that font Illiad uses doesn't look very OCR-able). The other thing I thought of is that perhaps google also matches text found within <A> tags that link to that page, or something.
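    My money is on the second guess -- it matches the anchor-text trick the original Brin and Page paper describes: index each page under the words other pages use when linking to it. A toy Python sketch with made-up data:

      import re
      from collections import defaultdict

      LINK_RE = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)

      def index_anchor_text(pages):
          # word -> set of URLs that other pages linked to using that word
          index = defaultdict(set)
          for html in pages.values():
              for target, anchor in LINK_RE.findall(html):
                  for word in re.findall(r"\w+", anchor.lower()):
                      index[word].add(target)
          return index

      pages = {"blog.example": '<a href="http://strip.example/905">'
                               'funny strip about an email attachment</a>'}
      print(index_anchor_text(pages)["attachment"])
      # the strip's URL, even though those words never appear on the strip itself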

    I've shot a message off to google to ask about this but I haven't heard back yet. I'll be interested to find out how the *@(#*$ they did this.

    I think I saw an ad somewhere which said "How the @(#$* did they do that?" is the highest praise one web designer can give another. If that's true, they've definitely earned my praise in this case. Regardless, some wizard at google got their search engine to do exactly what I wanted with whatever technology they used. Sufficiently advanced technology is indistinguishable from magic. And google is definitely magic.
