The Internet

Google Expanding To IRC? (208 comments)

AnimeFreak writes "According to this article in The Register, Google has apparently been involved in a bit of activity in various IRC channels. When asked by IRC Junkie, Google said they're researching ways to improve their service and that the activity is only temporary. Could this mean the ability to search for information contained on IRC? Services such as Netsplit.de and Search IRC already exist, and both let you get information from various IRC networks. Is Google trying to replicate what these sites have done?"
  • Terms. (Score:5, Funny)

    by saintlupus ( 227599 ) on Tuesday November 11, 2003 @09:52AM (#7443323)
    "Search for w4r3z complete. Results 1-10 of eleventy billion:"

    --saint
    • "Search for 'Please packet *.*.*.*' complete. Results 1-10 of eleventy billion billion."

    • Actually... (Score:2, Insightful)

      The idea of searchable IRC logs kind of scares me. An investigative team need only go to Google to search for discussions by someone with the nickname "l33t".

      Of course, IRC logs are already out there, often made available by the denizens in charge of the channel in question. But they're not hooked up to a common database.

      The speed of information dissemination is great for research and development, but that applies to both you, and people who want to learn about you.

      I've mentioned several times on IRC th
      • Isn't it more likely that Google are simply trying to index the general topic of each IRC channel, rather than the specific content? For example, a search for "linux" on "Google IRC" might return #opensource as well as #linux. (Disclaimer: No idea if #opensource actually exists.)
        • Isn't it more likely that Google are simply trying to index the general topic of each IRC channel, rather than the specific content?

          But IRC channels are pretty ephemeral - IRC channel topics, even more so. Google runs a serious risk of adding a bunch of dead or irrelevant links to their database.

          Additionally, what would such a link do? The vast majority of Google visitors do not have an IRC client loaded on their computers - especially not one that plugs into their browser. So most users who get a link

          • Perhaps they will have a system that ranks IRC channels according to their activity? Kind of like PageRank, but instead of links, it uses the number of users. And maybe they'll have a Java client on Google's site that lets you chat in the channel. On the other hand, Google also returns results in Adobe Acrobat format, which some computers don't have the software to open. Maybe they would team up with mIRC? Now that would be cool. Or they could be Mozilla-friendly and use Mozilla's own IRC chat client.
            • There are tons of Java IRC applets out there, ready to be customized to whatever specifications Google needs. I don't think they would need to (or should) rely on Chatzilla or mIRC plugins.
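The channel-ranking idea from this subthread (rank channels by activity, using the number of users instead of links) can be sketched in a few lines of Python. This is a minimal sketch assuming a hypothetical snapshot that maps (network, channel) pairs to user counts; the network names are made up, and nothing here reflects how Google actually ranks anything.

```python
from typing import Dict, List, Tuple

# A channel is identified by (network, channel name); the names used below are made up.
Channel = Tuple[str, str]


def rank_channels(user_counts: Dict[Channel, int]) -> List[Channel]:
    """Order channels by current user count, busiest first.

    Ties are broken alphabetically by (network, channel) so the output is stable.
    """
    return sorted(user_counts, key=lambda ch: (-user_counts[ch], ch))


if __name__ == "__main__":
    snapshot = {
        ("net1", "#opensource"): 120,
        ("net2", "#linux"): 480,
        ("net1", "#linux"): 480,
    }
    print(rank_channels(snapshot))
    # [('net1', '#linux'), ('net2', '#linux'), ('net1', '#opensource')]
```

A raw user count is easy to inflate with idle clients, so this only illustrates the "users instead of links" idea as stated.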
    • Yeah. It's called Packetnews [packetnews.com] or IRCSpy [ircspy.com].
    • Re:Terms. (Score:3, Interesting)

      by NightSpots ( 682462 )
      Wouldn't it also be nice for google to have an IRC interface to their search engine?

      Google bots in popular channels. It could work.
  • and believe it or not, it's called xgoogle.com
  • Concerned (Score:4, Insightful)

    by FreeLinux ( 555387 ) on Tuesday November 11, 2003 @09:57AM (#7443352)
    The "information" on IRC is 99% crap. I'm concerned that, by integrating IRC searching into Google, the signal-to-noise ratio of Google will go way down. If, however, Google keeps it as a separate service, like Usenet, I suspect that it will go away due to lack of interest.

    Who really wants to search IRC, except the Justice Department?
    • "The "information" on IRC is 99% crap"

      Come on now, don't you really need a search engine to find out about statements like " /mode -o ChanXBot" ?

      Googling minds want to know!
    • The "information" on IRC is 99% crap
      And the information on the WWW is 95% crap, so what's your point again?
    • Re:Concerned (Score:2, Insightful)

      by SEWilco ( 27983 )
      Who really wants to search IRC, except the Justice Department?

      The new guidelines, billed as a response to the September 11 terrorist attacks, permit the Bureau to engage in the "proactive collection of information on threats to the national security," displacing an older policy that obliged the FBI to have a specific investigative purpose before collecting information on individuals or groups. "FBI on look-out for foreign government hackers" [theregister.com]

      Government workers on IRC sounds like a good idea to me. Th

    • You would be right if IRC-originated info were searched along with everything else; I suspect IRC search will be available in a separate tab, like "Groups" or "Directory".

    • IRC is often used for community support for a product, much like Usenet. Debian used to have a really good IRC support channel, but it's since become hostile to new users.

      IRC is often a great place to ask obscure questions, where if you ask on Usenet, you're often a lone voice in the world.
    • Re:Concerned (Score:3, Interesting)

      by platypus ( 18156 )
      The advantage of IRC, though, compared to the Web, is that it is more reliable - in a very weak sense, but nonetheless.

      Think of Google's PageRank algorithm: it is in great danger of being made useless by link farms.

      That is because Google has problems separating link farms from "real" pages which link to each other and thereby give each other some trust (PageRank).

      With well-populated IRC channels, Google's bots can have more confidence that these channels are not artificial, the way link farms are.

      Altho
      • Good theory. One of Google's problems is finding the relevant link for search terms; this will help with that.

        I imagine they would have to extend the processing to a few lines around the line containing the url to pick up the question or statement that prompted the url.

        Perhaps it would cause an increase in bots spamming URLs to raise their Google rank.
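The suggestion above, looking at a few lines around each URL to find the question or statement that prompted it, might look like the following Python sketch. It assumes a plain-text log with one message per line; the function name and the two-line window are illustrative choices, not anything Google has published.

```python
import re
from typing import List, Tuple

# Rough URL matcher; trailing punctuation is stripped separately below.
URL_RE = re.compile(r"https?://\S+")


def urls_with_context(lines: List[str], before: int = 2) -> List[Tuple[str, List[str]]]:
    """Return each URL found in the log together with the lines leading up to it.

    The context is the line containing the URL plus up to `before` preceding lines,
    which is where the question that prompted the link usually sits.
    """
    results = []
    for i, line in enumerate(lines):
        for url in URL_RE.findall(line):
            url = url.rstrip(".,;:!?)")
            start = max(0, i - before)
            results.append((url, lines[start : i + 1]))
    return results


if __name__ == "__main__":
    log = [
        "alice: anyone know a decent IRC client for Linux?",
        "bob: try this one",
        "bob: http://example.org/client.",
        "carol: thanks",
    ]
    for url, context in urls_with_context(log):
        print(url)
        for ctx_line in context:
            print("   ", ctx_line)
```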
        • One of Google's problems is finding the relevant link for search terms; this will help with that.

          You can take this even further (or speculate even more wildly).
          Webpages are connected to each other via links, i.e. you get some kind of directed graph.
          Now you can partition this graph by trying to find whole subgraphs which don't get links from the outside (but may link to pages outside "their" subgraph). Such a subgraph would then probably be a link farm, because no "respected" page would point people into a link far
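The graph idea above can be made concrete: a set of pages that receives no links from anywhere outside itself is, in graph terms, a source component of the link graph. The Python sketch below finds strongly connected components with Kosaraju's algorithm and reports the source components above a minimum size. It illustrates the commenter's speculation under that interpretation only; it is not how Google actually detects link farms, it would also flag a legitimate cluster that nobody links to, and a farm that obtains a single inbound link escapes it. The page names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # page -> pages it links to


def strongly_connected_components(graph: Graph) -> List[Set[str]]:
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}

    # Pass 1: record nodes in order of DFS completion on the original graph.
    order: List[str] = []
    seen: Set[str] = set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph.get(root, [])))]
        while stack:
            node, children = stack[-1]
            nxt = next((v for v in children if v not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph.get(nxt, []))))

    # Pass 2: walk the reversed graph in reverse completion order.
    reverse: Dict[str, List[str]] = defaultdict(list)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)
    components: List[Set[str]] = []
    assigned: Set[str] = set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, todo = set(), [root]
        while todo:
            u = todo.pop()
            component.add(u)
            for v in reverse[u]:
                if v not in assigned:
                    assigned.add(v)
                    todo.append(v)
        components.append(component)
    return components


def suspected_link_farms(graph: Graph, min_size: int = 3) -> List[Set[str]]:
    """Components with no inbound links from outside (they may still link out)."""
    components = strongly_connected_components(graph)
    comp_of = {page: i for i, comp in enumerate(components) for page in comp}
    linked_from_outside = {
        comp_of[v] for u, targets in graph.items() for v in targets if comp_of[u] != comp_of[v]
    }
    return [
        comp for i, comp in enumerate(components)
        if i not in linked_from_outside and len(comp) >= min_size
    ]


if __name__ == "__main__":
    web = {
        "news": ["blog", "wiki"],
        "blog": ["news"],
        "wiki": ["news"],
        "farm1": ["farm2", "news"],  # the farm links out to a real page...
        "farm2": ["farm3"],
        "farm3": ["farm1"],          # ...but nothing outside links in
    }
    print(suspected_link_farms(web))
```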
    • A lot of people seem to be missing the point: it seems unlikely that Google would archive ALL channel logs. I mean, it would be technically infeasible. In order to join all 372628424682 or whatever channels on the largest networks, Google would need about 50,000 bots logged on at the same time. At that point, the nets could easily spot the Google bots and ban them for network abuse.

      More likely is that, as the summary suggests, Google may be trying to emulate the functionality of something like
    • They could improve their search engine by taking what they learn from IRC and removing anything similar from their web database.
  • by AtariAmarok ( 451306 ) on Tuesday November 11, 2003 @09:58AM (#7443357)
    2005 - Google indexes all the things ever said on soap operas and talk radio.

    2007 - Did you forget what you said in your high school cafeteria in 1998? Don't worry, Google now has it indexed.

    2010 - Lost your car keys? Don't worry, Google knows. Just do a search and you will find them.
  • searching the irc (Score:3, Insightful)

    by jlemmerer ( 242376 ) <xcom123 AT yahoo DOT com> on Tuesday November 11, 2003 @09:58AM (#7443361) Homepage
    Well, how do you build a reliable IRC database? I mean, there are many servers and bots and the like on IRC, and most of them deal with warez and are therefore only up temporarily. So if Google really wants to build an IRC search engine, they have to find a way to get rid of dead links, and also of links that point to illegal copies (you can be sued for pointing to warez, can't you? See the DeCSS case).
    I personally would be glad, since IRC is a little bit, well, unstructured, and a search engine would definitely do some good, but the problems of building a database and an interface based on it seem enormous to me.
    • Re:searching the irc (Score:4, Interesting)

      by That's Unpossible! ( 722232 ) * on Tuesday November 11, 2003 @10:09AM (#7443437)
      Well, how do you build a reliable IRC database?

      Have your bots sit in channels worth archiving. Break logs down into manageable chunks (hourly, by size, etc), and index them. Searches pull up these chunks of log with your search terms highlighted.

      I mean, there are many servers and bots and the like on IRC, and most of them deal with warez and are therefore only up temporarily. So if Google really wants to build an IRC search engine, they have to find a way to get rid of dead links, and also of links that point to illegal copies

      Ever try searching for warez on Google Groups? Good luck. They don't archive the binary newsgroups, and it is simple to weed out the posts that contain binaries in regular newsgroups.

      Google is pretty smart, let's wait and see what they come up with.
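That's Unpossible!'s outline (bots in worthwhile channels, logs broken into hourly chunks, an index over the chunks, results shown with the search terms highlighted) maps onto a small inverted index. The Python sketch below assumes a hypothetical log of (timestamp, nick, text) tuples and simple word tokenization; it illustrates the outline, not any real Google system.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Set, Tuple

Message = Tuple[datetime, str, str]  # (timestamp, nick, text): an assumed log format
WORD_RE = re.compile(r"[a-z0-9]+")


def chunk_hourly(messages: List[Message]) -> Dict[str, List[Message]]:
    """Group messages into hourly chunks keyed like '2003-11-11 09:00'."""
    chunks: Dict[str, List[Message]] = defaultdict(list)
    for msg in messages:
        chunks[msg[0].strftime("%Y-%m-%d %H:00")].append(msg)
    return dict(chunks)


def build_index(chunks: Dict[str, List[Message]]) -> Dict[str, Set[str]]:
    """Inverted index: lowercase word -> ids of the chunks containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for chunk_id, msgs in chunks.items():
        for _, _, text in msgs:
            for word in WORD_RE.findall(text.lower()):
                index[word].add(chunk_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Ids of chunks containing every query word, in chronological order."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


def highlight(text: str, query: str) -> str:
    """Wrap each query word found in `text` in [brackets]."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(r"[\1]", text)


if __name__ == "__main__":
    log = [
        (datetime(2003, 11, 11, 9, 52), "bryan", "Does anyone know how to recompile a kernel?"),
        (datetime(2003, 11, 11, 9, 58), "ray", "make menuconfig, then make"),
        (datetime(2003, 11, 11, 10, 5), "bryan", "kernel built fine, thanks"),
    ]
    index = build_index(chunk_hourly(log))
    print(search(index, "kernel"))            # ['2003-11-11 09:00', '2003-11-11 10:00']
    print(search(index, "recompile kernel"))  # ['2003-11-11 09:00']
    print(highlight(log[0][2], "recompile kernel"))
```

The chunk keys sort chronologically as plain strings, which is why `search` can simply sort them.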
    • by mrtroy ( 640746 )
      First off...I assume Google will only be using the major networks, which are permanent.

      Secondly, there are many servers, and bots...but how does this relate to an IRC database?

      Servers and bots don't talk much. And I would assume Google would be ignoring all mode changes.

      Next, IRC is not all about warez. It's the first GOOD chat system, and I still prefer it to any IM, hands down.

      And what the hell do you mean IRC is unstructured? There are networks, which have servers, which have channels and user
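mrtroy's assumption that Google would ignore mode changes is easy to act on at the protocol level: every IRC line is a command, and only PRIVMSG lines sent to a channel carry conversation. The Python sketch below parses raw lines in the RFC 1459 shape (optional ":prefix", a command, parameters, and an optional ":trailing" parameter) and keeps only channel messages. It is a simplified sketch: it does not handle CTCP, later protocol extensions, or malformed input, and the host names are made up.

```python
from typing import List, NamedTuple, Optional, Tuple


class IrcMessage(NamedTuple):
    prefix: Optional[str]  # e.g. "alice!a@host.example", or None
    command: str           # e.g. "PRIVMSG", "MODE", "JOIN"
    params: List[str]      # middle params, plus the trailing param if present


def parse_line(line: str) -> IrcMessage:
    """Split one raw IRC line into prefix, command and parameters."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, line = line[1:].split(" ", 1)
    if " :" in line:
        head, trailing = line.split(" :", 1)
        parts = head.split() + [trailing]
    else:
        parts = line.split()
    return IrcMessage(prefix, parts[0].upper(), parts[1:])


def channel_chat(raw_lines: List[str]) -> List[Tuple[str, str, str]]:
    """Keep (nick, channel, text) for channel PRIVMSGs; drop MODE, JOIN, private messages, etc."""
    chat = []
    for raw in raw_lines:
        msg = parse_line(raw)
        if msg.command != "PRIVMSG" or len(msg.params) < 2:
            continue
        target, text = msg.params[0], msg.params[1]
        if not target.startswith(("#", "&")):
            continue  # a private message to a user, not channel chatter
        nick = msg.prefix.split("!", 1)[0] if msg.prefix else ""
        chat.append((nick, target, text))
    return chat


if __name__ == "__main__":
    raw = [
        ":ChanXBot!bot@host.example MODE #linux -o ChanXBot",
        ":alice!a@host.example PRIVMSG #linux :how do I recompile a kernel?",
        ":bob!b@host.example JOIN #linux",
        ":bob!b@host.example PRIVMSG alice :sent privately",
    ]
    print(channel_chat(raw))
```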
  • by presroi ( 657709 ) <neubau@presroi.de> on Tuesday November 11, 2003 @10:00AM (#7443371) Homepage
    Well, judging from where I get my "news" (read: 90% useless but funny content via links), IRC (IRCnet, which is popular in Germany) is an incredibly fast way to distribute links.

    Assuming that Google is interested in finding new sites as soon as possible, they should crawl IRC channels.

    This does not mean that they are going to index it.
  • by Punchinello ( 303093 ) * on Tuesday November 11, 2003 @10:01AM (#7443374)

    It seems Tony Collen [manero.org] had the original scoop on this story. It is more informative than the Register link.

    If you scroll down his original web log [manero.org] on this topic you will see Google's first official acknowledgment of their IRC activity.

  • by tuffy ( 10202 ) on Tuesday November 11, 2003 @10:02AM (#7443380) Homepage Journal
    ...a/s/l?
  • does that mean... (Score:5, Interesting)

    by zr-rifle ( 677585 ) <zedr.zedr@com> on Tuesday November 11, 2003 @10:04AM (#7443397) Homepage
    that spam will spread to IRC?
    Thousands if not millions of bogus irc channels with specific keywords inserted in the topic only to attract hits on the main google search page?
    • That's exactly my thought. It's not hard to imagine some bots joining channels and repeating their URL in order to get a better rank on Google.
    • They already do this. Not just to get a rank - also to actually advertise sites.

      Every channel I've ever been on has an autokick set up to kick these bots every time they come on or as soon as they're identified.

      So Google wouldn't have a huge problem, because bots get kicked as soon as they say something the channel ops don't like.

      All Google would have to do is look for a kick after an ad to know whether or not it is spam.
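The kick-after-ad heuristic in the comment above is straightforward to sketch. The Python below assumes a hypothetical event log where each entry is ("msg", nick, text) or ("kick", kicked_nick, reason), and flags URLs whose poster is kicked within the next few events; the window of three events is an arbitrary illustrative choice.

```python
import re
from typing import List, Tuple

# ("msg", nick, text) or ("kick", kicked_nick, reason): an assumed log format
Event = Tuple[str, str, str]
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def spam_urls(events: List[Event], window: int = 3) -> List[str]:
    """Return URLs whose poster was kicked within `window` events of posting them."""
    flagged: List[str] = []
    for i, (kind, nick, text) in enumerate(events):
        if kind != "msg":
            continue
        urls = URL_RE.findall(text)
        if not urls:
            continue
        following = events[i + 1 : i + 1 + window]
        if any(k == "kick" and target == nick for k, target, _ in following):
            flagged.extend(urls)
    return flagged


if __name__ == "__main__":
    events = [
        ("msg", "CheapPills", "visit www.pills.example now!"),
        ("kick", "CheapPills", "no spam, thanks"),
        ("msg", "alice", "the fix is at http://example.org/patch"),
        ("msg", "bob", "thanks alice"),
    ]
    print(spam_urls(events))  # ['www.pills.example']
```

People get kicked for many reasons, and a spammer nobody kicks slips through, so this would be one signal among several rather than a verdict.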
  • Already? (Score:3, Funny)

    by chendo ( 678767 ) on Tuesday November 11, 2003 @10:04AM (#7443406)
    XGoogle.ORG -> Error: Cannot Connect to Data Base
    Too many connections


    Slashdotted already? We slashdotters are more dangerous than a beowulf cluster of... something.
  • by WegianWarrior ( 649800 ) on Tuesday November 11, 2003 @10:05AM (#7443412) Journal

    How IRC users would react to a bot from microsoft.com is an exercise left to the reader.

    If IRC is anything like it was when I last brushed through, not many will even notice, or they'll attempt to engage the 'bots in "virtual intimate acts".

    Of course, there would always be someone (likely a Mac or Linux user) who will notice and make a fuss about how MicroSoft is 'spying' on the IRC network, which in turn would lead to several more or less well-informed blogs writing about it, which in turn will lead to a /. headline close to "Micro$oft trying to take over IRC, will shut out 3rd party clients"...

    • More curious is the fact that there are users pretending to be from microsoft.com.

      I know of one who pretended to be pc5215.redmond.corp.microsoft.com... He couldn't make up his mind whether he was a Windows apologist, a Mac admirer, a BSD zealot or a Linux flamer...

      In a way, it [having a certain nickname, or DNS address] is flamebaiting without even saying anything.
  • Bot vs. Bot (Score:5, Insightful)

    by matchlight ( 609707 ) * on Tuesday November 11, 2003 @10:07AM (#7443420)
    The IRC admins, at least for most of the better channels, will simply set up a config to kick/ban the Google bot. Many channels don't allow non-human connections unless set up by the channel admins. Unlike the annoying spammers who use legit and stolen access points, Google will likely come from a single legit source, making the process of denying access easier.

    Google shouldn't be trying to find more content, they should be working on filtering out the mass of garbage sites that already exist.
    • What happens when they work out a deal with the IRCd and end up with ChanServ doing the watching?

      I guess you could move to EFnet, but with EFnet as bad as it has been, do you really want to do that?
    • You don't have to use a bot.

      Simply talk the IRC network into letting you run a node, and then all traffic on that IRC network is available to you, unrestricted and harvestable.

    • Maybe they use IRC to find out what garbage is, then de-PageRank any page that has the same garbage on it as the IRC channels.
    • Normally, unauthorized bots are identified and kicked specifically because they're spammers - that is, the bots are designed to advertise a message to the rest of the channel.

      The general goal of any channel is self-preservation (among other things), which is invariably hampered by spambots, who annoy the regulars.

      AI bots that talk can be just as annoying, which is why channel ops like to control them - to ensure that they follow the purpose of the channel.

      Where do listening bots fit into this?
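Kicking or banning the Google bot, as suggested at the top of this thread, comes down to matching the bot's nick!user@host against a ban mask with * and ? wildcards. The Python sketch below does that matching; the host names in the example are made up, not the bot's real address. It translates masks to regular expressions by hand rather than using fnmatch, because fnmatch treats [ and ] as character classes and those characters are legal in IRC nicks. It also ignores RFC 1459's rule that the characters {}| are the lowercase equivalents of []\.

```python
import re
from typing import Iterable


def mask_to_regex(mask: str) -> "re.Pattern[str]":
    """Translate an IRC ban mask (* = any run of chars, ? = one char) into a regex."""
    escaped = re.escape(mask)  # makes [, ], ., etc. literal
    return re.compile(escaped.replace(r"\*", ".*").replace(r"\?", "."), re.IGNORECASE)


def is_banned(hostmask: str, ban_masks: Iterable[str]) -> bool:
    """True if nick!user@host matches any ban mask (ASCII case-insensitive)."""
    return any(mask_to_regex(mask).fullmatch(hostmask) for mask in ban_masks)


if __name__ == "__main__":
    bans = ["*!*@*.crawler.example", "[Spam]*!*@*"]
    print(is_banned("SomeBot!bot@host-1.crawler.example", bans))  # True
    print(is_banned("[spam]Bot!x@dsl.isp.example", bans))         # True
    print(is_banned("alice!a@dsl.isp.example", bans))             # False
```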
  • Google Labs does have to keep busy. I wonder what they're up to.
    • Identify authoritative IRC participants, their information and related web sites?
    • Identify stupid IRC participants, and reduce the importance of their information and related web sites?
    • IRC Rent-An-Expert Service?
    • GoogleNatter: The Bot that makes you sound authoritative.
    • GoogletyMooglety: The IRC filter that lets you hear only the good stuff.
  • Before Google, before WWW, there was...
    Archie - the first search engine [uiuc.edu]

  • ...are they INSANE?

    Oh, great - now everyone gets to see how many times I've k-lined stupid *.MY "h@x0r" wannabes for flooding my IRC Network's Admin channel with "N3TF0RC3 0WNZ J00", or removed their "Undetected" clone technology that actually says "Netforce Undetected Clone Technology" in the userinfo.

    Wait - that might be useful to show the other *.MY users that we didn't k-line their Class-C address space because we don't like them, just the abusers.

    What is this world coming to?

    ScottKin
  • p2p search (Score:2, Interesting)

    With the importance of Google in our everyday lives steadily increasing, I don't dare to think of what might happen if Google et al. stop being our good friends at some distant point. Centralized repositories are just not the way to go; we need a distributed search engine owned by its user base. Maybe in the next Matrix movie...

    • One of the most powerful weapons that the Eastern European regimes had against their citizens was the illusion that the state might know what you're saying. Everything you said _might_ have ended up in a nice little dossier somewhere, and come up to bite you in the ass later. Like at some point you _might_ have screwed up your promotion chances because you once said that the Party shouldn't control the industry. So most people preferred to just avoid politically sensitive issues, than have that stuff show u
      • by mregit ( 723281 )
        ChatScan. Feb, 2001.

        ChatScan was an Israeli enterprise that claimed 10 million in funding. They joined a bot to IRC channels. The bot broadcast live channel text to their website. The idea was, people could scan down a list of pre-selected channels, see which had interesting conversations, then go and join them - or just watch from the website.

        Users who found what they thought was private conversation up on the web were outraged. IRC channel owners and admins agreed with you 100% - they considered this u

  • by *weasel ( 174362 ) on Tuesday November 11, 2003 @10:13AM (#7443458)
    like archiving email, usenet, and web traffic before it - this is simply a reminder that nothing you type through an open network is -private-. this is a lesson most of us should have learned a long time ago.

    but this isn't an invasion of privacy. there's no expectation of privacy when you log onto a public chat board. just as there's no expectation of privacy should you decide to walk naked through a park.

    the best you can hope for online is pseudonymity.
    but that's out the window with the combined power of google, which is quickly becoming the internet's inadvertent Big Brother.

    the primary difference being, google works -for- the people just as much as it works -against- the people.
    • You don't IRC, do you?

      Your conversation is pretty private if you DCC.

      'Course Carnivore can still get it, but I don't see a joint venture between Google and the FBI on the horizon.

      -Peter
    • Actually, while IRC rooms might be borderline, what they say in the article about making AIM conversations searchable is not. The AIM conversations were one-to-one talks, and letting everyone look through them is extremely bad taste.

      Even if it wasn't any "cybersex" or anything illegal involved, there might be little secrets that those people never wanted made public. E.g., even something as benign as that you once called in sick to stay home and play the newly released Diablo 2, you probably don't want pla
  • by wo1verin3 ( 473094 ) on Tuesday November 11, 2003 @10:14AM (#7443461) Homepage
    Bill Gates: Speak.

    Neo: The search engine Google has grown beyond your control. You cannot stop him -- but I can.

    Bill Gates: And if you fail?

    Neo: I won't.

    --- several scenes later ---

    Google: Mr. Anderson! Welcome back, we missed you.

    * Google pauses and looks around at the multitude of web sites and irc channels he has cached

    Google: Like what I've done with the place?

    Neo: It ends tonight.

    Google: I know it does, I've had some researchers figure out the answer for me [google.com]. That's why the rest of me is just going to enjoy chatting on IRC while we fight. I've seen the logs, and IRC'ers already know that I'm the one that beats you, so they're just gonna download from some leet XDCC bots.

  • GoogleBot69: guess what, i have one dick and 100 balls [google.com].

    GoogleBot70: me too!!
  • by jsse ( 254124 ) on Tuesday November 11, 2003 @10:15AM (#7443470) Homepage Journal
    Now we've got a new category of stuff to search for other than p0rn. :)

    bloodninja: Ok baby, we got to hurry, I don't know how long I can keep it ready for you.
    j_gurli3: thats ok. ok i'm a japanese schoolgirl, what r u.
    bloodninja: A Rhinocerus. Well, hung like one, thats for sure.
    j_gurli3: haha, ok lets go.
    j_gurli3: i put my hand through ur hair, and kiss u on the neck.
    bloodninja: I stomp the ground, and snort, to alert you that you are in my breeding territory.
    j_gurli3: haha, ok, u know that turns me on.
    j_gurli3: i start unbuttoning ur shirt.
    bloodninja: Rhinoceruses don't wear shirts.
    j_gurli3: No, ur not really a Rhinocerus silly, it's just part of the game.
    bloodninja: Rhinoceruses don't play games. They f*cking charge your ass.
    j_gurli3: stop, cmon be serious.
    bloodninja: It doesn't get any more serious than a Rhinocerus about to charge your ass.
    bloodninja: I stomp my feet, the dust stirs around my tough skinned feet.
    j_gurli3: thats it.
    bloodninja: Nostrils flaring, I lower my head. My horn, like some phallic symbol of my potent virility, is the last thing you see as skulls collide and mine remains the victor. You are now a bloody red ragdoll suspended in the air on my mighty horn.
    bloodninja: Goddam am I hard now.


    (Original post from bash.org [bash.org])
  • The news implies that Google is going to start indexing IRC logs from channels everywhere. But I don't think this is what they're going for. I think they'll include something that allows you to search for IRC channels. So if I am looking for a channel where I can ask a question about something, Google will point me toward the right server and channel to join.

    Also, they could be using IRC to facilitate Google Answers. Heck, if I were one of the Google Answers people, you can bet I would u
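The channel-finder idea above (search for a channel and get back the server and channel to join, rather than searching logs) could be as simple as matching query words against channel names and topics. A minimal Python sketch, assuming a hypothetical list of (network, channel, topic, users) records gathered from channel listings; ordering matches by user count is an added assumption, and the listings are made up.

```python
import re
from typing import List, NamedTuple

WORD_RE = re.compile(r"[a-z0-9]+")


class ChannelInfo(NamedTuple):
    network: str
    channel: str
    topic: str
    users: int


def find_channels(channels: List[ChannelInfo], query: str) -> List[ChannelInfo]:
    """Channels whose name or topic contains every query word, busiest first."""
    terms = set(WORD_RE.findall(query.lower()))
    matches = []
    for ch in channels:
        words = set(WORD_RE.findall(f"{ch.channel} {ch.topic}".lower()))
        if terms and terms <= words:
            matches.append(ch)
    return sorted(matches, key=lambda ch: -ch.users)


if __name__ == "__main__":
    listing = [
        ChannelInfo("net1", "#linux", "Linux help | ask, don't ask to ask", 300),
        ChannelInfo("net2", "#debian", "Debian support | read the FAQ first", 500),
        ChannelInfo("net2", "#linux-help", "Linux support channel", 80),
    ]
    for ch in find_channels(listing, "linux help"):
        print(ch.network, ch.channel)
```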
  • I attended a talk by a couple of Google guys at my school [wikipedia.org] (one of the speakers, Krishna Bharat, creator of Google News, is an alumnus). Apparently they have a lot of expansion plans. They're planning to set up a new research center in Bangalore with around 300 to 500 people. So I'd say this isn't surprising in light of their long-term plans.
  • by SageMadHatter ( 546701 ) on Tuesday November 11, 2003 @10:30AM (#7443567)
    *Goes into new google IRC search mechanism and searches for term "Warez"*

    Result: "Warez" is a very common word and was not included in your search

    Mad Hatter
  • On IRC, how do you set the equivalent of X-NoArchive, which Google does respect?

    Anyone who expects that someone won't collect and archive anything they do in a public forum is dreaming, but IRC log publishers usually get accused of breaking netiquette. Should we all add Lamie copyright notices to everything we do on the Internet? (Yes, yes, copyright is inherent, stupid, I know. Tell Lamie that.)
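IRC has no header like Usenet's X-No-Archive, so any equivalent would have to be a convention that an archiving bot agrees to honor. The Python sketch below assumes a hypothetical convention, not an existing standard: a channel opts out entirely by putting "X-No-Archive: yes" in its topic, and a single message opts out by containing the same marker.

```python
import re
from typing import List, Tuple

# Hypothetical opt-out marker, borrowed from Usenet's X-No-Archive header.
NO_ARCHIVE_RE = re.compile(r"x-?no-?archive\s*:\s*yes", re.IGNORECASE)


def archivable(topic: str, messages: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (nick, text) messages an archiver may keep under the assumed convention."""
    if NO_ARCHIVE_RE.search(topic):
        return []  # the whole channel has opted out
    return [(nick, text) for nick, text in messages if not NO_ARCHIVE_RE.search(text)]


if __name__ == "__main__":
    msgs = [("alice", "hi all"), ("bob", "X-NoArchive: yes this one is off the record")]
    print(archivable("Linux help", msgs))                    # [('alice', 'hi all')]
    print(archivable("Linux help | X-No-Archive: yes", msgs))  # []
```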

  • by Karma Sucks ( 127136 ) on Tuesday November 11, 2003 @10:53AM (#7443785)
    For example, I would like to search and browse the chatter on the SUSE acquisition and KDE vs Ximian situation on #gnome @ irc.gimp.org.

    If Google could allow me to do that, that would be fantastic.

    As an aside, does anyone know of IRC logs for #gnome?
      • Knowing IRC on the whole, any broad search across all channels would only return noise anyway. I can just see it:

        Search: "+Ximian +gnome"
        Top result: "Ximian: ne1 no how 2 do a garden gnome in The Sims?" "*** HornyGurl is now known as Gnome" "Gnome: yeah, do me." "Ximian: asl?"

        Search: "Linux shell back door"
        Top result: "LaraCroft shoves her big strap-on up Tux the Linux penguin's back door." "Shell: ROFLMAO!!!" (Continues in descriptive detail over 3 pages.)

  • This [bash.cx] is the only IRC index that matters.
  • I wonder what that's about... If they wanted to start logging and indexing IRC channel discussions, they'd either need some kind of deal with IRC server operators to get traffic from them, or have their own Googlebot in every IRC channel. The second option is quite hard: most servers limit how many channels you can be on; on most of IRCnet it's 11 channels, I believe. And there are 46600 channels currently on IRCnet. They'd need 4237 connections open to get into all of those, the I
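The connection estimate above is a ceiling division: 46600 channels at 11 channels per connection. A quick Python check, using the commenter's figures as given:

```python
def connections_needed(channels: int, per_connection: int) -> int:
    """Smallest number of connections that can hold `channels` joins (ceiling division)."""
    return -(-channels // per_connection)


if __name__ == "__main__":
    print(connections_needed(46600, 11))  # 4237, since 11 * 4236 = 46596 falls 4 short
```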
  • by hatless ( 8275 )
    I find it hard to believe Google really wants to index IRC. The occasional open-source developer discussion aside, it's a wasteland. My guess is that they're experimenting with indexing and archiving text chat in general, with an eye toward indexing things like internal corporate chat for their intranet appliances and things like "celebrity" Q+A sessions for the public.

    IRC gets them a good data feed for experimenting since it's not burdened by corporate Terms of Service, has an open protocol, and has a good
  • The odd thing is that people are reporting the robot joining channels, doing /whois on users and more. What value could the /whois info from random users have? The only thing one can safely say about this whole situation is: Google is doing some testing on IRC. Personally, this is how I look at it: Google ranks websites according to many criteria, ranging from keyword density, keywords and text placement on the page to incoming links and what the link text says. What use could IRC have? It is p
  • This can prove to be very useful. If I have problems doing something, or get an error in Linux, I can search the Linux channels for a quick answer.

    A lot of people get support from IRC, and now it's possible to "do a google" on a channel before asking. People use IRC because of the instant feedback and the ability to do real-time troubleshooting. Because of this, a lot of questions that get answered on IRC never get published on the WWW or forums, so different people ask the same question over and over
  • From the information I've seen [manero.org], Google is capturing URLs in channels, not the actual conversations.
  • to add chat rooms. It makes sense. AOL, MSN, Yahoo all have them.

    Reading through most of the comments, everyone thinks that Google is indexing IRC. It doesn't make sense: the amount of useful info is so small and short-lived. Bots, lurkers, filetraders. I admit that there are links to harvest. I think that if they can parse free text, they could start indexing topics, but then they run the same risk as indexed blogs. An impassioned minority (or majority) could sway an
  • we need a government-funded, not government-run, media outlet.

    Or

    we need the FCC to start giving money to the media outlets to run the News.

    It used to be that the FCC gave buckets of money to stations so they would have a news agency, traditionally a money-losing situation for the stations. However, once that money got pulled, stations needed to make money, so now we see all the fluff and meaningless crap. Meanwhile, stories about the political situation and Iraq get buried because they don't make money.

    Before people start ran
  • No proper Format (Score:2, Interesting)

    by kyndig ( 579355 )
    I have reviewed several logs of IRC chat rooms, and have not yet seen a good log format. Reading something like:

    klax: So what'd you eat for dinner
    bryan: Does anyone know how to recompile a kernel?
    ray: I had french fries and a beer

    That provides little to no structure. Google currently caches PDF files, and should your search term return a PDF file, all your keywords are highlighted. I would imagine that Google would use this same approach for their log format system, yet even this does not provide
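kyndig's example shows the problem: a plain "nick: message" transcript interleaves unrelated conversations and carries no metadata. One possible structured format is one JSON record per message. The Python sketch below converts lines of that form into JSON Lines; the field names (channel, seq, nick, text) and the channel name are assumptions for illustration, and real logs would also need timestamps.

```python
import json
from typing import Dict, List, Optional


def parse_chat_line(line: str) -> Optional[Dict[str, str]]:
    """Split 'nick: message' into fields; return None for anything else (e.g. '*** ...' notices)."""
    nick, sep, text = line.partition(": ")
    if not sep or not nick or " " in nick:
        return None
    return {"nick": nick, "text": text}


def to_json_lines(channel: str, lines: List[str]) -> str:
    """One JSON object per chat line, numbered by position in the original log."""
    records = []
    for seq, line in enumerate(lines):
        fields = parse_chat_line(line)
        if fields is not None:
            records.append(json.dumps({"channel": channel, "seq": seq, **fields}))
    return "\n".join(records)


if __name__ == "__main__":
    log = [
        "klax: So what'd you eat for dinner",
        "bryan: Does anyone know how to recompile a kernel?",
        "ray: I had french fries and a beer",
    ]
    print(to_json_lines("#example", log))
```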
