
News for nerds, stuff that matters


Google To Gain a Rival?

Posted by Hemos on Mon Jul 23, 2001 08:00 AM
from the this-is-the-house-that-yahoo-built dept.
markpapadakis writes "Seems like Google has got itself a new rival, which seems to have the potential to successfully challenge our beloved 'G'. Teoma Technologies launched a beta version of its search engine, which enhances the link analysis idea, borrowing some ideas from Google and extending them to recognise 'communities' of subjects."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by Anonymous Coward
    Thank God for that post-posting editing function, eh Hemos? ;) Within 10 minutes, the link to searchenginewatch.com [searchenginewatch.com] changed 4 times: teoma%20.html, toema.html, toema. html.. Oh well.
  • by Anonymous Coward
    I searched for my name and got a whole lot of pages in French (naturally), but this search engine doesn't seem to know French.

    So it groups results under silly "Topics" like "Le" or "De" ("the" or "of").

    Too bad...
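The "Le"/"De" topics are what you would expect if result groups were labelled by their most frequent word with only English stopwords filtered out. A minimal Python sketch of that failure and the obvious fix, assuming a most-frequent-word labelling scheme (how Teoma actually picks its topic names isn't known); `STOPWORDS`, `topic_label` and the tiny word lists are hypothetical, for illustration only:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists; a real engine would need
# much larger lists for every language it indexes.
STOPWORDS = {
    "en": {"the", "of", "and", "a", "to", "in"},
    "fr": {"le", "la", "les", "de", "des", "du", "et", "un", "une", "en"},
}

def topic_label(snippets, languages=("en", "fr")):
    """Label a group of results with its most frequent word, skipping the
    stopwords of every language listed in `languages`."""
    stop = set().union(*(STOPWORDS[lang] for lang in languages))
    words = (w for s in snippets for w in re.findall(r"\w+", s.lower()))
    counts = Counter(w for w in words if w not in stop)
    return counts.most_common(1)[0][0] if counts else None

snippets = ["Le site de la musique de Paris", "La musique de Paris", "Festival de musique"]
print(topic_label(snippets, languages=("en",)))  # English list only
print(topic_label(snippets))                     # English and French lists
```

With only the English list the label comes out as "de"; adding the French list makes it "musique".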
  • The index for the site appears to be based on a very old scan of the web. It finds the old web page for an organization for which I'm the web master, but does not find the new page at all in spite of the fact that the new page has been up since last October and that the old page has a direct link from there to the new page. In addition the new page is referenced by lots of "meta-links" pages that list similar organizations, the old page hardly at all (it never did have very many listings on such "meta-links" pages).

    If they want to have this be actually useful they will need more up-to-date scans of the Web than once every 8 or 10 months (!).

  • > "Canada insured 100 percent of its citizens for $2,250
    > per person in 1998 while the United States expended $4,270 per person
    > insuring only 84 percent of our citizens."


    Which is why we see large numbers of Americans going across the border for Canadian health care.


    oh, wait a minute . . .


    :)
    >not only that, its cruel
    > and disgusting to hold people's health ransom for money...


    Far better to make it illegal to own, say, a private CAT scan machine, and hold health ransom to time, while allowing veterinarians to have the same machines to use on pets (which stand idle while people die waiting their turn for the human ones).


    hawk

  • by mattdm (1931) on Monday July 23 2001, @11:58AM (#67765) Homepage
    It's not a matter of security. A lot of stuff you don't want indexed because it's temporary or maybe just too variable (dynamically generated). robots.txt is mostly for the search engines' own good, and if they're ignoring that, they'll eventually be full of junk.
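A well-behaved spider checks robots.txt before fetching, which is what keeps those temporary and dynamically generated URLs out of the index. A minimal sketch using Python's standard-library `urllib.robotparser`; the robots.txt contents, the URLs and the `ExampleBot` user-agent are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping spiders out of scripts and temporary pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

def allowed_urls(robots_txt, urls, agent="ExampleBot"):
    """Return only the URLs that robots_txt permits `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text instead of fetching
    return [url for url in urls if parser.can_fetch(agent, url)]

candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/cgi-bin/search?q=linux",
    "http://www.example.com/tmp/session1234.html",
]
print(allowed_urls(ROBOTS_TXT, candidates))  # prints ['http://www.example.com/index.html']
```

A real crawler would fetch each site's robots.txt once (e.g. with `set_url()` and `read()`) and cache the parser per host rather than re-parsing for every batch.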

  • This [perens.com] page sums up pretty much what I feel about software patents.
  • If you spent $300 million researching that new way to build a house (granted, I don't know how much Google spent researching indexing topics), and then you started charging people to build houses, let's say, 300% faster and with 200% more strength at the end, would you want some other company coming in and stealing the process you spent $300 million developing? How would you plan to recoup the research costs? Would you even spend the money researching in the first place if you KNEW that it would just get ripped off and you wouldn't be able to get any of that research money back because of competition? Funny, patents actually are good for innovation.

  • And in other news, Netiquette has passed away after a prolonged fight against idiocy on the net. GIF at 11.
    --
  • Imagine someone who created a cure for AIDS and patented it: that person could charge ridiculous amounts of money for this cure even if it was something simple and cheap to make. People who couldn't afford this cure would die simply because John Doe patented his cure.

    Interesting you should use this analogy. I've just recently been asked to participate in a genetic study of diabetes (I've been a diabetic since I was 4). All I have to do for now is provide a blood sample. In the information I was given, they specifically say that you have to give up any rights to products that may be derived from research. Specifically, they want to be able to patent the products that they produce from this research so that pharmaceutical companies can license the rights to manufacture them.

    So, do I have a problem with this? Nope, none whatsoever. Because if the big pharmaceutical companies can't protect their product, then they won't manufacture it. And if they don't, who will? Who else can afford the R&D? It may be that by giving up my rights to this research I will help to provide a cure or prevention for diabetes. I'm happy with that.

  • And when is this likely to happen? Never. So we're stuck with having patents and big companies making and selling the products that save people's lives, or no products and a lot of dead people. Make your choice.
  • Their database is old and doesn't seem to be all that big. They don't seem to honor boolean terms. They'll throw back dozens or hundreds of related pages from the same part of the same site without grouping them or squelching them. No apparent support for fuzzy spelling variations.

    And when Apache and Debian show up at the top of a query on "Linux", it throws the sophistication of Google's relevance calculations into relief. Apache and Debian are linked from a ton of web pages, but the overwhelming majority of those pages are message board postings and message board TOCs or things like "This site runs on..." page footers. What this says to me is that Teoma isn't doing a good job of weighting the relevance and prominence of inbound links. It's as though it's going purely on the raw number of times the search term appears in a page linking to Site x regardless of how many are clearly identical and thus probably links from menus and TOCs, and not from the pages' unique content, where a link should count far more.
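One crude way to act on the point about menu and footer links is to count how many distinct sites link to a page rather than how many pages do, so a link repeated on every page of a message board counts once. A Python sketch of that heuristic; it is an illustrative assumption, not what Google or Teoma is known to do, and all URLs are hypothetical:

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def raw_inlink_counts(links):
    """Count every linking page: one vote per (source page, target) link."""
    return Counter(target for _source, target in links)

def site_inlink_counts(links):
    """Count linking sites: all pages on one host that link to a target
    (menus, TOCs, "This site runs on..." footers) share a single vote."""
    hosts = defaultdict(set)
    for source, target in links:
        hosts[target].add(urlparse(source).netloc)
    return {target: len(h) for target, h in hosts.items()}

# Hypothetical graph: one message board with the same footer link on 500
# pages, versus three separate sites each linking once from their own content.
links = [(f"http://forum.example.org/thread{i}", "http://popular.example/") for i in range(500)]
links += [
    ("http://a.example.com/mips.html", "http://niche.example/howto"),
    ("http://b.example.net/notes.html", "http://niche.example/howto"),
    ("http://c.example.edu/links.html", "http://niche.example/howto"),
]
print(raw_inlink_counts(links))
print(site_inlink_counts(links))
```

By raw page count the footer target wins 500 to 3; counted per site, the page linked from three independent sites comes out ahead, 3 to 1.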
  • by Cederic (9623) on Monday July 23 2001, @05:49AM (#67776) Journal

    Hmm. I use Google because it finds what I want faster, more efficiently and more accurately than any other search engine I've used.

    I love the clean simple fast interface. I love the lack of flashing banner ads. I love the relevance of the text based ads, and the differentiation between those and my search results. I love the categories, and that half the time it'll show me a category listing exactly what I'm after, as well as the normal list of sites. I love the fact that I can have Google in Dutch, despite not speaking that language. I love the site: tag and the difference it makes when looking for UK sites or for something on a specific website. I love the cache and how it insures me against the aging web. I love the sheer breadth of material available. I love the approach and insight of the company, how it focusses on searching, making searching easier, and on being good at searching, and doesn't get distracted by obscure business models. I love the way that occasionally they switch out the normal logo for one that celebrates a given day, and then links that logo to a search result that is relevant.

    Oddly enough, the fact that they're running on x thousand PCs running a free operating system doesn't really impact on me at all. I have immense respect for the engineering involved, and for the responsiveness of the site, but I also wonder if a hardcore IBM mainframe might have been cheaper overall.

    If MS bought Google, I would still use it. If they started showing banner ads or popups, forced you to hold a Passport account, or prevented non-IE browsers from viewing the site, then no, I wouldn't use it.

    Right now there is no search engine that comes close to the beauty of Google. I recognise that beauty from a technological perspective, regardless of the back-end OS being used.

    ~Cederic
  • Some links are scripts. The whole idea behind robots.txt is that some links may never end or won't give fixed results. It's good advice for you to follow as it will keep your spider from spending all day following links and it will keep your search engine from indexing content that will be different the next time it is visited.

    > ANYONE CAN READ IT

    Actually, if I found your spider ignoring my robots.txt, I'd block you and you'd never see my site again.
  • most of us who use Google were fans waay back when their database was a fraction of the size..

    During Google's Beta period, they focused on indexing tech-related sites, specifically Unix/Linux/Perl/etc related stuff.

    I think that's why the fanbase on Slashdot grew so quickly - they were exactly the target market. And the expectation that they would spread the word to their technical and non-technical friends has been successful. I know several non-tech users of Google who must have found them by word-of-mouth (seeing how they don't advertise).
    --
  • I just decided to go take a poke around, and as a test, I decided to perform a search on linux mips. (I've been browsing around recently and doing a bit of hacking on it lately, and I know which sites I found the most relevant for it.)

    The results, currently, are pretty similar. The first link on the page pointed directly to the Linux/MIPS HOWTO, which I've been referring to quite often recently. Everything else is quite similar down the rest of the first 10 results as well.

    Google still has its advantages over Teoma at the moment, though:

    • The nested links for pages on the same site.
      It's one of those things that quite frequently are useful when you're searching for something: instead of landing on the main page of the site (if that contains your search terms, and is of course linked more often), you can go directly to the part of the site that addresses exactly what you're looking for.
    • The Google cache.
      I really hate it when a site that I want to visit has pulled its content or moved it around. But if I'm doing a search on Google, or I even know the last known address of a page, I can just head over to the Google cache and often pull up exactly what I'm looking for, even if the content has been moved or deleted on its original server. Sites, unfortunately, do vanish from time to time. It's always nice to be able to access that content when you need it most.
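The nested-links behaviour can be approximated by grouping an already-ranked result list by host, so each site appears once at the rank of its best hit with a few of its other pages indented underneath. A Python sketch, assuming the ranking is already done; `group_by_site` and `max_per_site` are hypothetical names, and this is not how Google actually implements it:

```python
from urllib.parse import urlparse

def group_by_site(ranked_urls, max_per_site=2):
    """Group an already-ranked URL list by host.

    Each host appears once, at the position of its best-ranked URL, with up to
    max_per_site - 1 of its other URLs nested under it; the rest are dropped.
    """
    groups = {}  # dicts keep insertion order, so hosts stay in rank order
    for url in ranked_urls:
        groups.setdefault(urlparse(url).netloc, []).append(url)
    return [(urls[0], urls[1:max_per_site]) for urls in groups.values()]

ranked = [
    "http://a.example/howto",
    "http://a.example/faq",
    "http://b.example/",
    "http://a.example/misc",
    "http://c.example/x",
]
for main, nested in group_by_site(ranked):
    print(main)
    for url in nested:
        print("    " + url)
```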

    Anyway, that would just be my whole 2c on it.

  • Huh. No Google cache. No Google Groups

    You forgot the most important of all...
    No "I'm feeling Lucky" button! :-)

  • by KuRL (13889) on Monday July 23 2001, @04:04AM (#67784) Homepage
    The poster accidentally put a space in his http tag.

    Here's the real link. [searchenginewatch.com]

  • by augustz (18082) on Monday July 23 2001, @04:55AM (#67787) Homepage
    Google has really done right by its users and advertisers.

    I spent a good bit on an AdWords campaign that picked theKompany and other KDE keywords, following theKompany's claim that such competitive advertising was illegal. Needless to say, the KDE camp went all out hit-spamming my ads; I went through around 10,000x the number of impressions/hour I was supposed to. Google staff were prompt and courteous, fixed the problem, tracked the spammers back to Germany (?) and refunded my money.

    As for credibility, they'd be one company that I'd be willing to give my email address to, knowing that they get it and won't be sending me "Important Updates" every month.

    Competition is great, but let's not forget the good that Google has done. We need a well funded company to fight off things like the Altavista patent lawsuits on searching.

    I don't understand why some folks are so virulently anti-Google. The flak they took for putting up the Deja archives was totally unreasonable, seeing as they had barely got the archives out from under deja.com's decaying body. And their new image search is damn cool.

  • by MS (18681) on Monday July 23 2001, @05:48AM (#67789)
    I searched for different words, here's what I got:
    • some results are totally unrelated to the word(s) I inserted
    • results with Umlauts are shown in a wrong character-set, resulting in garbage
    • the number of results is only 1/5 to 1/10 of what Google or AltaVista return for the same search term, so I suppose Teoma has indexed only a tenth of the pages other search engines have
    • Oh, they use Helvetica... it looks really ugly on my Win89Box, with some adjacent characters overlapping
    • and well, I love Google Groups, the Google Cache, the changing Google Logo, the ability to try the search on other engines...
    Teoma has a loooong way to go, but then, Google also took 2 years to beat AltaVista, so Teoma may have another 2 years ahead of it... Just as AltaVista revamped its search algorithm and sped up its interface when Google arrived, the same will happen again: Google AND AltaVista will make their searches better again.

    just my 2 c
    ms

  • maybe they will italicize everyone to death.
  • by harmonica (29841) on Monday July 23 2001, @05:03AM (#67794)
    It's nice, but the problem is that search engines with bought rankings also "poison" meta search engines. For one request, I got download.cnet.com as the #1 site because it was ranked very high on various sites used by Vivisimo. It had *nothing* to do with the request :-(

    I would also appreciate it if all high rankings of a site are displayed. It helps you to find out where you must still submit your own site.
  • by harmonica (29841) on Monday July 23 2001, @05:48AM (#67795)
  • by generic-man (33649) on Monday July 23 2001, @04:14AM (#67796) Homepage Journal
    I've already discovered Vivisimo [vivisimo.com], which is a nice step up from the meta-search-engine garbage of yesteryear. (Disclaimer: I go to CMU, which developed much of the technology behind Vivisimo, but I personally didn't work with it.) Not only does it sort links by relevance, it also categorizes results. I found it very useful when doing a research project last year -- searching for "Japanese Women" on even the most finely tuned search engine turns up pages of results that can be diplomatically called "non-academic."

    I doubt it's a replacement for Google, but I recommend it the next time you're searching for a topic that might have several different meanings.
  • by Hard_Code (49548) on Monday July 23 2001, @04:09AM (#67802)
    I think the space is courtesy of Slashdot. It always munges links.
  • Try:
    • linux kernel USB scanner -homepage -jumppage -links -nude -sex -"my home page"

    The NEAR operator is implicit in every search: pages rank higher when the search terms are found near each other.
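One simple way to make "pages rank higher when the terms are near each other" concrete is to score each page by the shortest run of words that contains every query term. A Python sketch of that idea; the scoring rule is an assumption (Google's actual proximity weighting isn't public), and the documents are made up:

```python
def shortest_span(words, terms):
    """Length in words of the shortest stretch of `words` containing every
    term in `terms`, or None if some term never appears."""
    need = set(terms)
    if not need:
        return 0
    seen = {}  # term -> occurrences inside the current window
    best = None
    left = 0
    for right, word in enumerate(words):
        if word in need:
            seen[word] = seen.get(word, 0) + 1
        # Shrink from the left while the window still covers every term.
        while len(seen) == len(need):
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = words[left]
            if dropped in seen:
                seen[dropped] -= 1
                if seen[dropped] == 0:
                    del seen[dropped]
            left += 1
    return best

def rank_by_proximity(docs, query):
    """Order document ids by shortest span (closest terms first); documents
    missing any term are left out."""
    terms = query.lower().split()
    spans = {}
    for doc_id, text in docs.items():
        span = shortest_span(text.lower().split(), terms)
        if span is not None:
            spans[doc_id] = span
    return sorted(spans, key=spans.get)

docs = {
    "d1": "the linux kernel usb scanner driver",
    "d2": "linux is great and my usb scanner works with any kernel",
    "d3": "usb scanner reviews",
}
print(rank_by_proximity(docs, "linux kernel USB scanner"))
```

Here "d1" (all four terms within 4 words) ranks above "d2" (spread over 11 words), and "d3" drops out because it never mentions linux or kernel.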
    --

  • Slashdot's robots.txt doesn't include /articles where discussions [slashdot.org] can be found.
  • by leucadiadude (68989) on Monday July 23 2001, @04:09AM (#67809) Homepage
    Well I did read this (yes, I actually READ the referenced article before posting):
    "Currently in beta, the site is primarily intended to demonstrate Teoma's technology to potential partners or buyers."

  • ... the boolean search options that AltaVista has. I find it quite helpful to be able to say:

    linux near kernel near USB near scanner and not (homepage or jumppage or links or nude or sex or "my home page")

    and be able to filter out the crap.

    If Google allowed a post-processing phase to apply this sort of logic, AV would disappear from my list of search engines.
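The post-processing step asked for here could look like the following Python sketch: keep only results where each adjacent pair of query terms occurs within a few words of each other, and drop results containing any excluded word or phrase. The 10-word window and the simplified pairwise reading of `a near b near c` are assumptions, not AltaVista's documented semantics; the result texts are made up:

```python
import re

def tokens(text):
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())

def near_chain(words, terms, window=10):
    """True if every term occurs and each adjacent pair in `terms` has some
    occurrences at most `window` words apart (a simplified 'a near b near c')."""
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    if any(not positions[t] for t in terms):
        return False
    return all(
        any(abs(i - j) <= window for i in positions[a] for j in positions[b])
        for a, b in zip(terms, terms[1:])
    )

def post_filter(results, near_terms, excluded_words=(), excluded_phrases=(), window=10):
    """Keep the URLs of (url, text) results that satisfy the NEAR chain and
    contain none of the excluded words or phrases."""
    near_terms = [t.lower() for t in near_terms]
    excluded_words = {w.lower() for w in excluded_words}
    kept = []
    for url, text in results:
        words = tokens(text)
        padded = " " + " ".join(words) + " "  # spaces stop partial-word phrase matches
        if excluded_words.intersection(words):
            continue
        if any(" " + " ".join(tokens(p)) + " " in padded for p in excluded_phrases):
            continue
        if near_chain(words, near_terms, window):
            kept.append(url)
    return kept

results = [
    ("http://a.example/", "Linux kernel USB scanner support notes"),
    ("http://b.example/", "My home page: Linux kernel USB scanner links"),
    ("http://c.example/", "Linux tips. " + "filler " * 20 + "USB scanner and kernel"),
]
print(post_filter(
    results,
    ["linux", "kernel", "USB", "scanner"],
    excluded_words=["homepage", "jumppage", "links", "nude", "sex"],
    excluded_phrases=["my home page"],
))
```

Only the first result survives: the second is a "my home page" links page, and in the third "linux" and "kernel" are 25 words apart.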
  • by GrEp (89884) <crb002 AT gmail DOT com> on Monday July 23 2001, @05:33AM (#67817) Homepage Journal
    The money they will get out of this has little to do with the single-user search engine, and everything to do with data mining. Companies like Google can mine their databases to do marketing queries on a HUGE scale. The search engine is great for the rest of us and a great advertising tool for them, but it is not where the money is.

    bash-2.04$
  • by debaere (94918) on Monday July 23 2001, @04:16AM (#67820)
    Assuming that this new technique is valid, how long do you think it will take for Google to catch up and provide similar results?

    They already have 5 of the 6 requirements (as I see 'em):
    1. Existing, proven, scalable infrastructure
    2. Gob-loads of search engine experience && the programmers/net admins to back it up
    3. A better name (Marketing, sadly, does count)
    4. ~1.3 billion pages already 'spidered' and waiting to be re-munched using any technique they deem appropriate
    5. A lot of high-paying corporate customers (Yahoo!, RedHat etc.), which helps pay for everything... and let's face it... money talks.

    All they really need is an algorithm... which shouldn't be a problem for the guys who revolutionized searching in the first place.

    My $0.02

    DOS is dead, and no one cares...
  • by debaere (94918) on Monday July 23 2001, @04:20AM (#67821)
    I don't trust search engines that don't let me get lucky... um... feel lucky...

    DOS is dead, and no one cares...
  • by Dr_Cheeks (110261) on Monday July 23 2001, @05:02AM (#67827) Homepage Journal
    ...what I really like about Google (besides the sleek interface and the cache), is the fact that the first 30 results on a query, like "gcc usages nuclear physics linux", aren't something like:

    Atomic chicks to blow away your gcc-perverted brain by playing with their Linux PDAs or some such shit.

    I always thought that was a pretty good feature of search engines. Y'know - do some research and get pr0n at the same time. WHISPER Incidentally, do you have any good links for Atomic chicks to blow away my gcc-perverted brain by playing with their Linux PDAs? /WHISPER
  • Huh. No Google cache. No Google Groups. I think Google will remain my favourite for a little while yet (though it's interesting to see that this engine has clearly modelled its interface after the simple Google one).

    Oh, and it doesn't seem to have indexed as much of the web as Google yet (admittedly, tested using the not-very-scientific method of searching for myself and my site), but I guess that'll come with time.

  • most of us who use Google were fans waay back when their database was a fraction of the size that Teoma's is now, and we still swore by it. It's interesting that some of the same people I have talked to who were militant in their support of Google (i.e., it's "our" search engine!) are now disdaining Teoma.

    And I am sure that Google will respond to the challenge with honor - I can't imagine that Google would try a patent challenge. It seems so out of character. But then again, I may be guilty of putting Google on a pedestal just because it was started by other geeks. Though one could make the argument that in today's downturn economy, patent litigation is just good business sense. There are no morals or honor in pure capitalism.

    I'll add Teoma to my bookmarks - if they give me better results than Google, I'll switch in a heartbeat. Even if they run M$ IIS !

  • right, obviously, since the article clearly says the site is just a demo to attract interest from investors. Teoma has not yet decided whether or not to run as a standalone search engine.

    PLEASE read the article before posting

  • by jspaleta (136955) on Monday July 23 2001, @04:59AM (#67842) Homepage
    About 4 years ago now, I was at a SIAM conference at WPI, and the keynote speaker outlined a procedure to use links to sort webpages... and I still haven't seen a search engine implement this.

    It basically involves two weighted listings of sites. Sites in the second list that are pointed to by sites in the first list earn weight based on the weights of the first-list sites pointing to them. Sites in the first list earn weight based on the weights of the sites they point to in the second list.

    You iterate this a few times and end up with the first list being a listing of "link pages" that have a lot of useful links on the subject. The second list becomes an ordered listing of "authoritative sites", sites that are pointed to by many other sites.
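The two-list iteration described here is essentially the hubs-and-authorities (HITS) scheme: a page's authority score is the sum of the hub scores of the pages linking to it, its hub score is the sum of the authority scores of the pages it links to, and both are renormalised every round. A minimal Python sketch on a made-up link graph; the function name, page names and iteration count are illustrative choices:

```python
import math

def hits(links, iterations=20):
    """links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority) score dicts after `iterations` rounds."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalise both lists so the scores don't grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

links = {
    "portal1": ["jaguar-cars", "jaguar-club"],
    "portal2": ["jaguar-cars", "jaguar-club"],
    "fan": ["jaguar-cars"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Here "jaguar-cars" ends up with the highest authority score, and the two portal pages tie as the best hubs because each links to both authorities.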

    What's really neat is that this method can find separate communities. For instance, search for the word jaguar and this method will quite easily give you authoritative sites and link pages for the car, the animal, and the Atari game system... because each meaning of the word jaguar has a distinct grouping of authoritative sites and link pages.

    What's more, this type of problem can be formulated as an eigenvector calculation for the matrix of link pages and authoritative sites.
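In matrix form, write $A_{ij} = 1$ when page $i$ links to page $j$ (else 0), with $h$ the vector of hub ("link page") scores and $a$ the vector of authority scores. One round of the iteration described earlier in this comment, and what two rounds compose to, is:

```latex
a^{(k+1)} = \frac{A^{\top} h^{(k)}}{\lVert A^{\top} h^{(k)} \rVert}, \qquad
h^{(k+1)} = \frac{A\, a^{(k+1)}}{\lVert A\, a^{(k+1)} \rVert}
\quad\Longrightarrow\quad
a^{(k+1)} \propto A^{\top} A \, a^{(k)}, \qquad
h^{(k+1)} \propto A A^{\top} h^{(k)}
```

So this is power iteration: under mild conditions (for example, a unique largest eigenvalue) $a$ and $h$ converge to the principal eigenvectors of $A^{\top} A$ and $A A^{\top}$. In the usual analysis of this scheme, the separate "jaguar" communities are associated with other, non-principal eigenvectors of the same matrices.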

    -jef

  • by zsazsa (141679) on Monday July 23 2001, @05:01AM (#67844) Homepage
    I'll add Teoma to my bookmarks - if they give me better results than Google, I'll switch in a heartbeat. Even if they run M$ IIS !

    Good thing they aren't! According to Netcraft [netcraft.com], they are running Apache/1.3.12 on Solaris 8 [netcraft.com].

    Ian
  • At the risk of sounding redundant, I'll second that. It's fast, and the few queries I did make were close enough that I'll keep watching this.

    On a slightly related note, Google's director of operations and head sysadmin gave a great technical presentation on why Google runs so damn fast last week at the Bay LISA meeting. Two of the more interesting points were that the colors on Google's search page were chosen for rendering efficiency, and that they have a team of people who actually count the bytes on their pages to make sure you get all the necessary info with the fewest possible bytes. Considering it was a free talk, it was very interesting for us Linux enthusiasts.

    Hopefully this newcomer will go to the same lengths to make their search engine competitive...

  • do I have a problem with this. Nope, none whatsoever. Because, if the big pharmaceutical companies can't protect their product then they won't manufacture it. And if they don't, who will? Who else can afford the R&D? It may be that by giving up my rights to this research I will help to provide a cure or prevention for diabetes. I'm happy with that.

    Big Pharm spends *by far* more on advertising than on research. See here. [essential.org] Also, as a side note, please see here [washingtonmonthly.com] to understand why free-market capitalism in the health care industry doesn't make sense; to quote: "Canada insured 100 percent of its citizens for $2,250 per person in 1998 while the United States expended $4,270 per person insuring only 84 percent of our citizens." Not only that, it's cruel and disgusting to hold people's health ransom for money...

    Deregulating the health care industry is more about stable profit for Big Pharm than anything else. Canada's and Britain's citizens would do well to understand what 'American Style' health care really means: fewer healthy people, higher cost, profiteering at the expense of your health (literally).

    What does this have to do with R&D & Patents? Patents are weapons used by the Health Care Industry to kill people for money. The 'R & D' they do is to make money. Neither thing has 'beans' to do with Healthy People. The R&D should be done by doctors with a lot less attachment to profit motives, which, by nature, make for an *UNHEALTHY* "Health Industry".

    "So how do you motivate people to make others healthy when your only incentive is profit" would be a better question.

  • I find it quite nice that this search engine totally ignored my robots.txt and scanned my entire site anyway. How can a search engine so friggin' complex and monstrous ignore the basics of spider etiquette?

    I guess it's time to rename my directories again.

  • by yoink! (196362) on Monday July 23 2001, @05:45AM (#67870) Homepage Journal
    Google had no ads at the outset either.

    Additionally, who's to say that those Google-ites haven't improved their technology over the last year or so? I'm sure many of us have turned exclusively to Google's tried and true system... oh so easy... oh so accurate.

    Finally I think we love Google's look and those tiny little modifications they make to their logo on the special (but mostly American) dates.

    Hey, if someone can better it, we could all use a search with a button "The right link."


    yoink
  • Thanks to Google, my logs are an endless source of chuckles. I've never put any of these things in the site, but Google users constantly hit it hoping to find creepy, creepy stuff.

    Some of the better search criteria that led to my rather benign site: [ridiculopathy.com]

    • "Swollen+Lamprey+Nipples"
    • "Proadnivity"
    • "President+Kegstand+Urinate"
    • "Donkeyporn"
  • by tmark (230091) on Monday July 23 2001, @05:11AM (#67882)
    I just wonder how much of the google-lovefest here is due to the fact that they happen to be prominent Linux users. Say they happened to run NT instead...would Slashdotters rave about them if they got the same results ?

    For instance, one of the dominant /. themes is the incessant railing about the evils of IP and patents. Yet Google has what probably amounts to a boatload of patents, and they don't seem to get called on it (nor does Transmeta, or TiVo, for that matter). All the patent references I saw in the earlier comments were along the lines of "hmm, Google has these patents, wonder if we're set for a big patent fight".

    I bet if MS owned and operated google, /.ers would hate it and would never stop editorializing about the consequent coming of the Apocalypse.

  • by studarus (251872) on Monday July 23 2001, @04:04AM (#67887)
    This is interesting - they have no ads on their pages. How do they expect to make money (and stay in business)? Not that I am complaining - I like the clean interface of google and teoma.
  • Didn't Google get all sorts of patents on the concepts used in their search engine? You have to wonder if a patent fight is on the horizon. I for one encourage competition and it'll take some serious innovation to displace my beloved Google, but I say more power to them as long as they aren't just ripping off Google's concepts.
  • They actually use the same ranking mechanism. The only difference is that they are "smarter" and apply it in a better order.
    I'm sure everybody remembers the Amazon.com 1-Click patent (links/updates, please). I would not be surprised if they managed to patent the ranking mechanism.

    ---
  • by Cspine (263118) on Monday July 23 2001, @04:56AM (#67891) Homepage
    Most search companies don't make their revenue from their internet sites. Look at google.com. They make a ton of money selling their search software to companies. [google.com] At $1,200 a month per company that wants to use it, that sounds like a pretty good business model to me.

    This Teoma, I'm sure, is just trying to attract clients. Just wait, they'll get ads soon.
  • #quoted from the page ... please read the page when you have the time ...

    , first. Google examines link structures all over the web. By doing so, it can give every page a popularity rating known as "PageRank" (named after Google cofounder Larry Page). When you do a search, URLs with high PageRanks are more likely to be listed first. However, this will only happen if the pages also match other criteria, such as containing your search terms or being identified as being relevant to your search terms by analyzing the context of links.

    Teoma operates in an opposite fashion. When you do a search, Teoma looks across the entire web to find pages that contain your search terms or which are considered relevant to those terms based on link context. After finding a matching set of documents, which it calls a "community," Teoma then examines the links between just this set, to determine which are the most popular.

    #end quote.
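The quote contrasts a query-independent popularity score with popularity measured only inside the set of pages that match the query. A Python sketch of both on a made-up graph: `pagerank` is the textbook damped power iteration (damping 0.85 is the commonly cited default), not Google's production system, and `community_popularity` simply counts in-links from within the matching set, which is a simplified reading of the quote rather than Teoma's actual algorithm:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over the whole graph.
    links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for t in out:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

def community_popularity(links, matching):
    """Restrict the graph to the pages matching the query and count only
    the in-links that come from inside that set."""
    matching = set(matching)
    counts = {p: 0 for p in matching}
    for src, targets in links.items():
        if src in matching:
            for t in targets:
                if t in matching and t != src:
                    counts[t] += 1
    return counts

# Hypothetical web: generic news pages all link to "portal"; the pages that
# actually match the query mostly link to "x-howto".
links = {
    "news1": ["portal"], "news2": ["portal"], "news3": ["portal"],
    "blog": ["portal", "x-howto"],
    "x-faq": ["x-howto", "portal"],
    "x-list": ["x-howto"],
    "x-notes": ["x-howto", "x-faq"],
}
matching = {"portal", "x-howto", "x-faq", "x-list", "x-notes"}
pr = pagerank(links)
pop = community_popularity(links, matching)
print(max(matching, key=pr.get), max(pop, key=pop.get))
```

Globally, "portal" outranks "x-howto" because of the links from unrelated news pages; inside the community of matching pages, "x-howto" has the most in-links (3 versus 1).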

    I don't see how this can infringe on any patents, unless google patented the method of ranking pages by external linkages (can they patent that?).

  • by OpenSourced (323149) on Monday July 23 2001, @04:15AM (#67898) Journal
    I have made some queries about topics I'm usually involved in, and some of the results impressed me. Others were rather off the mark, to tell the truth, but hey! it's a beta.

    It seems to concentrate on sites rather than pages, which is a rather different approach from Google's.

    In any case, a link to keep and a technology to watch. There are never too many good search engines. Good luck to them!

    --

  • Personally I don't care if they ripped off every one of Google's concepts; if their engine works better, I'll use it. Patents, in this area at least, are in my opinion really stupid: just because I figure out a new way to build a house that's faster and more effective than older methods doesn't mean everyone else should have to use the older methods just because I don't want competition. Google right now is the best search engine out there because of its advanced concepts, but if all the other engines go down, why would Google ever bother to improve its system? With new competition, Google and the other engines will be forced to keep innovating. Competition is always good.