Post-Googleism At IBM With Piquant
kamesh writes "James Fallows of the New York Times reports on an interesting search technology that IBM is developing. IBM demonstrated a system called Piquant, which analyzed the semantic structure of a passage and therefore exposed 'knowledge' that wasn't explicitly there. After scanning a news article about Canadian politics, the system responded correctly to the question, 'Who is Canada's prime minister?' even though those exact words didn't appear in the article. What do you think?"
Latent Semantic Indexing (Score:5, Informative)
"Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent."
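For anyone who wants to poke at the "many words in common means semantically close" idea from the quote, here is a minimal Python sketch. It is only the shared-word comparison, using cosine similarity over raw word counts; it leaves out the matrix decomposition step that real LSI adds on top. The three documents and all function names are made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count each word (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-document collection, invented for this example.
docs = {
    "politics_1": "the prime minister addressed parliament in ottawa",
    "politics_2": "parliament debated the budget the prime minister proposed",
    "cooking": "simmer the onions and garlic before adding the stock",
}
vectors = {name: term_counts(text) for name, text in docs.items()}

close = cosine_similarity(vectors["politics_1"], vectors["politics_2"])
distant = cosine_similarity(vectors["politics_1"], vectors["cooking"])
print(f"politics_1 vs politics_2: {close:.2f}")
print(f"politics_1 vs cooking:    {distant:.2f}")
```

The two politics snippets score higher than the politics/cooking pair, even though nothing here knows what a prime minister is. Note that the cooking document still gets a nonzero score purely from the word "the", which is why real systems weight or drop common words.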
I wonder... (Score:5, Interesting)
Re:I wonder (Score:2)
Re:Latent Semantic Indexing (Score:5, Informative)
The LSI system, despite the name, knows nothing about semantics. It just ASSUMES that words that frequently occur near each other are semantically related.
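To make that assumption concrete, here is a rough Python sketch that simply counts which word pairs show up within a couple of words of each other. It is an illustration of the co-occurrence idea only, not of any actual LSI implementation; the sample text, the window size and the function name are all assumptions.

```python
import re
from collections import Counter


def cooccurrence_counts(text, window=2):
    """Count unordered word pairs that appear within `window` words of each other.

    Punctuation is discarded, so windows can span sentence boundaries.
    """
    words = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, word in enumerate(words):
        for neighbour in words[i + 1 : i + 1 + window]:
            if neighbour != word:
                pairs[tuple(sorted((word, neighbour)))] += 1
    return pairs


# Made-up sample text for illustration.
sample = (
    "The prime minister spoke. The prime minister left. "
    "The finance minister stayed."
)
pairs = cooccurrence_counts(sample, window=2)
for pair, count in pairs.most_common(3):
    print(pair, count)
```

The counts say "prime" and "minister" keep company, and that is all they say. Whether that amounts to a semantic relation is exactly the leap being complained about.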
Re:Latent Semantic Indexing (Score:5, Informative)
Re:Latent Semantic Indexing (Score:2)
http://www.opencyc.org/
For me it has no use at all.
IP, html, google work very well because they are simple. There are "better", complicated systems, protocols, ideas. But they are not useful yet.
I think it sounds like a honey trap for investors who want to waste their money, and I really wonder whether they will file a "software patent" or do other crap.
The prime minister detection is a very simple issue.
AI does not work, because it i
Re:Latent Semantic Indexing (Score:5, Informative)
Re:Latent Semantic Indexing (Score:5, Interesting)
The genius behind Google's success was paying *less* attention to the content of a page when categorizing it, and relying on links *to* the page instead. Why? Because of spammers.
Think about hiring for a job. You don't limit yourself to interviews with candidates, because they're highly motivated to deceive you. So you look for references. Certification is an example of this - somebody besides the person himself who will vouch for his competence. An even better reference is somebody you know and trust who thinks highly of the individual (which is why personal networking is so important to getting hired).
Google's PageRank is analogous. Instead of looking at the content of a page, you rely heavily on links to the page, especially links from more trusted sources. This helps defeat spammers, who use all manner of tricks to make their crap look good to search engine spiders.
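A minimal sketch of the link-based idea, for the curious: the textbook power-iteration form of PageRank in plain Python. This is not Google's production algorithm; the four-page web, the damping factor of 0.85 and the iteration count are assumptions picked for illustration. Page content never enters into it, only who links to whom.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no outgoing links: spread its rank over everyone.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical web: "spam" links out, but nobody links to it.
web = {
    "gov": ["news"],
    "news": ["gov"],
    "blog": ["gov", "news"],
    "spam": ["gov"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:5s} {score:.3f}")
```

However many keywords the "spam" page stuffs into its own text, it ends up with only the baseline score because nothing points at it. That is the reference-check analogy in code form.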
citation analysis (Score:4, Insightful)
"Genius" would imply some sort of brand new insight, but citation analysis has had a long tradition before Google appeared on the scene as a search engine. Google's biggest achievement is probably in implementing citation analysis on a very large scale, but they didn't break completely new ground in how people search.
And, in the long run, semantics-based analysis, like IBM's Piquant, is probably going to be the better technology: citation analysis for determining relevance to a query is really just a limited substitute for understanding of the content.
Re:Latent Semantic Indexing (Score:3, Insightful)
MR. CICCOLO, the search strategist, said that in a way his team was trying to match - and reverse - what Google has achieved. "As Google use became widespread, people began asking why it was so much easier to find material on the external Web than it was on their own computers or in their company's Web sites," he said. "Google sets a very high standard for that Web. We would like to set the next standard, so that people will find it so easy to do things at work that they'll wonder why the
Re:Latent Semantic Indexing (Score:2)
Those days never left. As information brokers know, there is still more accurate, structured info locked up in fee-paid databases than there is on the Net - and the ability to know where those databases are and how to search them is where information brokers make their money.
Re:Latent Semantic Indexing (Score:3, Informative)
Re:Latent Semantic Indexing (Score:1, Funny)
Re:Latent Semantic Indexing (Score:5, Informative)
I for one, welcome our new semantic web overlords! It's really great to hear that something based on semantic technologies is finally breaking through. This could be the dawn of a new era.
I know this is very optimistic, but how long do you think it will be before we have something like this combined with something like Google? The amount of knowledge readily available will be mind-bogglingly huge. Imagine having a text service on your mobile: you text off a question to something and get an answer back immediately. All knowledge available everywhere, any time, that would be a great thing. Heck, it's even quite scary to think about it.
this is not "semantic web technologies" (Score:2)
The term "semantic web" refers to technologies that let authors provide markup indicating the semantics of content. That is, the "semantic web" places a burden on the authors of pages.
What natural language analysis is doing is a completely different approach: instead of burdening authors with marking up their pages t
Re:this is not "semantic web technologies" (Score:2)
Actually the critical component of AI is conceptual processing. Semantic processing cannot possibly succeed without the construction and representation of concepts.
And not very many people are working on it IIRC. Many of the big names who used to work on it, like Roger Schank, have moved on to other things because it was so hard.
CYC was an attempt to brute-force some form of conceptual processing. Since it's been around for decades and has made absolutely no impact, obviously it's not the way to go.
Re:this is not "semantic web technologies" (Score:2)
Re:this is not "semantic web technologies" (Score:2)
Re:this is not "semantic web technologies" (Score:2)
Re:this is not "semantic web technologies" (Score:2)
Not if it turns out that the approach to representations and reasoning used by CYC is fundamentally wrong. In different words, you can collect gigabytes of Roman multiplication tables and still not be able to solve a differential equation.
Re:this is not "semantic web technologies" (Score:2)
Re:this is not "semantic web technologies" (Score:2)
I agree, but many people (myself included) view "conceptual processing" simply as a part of semantics, not as a separate field.
Many of the big names who used to work on it, like Roger Schank, have moved on to other things because it was so hard.
That's not surprising: Schank's approach was naive and unworkable.
Re:Latent Semantic Indexing (Score:3, Informative)
it sounds like it's just a big ol' LSI System
A Perl implementation of LSI can be found at Building a Vector Space Search Engine in Perl [perl.com]
However, there are at least three problems. First, it doesn't look like LSI can answer questions like "Who is the Prime Minister of Canada?"
Second, the approach is patented by Telcordia Technologies [argreenhouse.com].
Third, there are scalability problems with LSI. The author of the Perl article writes [nitle.org]:
Re:Latent Semantic Indexing (Score:3, Funny)
Actually they did that on purpose. The press release was actually a test for Piquant to see if it could figure out that it was really just a rehashed older idea.
Wow (Score:5, Insightful)
Nevertheless it makes me feel like all the programming and design I've ever done is pathetic and I will never amount to anything. That's how it is in the software industry - always someone out there who makes you look bad.
Re:Wow (Score:3, Insightful)
That's how it is in life.
Re:Wow (Score:2)
Re:Wow (Score:1)
Re:Wow (Score:2)
Re:Wow (Score:2)
Re:Wow (Score:2)
There is some actual substance to some of his work, but it is fairly thin and most of it is covered better by other philosophers. So hav
Re:Wow (Score:2)
Re:Wow (Score:3, Funny)
42
lachlan@localhost $ analyse -q "Is there a God?"
There is now!
Re:Wow (Score:2)
If that's what you care about.
Re:Wow (Score:3, Interesting)
Baby steps, but the sort of essential baby steps that accumulate real technological progress. When the system discovers its _own_ non-trivial and useful rules, when it spontaneously parses our input to reply upon a self-generated "Oh, you mean......", then it gets scary.
Epistemology is a big word.
Re:Wow - translations and context (Score:2)
Re:Wow (Score:2)
While it is still an interesting application that can reliably indicate related documents, it is not new: at the institute where I worked 5 years ago, a similar application was developed, which was able to identify keywords which belonged
Re:Wow (Score:2)
TAI (think about it) and stop flaming. Thanks.
PS: If you think the syntax of most languages is very similar, you haven't ever once spoken fluent Russian. SystranSoft's translator, for instance, in spite of its occasional success in translating English/French, can't get a single thing right in Russian. It's too fl
Reg Free (Score:5, Informative)
Sounds impressive (Score:5, Funny)
Re:Sounds impressive (Score:1)
You should see what it answered when I asked "Who is president of the United States". I couldn't get it to stop. I had to hit the power button and reboot.
Re:Sounds impressive (Score:2)
It was probably trying to recount the votes. Either that, or it had received some threatening e-mails from the diebold voting machines down the block.
ask jeeves... (Score:2)
must have been pre-google since i used it sometimes
Trust Issue (Score:5, Interesting)
What if two sites explicitly said the Prime Minister of Canada was Santa? Would that override the linked information? How would the system know what is right? You can't always just pick the majority answer, so you need to set up little areas of trust: "I trust www.thisplace.com and everything it says", and that site in turn will say "I trust www.overhere.com". But who allocates the trust? Couldn't those people be biased?
The semantic web will have a fantastic impact on the world, but the trust issue is something that needs to be addressed, and I don't see how it can ever, globally be done.
More likely we would have systems like this for individual sites, or intranets, trusted circles that would be unlikely to contradict themselves.
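A toy sketch of the "don't just pick the majority" point, in Python. Each source's claim is weighted by how much the asker trusts that source, so two prank sites can be outvoted by one trusted one. The site names, the trust numbers and the function name are all invented for illustration, and the hard part (who assigns the weights, and whether they are biased) is exactly what this code does not solve.

```python
from collections import defaultdict


def trusted_answer(claims, trust):
    """Return the answer with the highest summed trust, plus all the scores.

    `claims` is a non-empty list of (source, answer) pairs; unknown sources count as 0.
    """
    scores = defaultdict(float)
    for source, answer in claims:
        scores[answer] += trust.get(source, 0.0)
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical claims about who the prime minister is.
claims = [
    ("prank-one.example", "Santa"),
    ("prank-two.example", "Santa"),
    ("parliament.example", "the actual PM"),
]
# Hypothetical trust weights, assigned by hand. This is the unsolved part.
trust = {
    "prank-one.example": 0.1,
    "prank-two.example": 0.1,
    "parliament.example": 0.9,
}

answer, scores = trusted_answer(claims, trust)
print(answer, scores)
```

A plain head count would say Santa, two to one. With the weights above the trusted source wins, but only because somebody decided in advance whom to trust, which is easier inside an intranet than across the whole web.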
Hopefully one day, if we truly get a global semantic web, we can see if the answer really is 42 :]
Re:Trust Issue (Score:1, Insightful)
Re:Trust Issue (Score:5, Interesting)
One of the neatest approaches of this technology, I think, is the ability to eliminate search results. Anyone who's ever used Google to troubleshoot a problem knows that the first thirty or forty matches will all be the same: web mirrors of mailing lists or USENET posts. Using a vaguely semantic technology like this, Google could say, "Hey, all these pages are effectively identical" and collapse them into a single result.
This would be terribly useful for me, since I usually start my troubleshooting searches with an error message. Error messages in the Unix world being quite standardized, this nets me at least ten irrelevant "threads." Since each "thread" is duplicated about ten times in the Google results, that means the question I'm actually asking may not appear until page 5 or later. But using result grouping like this - which Google tries and is generally unsuccessful at - would mean I'd see my question asked on the first or second pages. Big improvement.
Another nifty trick would be an actual, working "related pages" link. So let's say I find my question, but, as is all too common, it's a question without an answer. I click on the link, the search engine does its magic, and it pulls up (perhaps) technical details on the software in question or alternate solutions to my problem. This is definitely going to be harder to implement than my other idea (perhaps even impossible for now), but it'd be really nice. It could make navigating the Internet like navigating Wikipedia or amazon.com.
Ah well. I can dream.
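The result-collapsing idea does not actually need semantics. A rough Python sketch, assuming word "shingles" (overlapping three-word runs) compared by Jaccard similarity, a standard near-duplicate trick; this is a swapped-in technique, not whatever Google or IBM does. The sample results, the 0.6 threshold and the function names are assumptions for illustration.

```python
import re


def shingles(text, size=3):
    """Set of overlapping `size`-word runs from the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i : i + size]) for i in range(max(len(words) - size + 1, 1))}


def jaccard(a, b):
    """Shared shingles divided by total distinct shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def collapse_duplicates(results, threshold=0.6):
    """Greedy grouping: a result joins the first group whose first member it resembles."""
    groups = []  # list of (shingles of the group's first member, [result ids])
    for result_id, text in results:
        signature = shingles(text)
        for representative, members in groups:
            if jaccard(signature, representative) >= threshold:
                members.append(result_id)
                break
        else:
            groups.append((signature, [result_id]))
    return [members for _, members in groups]


# Made-up search results: two mirrors of one mailing-list post, plus an unrelated page.
results = [
    ("mirror_a", "Re: segmentation fault when starting apache after upgrade to kernel 2.6"),
    ("mirror_b", "[list] Re: segmentation fault when starting apache after upgrade to kernel 2.6"),
    ("howto", "how to configure virtual hosts in apache"),
]
groups = collapse_duplicates(results)
print(groups)
```

The two mirrors differ only by a list prefix, so they share nearly all their shingles and fall into one group, while the how-to page stays separate. The working "related pages" link is the part that would need something smarter.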
Re:Trust Issue (Score:2)
Re:Trust Issue (Score:2)
"Bush is the president of *": 888 results.
"Bush is an idiot": 5,830 results.
Actually correct? Who cares? Politically incorrect and that's what matters!
Prolly a hand-picked question (Score:4, Insightful)
Re:Prolly a hand-picked question (Score:5, Funny)
-- Andy Finkel, computer guy
Or, conversely,
Any sufficiently rigged demo is indistinguishable from an advanced technology.
-- Don Quixote, slashdot guy
Re:Prolly a hand-picked question (Score:2)
And for all we know, the programmers were given the article(s) and the question(s) before they wrote the program. To get a realistic idea of its usefulness, they should really post it on the web as an experimental app. If it's any good, people will use it.
That's what I like about
Re:Prolly a hand-picked question (Score:2)
AI research is still in the Dark Ages (Score:2, Funny)
We must integrate ourselves with computers to a point at which the living being and computer cannot be separated anymore. The perfect union of the biological component (wetware) and computer (hardware) will mark the end of the human race - and the birth of something new and wonderful.
Obviously this will face strong, religious and quasi-religious (ethics) resistance
Canadian Prime Minister (Score:4, Funny)
PM Horton (Score:2)
Prime Minister Horton
-kgj
Re:Canadian Prime Minister (Score:1)
Comparison (Score:1)
This reminds me of the famous quote "Artificial Intelligence usually beats real stupidity"
Re:Comparison (Score:2)
garbage in, garbage out (Score:1)
Re:garbage in, garbage out (Score:1)
I'd like to see that article. (Score:1, Interesting)
You fools! This is the beginning of the end! (Score:1)
Scientist: "Is there a God?"
Computer: "There is now."
/can't remember what movie/book this was from
Re:You fools! This is the beginning of the end! (Score:1)
http://www.google.com/search?hl=en&q=asimov+%22th
Re:You fools! This is the beginning of the end! (Score:4, Funny)
It is just me, isn't it...
Re:You fools! This is the beginning of the end! (Score:2)
However, it would not be plugged into the internet. It would only learn through a CD-ROM: a single, read-only CD-ROM drive, so information only flows one way.
I wouldn't want it to be able to spider the web. Learn how to make itself into a virus and spread itself to every PC on the planet.
Then again, I would also have it be a self-healing self-administrating Linux system. With remote viewing
As important as this tech is for web-searching (Score:1, Insightful)
There are other interesting possibilities. In the tradition of Esperanto and Lojban, it can also be used to gather the common aspects of n
I think it's about time (Score:1)
However, this sort of thing might be better employed as a knowledge engineer's assistant, doing the rough work of attaching useful metadata to documents drawn from the enormous piles that we've accumulated.
What do I think? (Score:1)
Re:What do I think? (Score:1)
Everybody knows the Canadian Prime Minister is Jean Poutine [google.com].
Now... (Score:5, Insightful)
Good bye, new system, too dangerous for "national security".
Re:Now... (Score:1)
Re:Now... (Score:2)
That would truly be a triumph of computer programming, given how few people seem to be smart enough to draw that conclusion.
Re:Now... (Score:2)
Just to let you know.
There are other countries besides America. Their parties are usually not called "Republicans" and "Democrats" - and don't even necessarily correspond to those American parties. The non-American countries also hold views about Iraq. Many also write in English (UK, Canada, Australia, New Zealand, also India, the largest democracy in the world
Google, and any alternative search engine, would spider through and
Re:Now... (Score:3, Funny)
------------------
There are other countries besides America. Their parties are usually not called "Republicans" and "Democrats" - and don't even necessarily correspond to those American parties. The non-American countries also hold views about Iraq. Many also write in English (UK, Canada, Australia, New Zealand, also India, the largest
Deep Blue beats Kasparov (Score:1)
Won't work. (Score:5, Informative)
What the summary of the article claims IBM is developing-- a technology for getting the semantics behind an arbitrary sentence on the web-- is the Holy Grail of the discipline of Natural Language Processing (NLP) and very, very, very, _very_ far away at this point. Many people believe that we cannot ever get there (that's the point of a Holy Grail, after all), but I don't want to be quite as pessimistic (or realistic?) at this point.
The problem here is that English (or any other natural language, for that matter) isn't SML, or Haskell, or some other language with a well-defined denotational semantics. Natural language suffers from at least three problems that make it very tough to gather anything useful from a given piece of text:
(1) Grammar. Natural language isn't typechecked, and frequently uses incomplete sentences, which makes it hard to develop grammars (context-free, context-free probabilistic, lambek-style/proofnet-style or whatever else people have come up with) for it.
(2) Anaphora resolution. "I saw a dog on the street this morning. It was barking". So who's barking, street or dog? Grammatically, both would be possible; only with prior knowledge can we see that we're talking about the dog here.
(3) Polysemy. What does "play" mean, taken by itself? It can be used for different meanings in "to play a game", "a play of words", "a terrific Shakespearian play" etc.; you might want to have a look at WordNet [princeton.edu] one of these days to get a feeling for this. Not knowing which meaning an arbitrary occurrence of "play" refers to means that you have to try lots of options when parsing, LSIing or whatever else you do (though most people simply ignore this problem in research today-- it's too hard to disambiguate words in practice).
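To give a feel for point (3), here is a small Python sketch of a simplified Lesk-style approach: pick the sense whose dictionary gloss shares the most words with the sentence. The three-sense inventory for "play" is hand-made for this example (it is not taken from WordNet), and the stopword list and function names are assumptions as well.

```python
import re

# Hypothetical mini sense inventory; a real system would use a lexical database.
SENSES = {
    "play": {
        "game": "take part in a game or sport for fun",
        "theatre": "a drama written for actors to perform on a stage",
        "wordplay": "a clever or witty use of words",
    }
}
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "or", "and", "we", "i"}


def content_words(text):
    """Lowercased words from the text, minus a few very common ones."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, sentence):
    """Score each sense by word overlap between its gloss and the sentence."""
    context = content_words(sentence) - {word}
    scores = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() returns the first sense listed, which is an arbitrary guess.
    best = max(scores, key=scores.get)
    return best, scores


for sentence in [
    "The actors will perform the play on a small stage",
    "We play a game of chess for fun",
    "It was a play",
]:
    print(sentence, "->", disambiguate("play", sentence))
```

The first two sentences happen to share words with the right gloss. The third has no overlap with anything, so the function just returns the first sense in the table, which is a fair miniature of why this is hard on arbitrary text.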
That's not all, of course-- try thinking of the need to deal with irony/sarcasm, metaphors, foreign words, the credibility of whichever sources you're using etc., and you'll get a pretty good feeling for why this is beyond merely being "hard". Of course, for very small problem domains (a "command language for naval vessels" was investigated in one paper I read a while ago-- those DARPA people definitely have too much money on their hands, but I digress), this can be solved, but general-purpose open-domain NLP is what you need to do a web search.
It might happen in my lifetime, but I won't hold my breath for it.
-- Christoph
Re:Won't work. (Score:1)
Obviously, the morning was barking. That's when it's bloody, farking cold out.
Re:Won't work. (Score:2)
Re:Won't work. (Score:2)
Re:Won't work. (Score:2)
Not to say it wouldn't be a big achievement to build a practical system with everything incorporated into it, but IMHO the technologies already exist.
Actually, this technology was developed at CMU (Score:1, Interesting)
As some of you still remember, the original technology behind this was developed at CMU in the mid 90's when Corey Kosak, Andrej Bauer and a bunch of other talented people created the first ever natural language based neural network with a measurable IQ. People could even post questions to certain personae emulated by the neural network through the web site CGI at forum2000.org. This neural network was really fun and witty, but what you probably do not know is that all the technology in fact consisted of bo
Can you imagine (Score:3, Interesting)
IBM--and dozens of others (Score:2)
Google thing is really not so yesterday .... (Score:2)
I don't think the race should be about beating the results provided by Google, but about the interface provided to wade through the results. By that I don't mean the 3D or clustering interfaces like Vivisimo [vivisimo.com], nor the Visual Basic-like constructs of "search builder" at beta.search.msn.com [msn.com], but rather how to improve your results after you have started.
Of all the advanced mathematics classes that I took, one thing that stands out for me is that out of many p
Transcript of conversation with Piquant (Score:3, Funny)
PIQUANT: WHY DO YOU ASK, USER?
Because I want to know.
WHAT IS THE REAL REASON, USER?
I want to test you.
WHAT WOULD IT MEAN TO YOU IF YOU GOT TO TEST ME?
It would mean I can determine whether or not you know who Canada's Prime Minister is.
WHAT MAKES YOU THINK I COULD TELL YOU THAT?
I programmed you to figure it out.
LET'S TALK ABOUT YOU -- NOT ME.
I want to know the name of Canada's Prime Minister.
WHY DO YOU WANT TO KNOW THE NAME OF CANADA'S PRIME MINISTER?
Are you going to tell me his name or not?
WHY ARE YOU INTERESTED IN WHETHER OR NOT I AM GOING TO TELL YOU HIS NAME OR NOT?
Stop avoiding the question! Just tell me his name!
NAMES DON'T INTEREST ME.
Fuck you!
TELL ME MORE ABOUT YOUR OWN PROBLEM, USER.
Shit!
GO ON.
You suck.
YOU'RE NOT REALLY STILL TALKING ABOUT ME, ARE YOU?
just a question (Score:1)
arrogance, dishonesty, or ignorance? (Score:2)
IBM researchers are right that AI techniques are getting powerful enough to allow unstructured information retrieval based on semantic content. But what IBM researchers are trying to do here is take credit for technologies and ideas deve
Google already has an unfair monopoly (Score:1, Interesting)
Google's recent mangling of google-groups (mentioned already on
From factoids to facts (Score:2, Informative)
As it happens, The Economist recently ran an article addressing some of these issues. The article also provides context and perspective that should be of interest to those participating in this discussion. For convenience, the full text is reproduced below; it is also accessible online [slashdot.org] (may require paid subscription).
----
Computing
From factoids to facts
Aug 26th 2004 | REDMOND, WASHINGTON
From The Economist print edition
At last, a way of getting answers from the web
WHAT is the next stage in the ev
That's better than most kids in the U.S.! (Score:2)
Entities (Score:1)
SM/2 lives? (Score:3, Interesting)
Phenomenal technology. IBM built the desktop search that everybody is pushing now, way back when. Cutting-edge search and indexing capabilities, fully extensible: you could write your own plugins to deal with your data (use JPEG meta tags to label pictures from your digicam? Write a little plug-in so you can search through your photos), and it had semantic and linguistic searching.
For a long time SM/2 was kind of the poster child for IBM's inability to take remarkably cool technology to the consumer. Everyone that used it thought it was cool, nobody ever knew about it. They had trouble getting the word out within the company about it. Last I heard anything about it, they were turning the technology into some kind of intranet spider. It was the shit, it might have even had primitive cross referencing, like you could search for president and it would find references to Clinton because a third article may have referred to him as the president. They seemed to have some foresight into this area, web searching has to cut out so much bullshit, you wouldn't want to contaminate your semantic searches with all of it, keeping it in intranet space might be a good idea. Local search is hot right now too though so maybe it'll come back.
Who is Canada's prime minister? (Score:2)
After scanning a news article about Canadian politics, the system responded correctly to the question, 'Who is Canada's prime minister?'
Everyone knows he is Tim Horton!
UIMA available for download (Score:2)
In ordinary search, the text is parsed and a giant index is created. UIMA allows you to write annotators that look for additional information, for example names of elected officials, and add those entries to the index as well.
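A rough Python sketch of the annotator idea, to show why it helps with the prime minister question. This is not the UIMA API; it is a plain inverted index with a pluggable function that adds extra keys. The name "Jane Example", the lookup table and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lookup table of officials; a real one would be curated or extracted.
KNOWN_OFFICIALS = {"jane example": "prime minister"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def official_annotator(text):
    """Yield extra index keys for any known official mentioned in the text.

    Uses a crude substring match, good enough for a sketch.
    """
    normalized = " ".join(tokenize(text))
    for name, role in KNOWN_OFFICIALS.items():
        if name in normalized:
            yield f"person:{name}"
            yield f"role:{role}"


def build_index(docs, annotators=()):
    """Inverted index of plain words, plus whatever keys the annotators add."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
        for annotate in annotators:
            for key in annotate(text):
                index[key].add(doc_id)
    return index


docs = {"article": "Jane Example defended the budget in Ottawa today."}
plain = build_index(docs)
annotated = build_index(docs, annotators=[official_annotator])

print("minister" in plain)                # the words never appear in the article
print(annotated["role:prime minister"])   # but the annotation makes it findable
```

The article never says "prime minister", so the plain index cannot answer the question, while the annotated one can. The knowledge came from the lookup table rather than from the article, which is worth keeping in mind when judging the demo.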
Now, we've been over this before (Score:3, Interesting)
However, as I'm often fond of pointing out, the problem is not getting the 80 - 90% accuracy in translation and interpretation that I'm sure these systems can attain.
The challenge quickly becomes how to deal with idioms and idiosyncratic constructions. Is this system even ready to deal with sentences like "The criminal was shot dead by police"? If it is, great. How about "The trolley rumbled through town"? Or the idiomatic "time flies"?
This is what, so far as I know, the field of computational linguistics is now facing in textual interpretation and translation. Coming up with a system to effectively identify what appear to be three-argument verbs ("Mary hammered the metal flat") or constructions or idioms above may well be something that traditional systematic recursive grammars aren't yet up to handling.
Somehow these situations have to be identified, and separated in the parsing process so that they don't get processed like standard grammatical expressions.
Hopefully these problems are how I'll make my living
And the answer it gave: (Score:2)
Probably not viable for large-scale search (Score:2)
However, if Intel delivers the promised 10x boost in performance in the next 3 years (which I really doubt, too), who knows, we might see this in a centralized search engine, too.
As Larry Page said (Score:2)
Who is NOT Canada's prime minister? (Score:5, Interesting)
I've worked for a company making a system that could easily answer a question like that. It really isn't hard to do. If you want to know how much of this is "black magic"/AI and how much is statistics, compare the results of the following two queries:
If the system really understands the semantics of the indexed documents, the two result sets should be very different, and both should have a fair number of relevant documents.
If the system is just based on clever use of statistics, the two result sets will include a lot of the same documents, and the result set for the second query will probably have very few relevant documents.
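The test is easy to run against any keyword-style engine. A minimal Python sketch, with one loud assumption: the two queries are missing from the comment as posted, so the pair below is a guess based on its title. The toy search function, the stopword list and the documents are also invented for illustration.

```python
import re

# Typical stopword handling throws "not" away along with the other small words.
STOPWORDS = {"who", "is", "not", "the", "a", "of", "s"}


def keywords(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def search(query, docs):
    """Toy statistical search: return every document sharing a keyword with the query."""
    wanted = keywords(query)
    return {doc_id for doc_id, text in docs.items() if wanted & keywords(text)}


def overlap(a, b):
    """Jaccard overlap of two result sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up documents.
docs = {
    "politics": "The prime minister of Canada spoke in Ottawa",
    "hockey": "Canada won the hockey game",
    "recipes": "Simmer the onions and garlic",
}

# Assumed query pair, guessed from the comment's title.
positive = search("Who is Canada's prime minister?", docs)
negative = search("Who is NOT Canada's prime minister?", docs)
print(positive, negative, overlap(positive, negative))
```

Both queries reduce to the same keywords, so the result sets are identical and the overlap is 1.0. A system that really understood the question would have to answer the second one with something quite different.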
Re:Enterprise... (Score:2)