Slashdot
Semantic Search Points To Better Relevancy
Posted by kdawson on Wed May 30, 2007 04:44 AM
from the retrieve-what-I-mean dept.
ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.
Related Stories
Developers: Why the Semantic Web Will Fail 179 comments
Jack Action writes "A researcher at Canada's National Research Council has a provocative post on his personal blog predicting that the Semantic Web will fail. The researcher notes the rising problems with Web 2.0 — MySpace blocking outside widgets, Yahoo ending Flickr identities, rumors Google will turn off its search API — and predicts these will also cripple Web 3.0." From the post: "The Semantic Web will never work because it depends on businesses working together, on them cooperating. There is no way they: (1) would agree on web standards (hah!) (2) would adopt a common vocabulary (you don't say) (3) would reliably expose their APIs so anyone could use them (as if)."
Tim Berners-Lee Discusses the Future of the Web 112 comments
maximus1 writes "In an interview with IT World, Tim Berners-Lee explains his vision of the Semantic Web. He says: 'The Semantic Web is going to take off particularly when we see people using it for data processing, when we see people using it in more and more things, adding personal data, adding files to government data.' His position on net neutrality: 'We've seen cable companies trying to prevent using the Internet for Internet phones. I am concerned about this, and am working, with many other committed people, to keep it from happening. I think it's very important to keep an open Internet for whoever you are. This is called Net neutrality. It's very important to preserve Net neutrality for the future.' And a fun tidbit — He mentions his 1989 memo to his boss at CERN that described his vision for the Web."
This discussion has been archived.
No new comments can be posted.
90 comments
So what does he offer? (Score:5, Interesting)
(http://www.pisosen.com/content/Madrid.html)
"There are so many ways of doing it improperly, and only one way of doing it right."
But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.
Would someone please cut and paste here... (Score:2)
Re:Would someone please cut and paste here... (Score:4, Interesting)
The semantic web is still a Good Thing (Score:4, Interesting)
(http://max.romantschuk.fi/)
Just my 2 cents, anyway.
Re:The semantic web is still a Good Thing (Score:5, Insightful)
(http://www.hwacha.net/)
Honestly, if some Marxist state from the 60s produced propaganda like that, everyone would laugh:
"The People's Revolution is about more than nationalism! New communal agricultural techniques will enable a standard of living of a completely different nature than today! Manufacturing and distributing goods for the Workers could be taken to a whole new level!"
It's the same fallacy: "If only everyone spontaneously got together and did what I think they should, all problems would go away!"
Yet just because the fictional utopia in question is the 'Semantic Web' rather than the 'Workers Paradise', everybody takes it really seriously. And nobody mocks it at all. Nope, nobody ever laughs at the Semantic Web.
Ok, ok, I'm just being mean, I should go and do something useful.
Man promotes own company (Score:3, Insightful)
(http://www.milksucks.com/ | Last Journal: Monday September 15 2003, @12:30PM)
Semantics don't work on a global scale (Score:4, Insightful)
That's good (Score:5, Interesting)
Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.
Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.
You can't ask people who can't tell the difference between typing a query into a search engine and typing a URL into the address bar to add proper metadata to their blogs. Not to mention the potential for abuse.
I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages", but that's just arrogant. Technology should destroy barriers, not create them; the technology that implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.
BTW, Google already extracts semantic information from both the site and the query, but this is quite primitive compared to the potential described in the article. Google looks at term context, meaning context, synonyms, related words, etc. I hope Hakia.com and businesses like it take this idea further, so there's finally some innovation happening in search, which has seen only gradual and minuscule improvements in the 9 years since Google introduced PageRank.
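Synonym matching of the sort this comment attributes to Google can be sketched as simple query expansion. This is a minimal illustration, not how Google or hakia actually work; the hand-written `SYNONYMS` table and the `expand_query`/`score` helpers are hypothetical.

```python
# Hypothetical, hand-written synonym table; a real engine would have to learn these.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "cheap": {"inexpensive", "affordable"},
}


def expand_query(query: str) -> set[str]:
    """Return the query's lowercased terms plus any listed synonyms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def score(document: str, query: str) -> int:
    """Count how many expanded query terms appear in the document."""
    words = set(document.lower().split())
    return len(expand_query(query) & words)


print(score("an affordable automobile for sale", "cheap car"))  # 2
```

The page matches "cheap car" through "affordable" and "automobile" even though it shares no literal word with the query; the hard part, choosing the right sense of each word in context, is exactly what this sketch leaves out.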
in the defense of meta-data (Score:5, Interesting)
(http://sourceforge.net/projects/karekol/)
Looking for Mr/Ms Right (Score:1)
(http://home.primus.ca/~ronsharp/tororg.html)
Tiresome and wrong (Score:5, Insightful)
(http://www.dspanel.com/)
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten MP3s?
2: my "context". Am I discussing something? Is this query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM conversations in which a term or subject previously unknown to the reader is brought up.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old, what the hell does "Palladium" really mean?
To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered without a framework that is user-centric rather than centralized in the current Google fashion. Desktop search is interesting to some extent as a way of tying our personal space to the dataspace outside our local control, but it is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above relates to interpersonal communication, it becomes obvious that what is necessary are data structures that learn from ongoing conversation, e.g. the intersection of Person A and Person B is described in a way that can guide us toward the appropriate (or most likely) interpretation of the term used.
There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.
Semantic Search Points To Better Relevancy (Score:5, Funny)
Availability? (Score:2, Funny)
BWAAAHAHAHAAAAHAAA
Missing the target (Score:2)
Look, I'm not interested in the academic niceties of semantic searching and metadata. What I want, and what would actually be useful, is a way to separate the low-value sites from those relevant FOR THE SEARCH I'M DOING.
If I'm looking for a review of a product, I don't want 50 shops trying to sell me one. If I'm looking for an explanation/definition of a term, I don't want a page that merely mentions the word, even if it has a high PageRank. If I'm looking for a site that gives me lots of links to connected sites, I don't want one that thinks it's an island.
Stop trying to classify at the small scale; focus on getting the broad scale right and on classifying the search first. It's an easier and more important problem.
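"Classifying the search first" might look something like the following minimal sketch: guess the query's intent from cue phrases, then drop result types that don't fit that intent. The cue lists, the intent names, and the assumption that each result already carries a page type ("shop", "review") are all hypothetical.

```python
# Hypothetical cue phrases for each search intent (checked in this order).
INTENT_CUES = {
    "review": ("review", "opinion", "compare"),
    "definition": ("what is", "define", "meaning of"),
    "links": ("links", "resources", "directory"),
}

# Page types each intent should drop (e.g. shops when you want a review).
EXCLUDE = {"review": {"shop"}, "definition": {"shop"}, "links": set()}


def classify_query(query):
    """Return the first intent whose cue phrase appears in the query, else "general"."""
    q = query.lower()
    for intent, cues in INTENT_CUES.items():
        if any(cue in q for cue in cues):
            return intent
    return "general"


def filter_results(query, results):
    """Keep URLs whose page type suits the query's intent; results are (url, page_type) pairs."""
    banned = EXCLUDE.get(classify_query(query), set())
    return [url for url, page_type in results if page_type not in banned]


results = [("shop.example/d40", "shop"), ("reviews.example/d40", "review")]
print(filter_results("nikon d40 review", results))  # ['reviews.example/d40']
```

Substring cues are crude; the point is only that a coarse, query-level decision can prune whole classes of pages before any fine-grained ranking happens.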
doesn't sound like TBL's semantic web to me.. (Score:2)
(http://www.bazaah.org/)
relevance (Score:1)
semantic search, tagging, and social search (Score:1)
Semantic Web (Score:2)
In which respect it does not contrast with the Semantic Web, which doesn't require that any more than the regular Web required every computer attached to the internet to start running a web server. Since this article wasn't about the Semantic Web to begin with, was an inaccurate, gratuitous attack on the Semantic Web necessary? (Yes, yes, it mirrors the gratuitous attack on the idea made by the author of TFA.)
Also, semantic search has a harder problem than getting people to start using metadata (which only requires demonstrating utility so that it becomes attractive to adopt): it requires developing a system that understands natural language, including understanding which of a word's many diverse senses is intended in context on a page.
Yeah, so Semantic Web requires getting some web authors to put structured information in their pages, and for that to spread as utility is demonstrated. Semantic search requires, per the author of TFA, "a system which understands both the user's query and the Web text using cognitive algorithms similar to that of the human brain, then brings results that are dead on target (right context) at first glance (not requiring to open the Web page for further investigation.)" (emphasis added)
Compared to that, the Semantic Web is easy.
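For a sense of how little the Semantic Web side asks of authors, here is a hedged sketch that reads property/value pairs out of markup using Python's standard `html.parser`. It is loosely modeled on RDFa's `property` and `content` attributes but is not an RDFa processor; the `dc:` property names in the usage are just illustrative.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from tags that carry a `property` attribute.

    Simplified: uses the `content` attribute if present, otherwise the tag's first
    non-blank text. No prefixes, subjects, or nesting rules as in real RDFa.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            self.pairs.append((prop, attrs["content"]))
        else:
            self._pending = prop

    def handle_endtag(self, tag):
        # Drop a pending property that never received any text.
        self._pending = None

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


page = ('<html><head><meta property="dc:title" content="Semantic Search"></head>'
        '<body><p>Intro</p><span property="dc:creator">kdawson</span></body></html>')
extractor = PropertyExtractor()
extractor.feed(page)
print(extractor.pairs)  # [('dc:title', 'Semantic Search'), ('dc:creator', 'kdawson')]
```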
Semantics derivable from web corpus statistics (Score:2, Informative)
or "machine comprehension" of text on the web can be gleaned by doing statistical analysis of the relationships between words and phrases across the entire web. Aggregating across a large corpus eliminates "noise" in usage and draws out the semantic "signal" about how people relate the concepts to each other.
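The comment doesn't say which statistic it has in mind; pointwise mutual information (PMI) over document co-occurrence is one standard choice, swapped in here as an assumption. A minimal sketch: PMI(a, b) = log(P(a, b) / (P(a) P(b))), with each probability estimated as the fraction of documents containing the word(s).

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """PMI for every word pair that co-occurs in at least one document.

    Documents are treated as bags of words; keys are alphabetically ordered pairs.
    """
    n = len(documents)
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return {
        (a, b): math.log((count / n) / ((word_df[a] / n) * (word_df[b] / n)))
        for (a, b), count in pair_df.items()
    }


docs = [
    "palladium metal catalyst",
    "palladium catalyst converter",
    "palladium theatre london",
    "metal price",
    "london theatre show",
]
table = pmi_table(docs)
print(round(table[("london", "theatre")], 3))  # 0.916
```

On this toy corpus ("london", "theatre") scores higher than ("palladium", "theatre"), which comes out negative; the "signal" the comment describes only becomes trustworthy when the counts come from a very large corpus.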
Dogma (Score:2)
(Last Journal: Wednesday January 05 2005, @01:10PM)
That is a fallacy: you cannot know that there is 'only one way of doing it right' if you don't know what that 'right' way is to begin with; you are dealing with unknowns. In truth, very few systems collapse to a solution set of one. The phrase 'there is more than one way to skin a cat' comes to mind. This one statement tells me the person making it is more interested in controlling the method than in pursuing investigation for the sake of advancing our fundamental understanding. Not every problem is a nail, and not every tool is a hammer.
Given the variations evident (particularly when trying to propagate the 'one true' ontology), I see the semantic web as an unapproachable utopia. The reality will be some hybrid of the best ideas to come out of this research, coupled with (or layered above/below) the practicalities inherent in multiple ontologies/tagging systems, human interpretations, and how to resolve/share those differences for each person. That is where the real solution set lies.
Question-answering systems (Score:2)
(http://www.animats.com)
"Semantic search" is actually a dumbed-down version of what, in AI, used to be called "natural language question answering systems". The first one that was sort of useful was Bobrow's "Baseball" [atariarchives.org], which, unlike Eliza, actually did something useful. "Baseball" had a small database of baseball statistics, and could answer questions like "How many games did the Orioles play in June?". I'm surprised that someone doesn't have a natural language query system for sports statistics on the web today. It's not out of reach technically, because the underlying data is well-structured. Sports fans would use it.
What something like this is really doing is translating natural language to SQL. "How many games did the Orioles play in June?" translates to something like SELECT COUNT(*) FROM games.baseball WHERE (hometeam='Orioles' OR awayteam='Orioles') AND month(gamedate) = 6 AND baseballseason(gamedate) = baseballseason(NOW()); There are existing tools for this [hallogram.com], and there have been for years.
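A minimal sketch of the question-to-SQL idea, using Python's built-in `sqlite3` module. The table name `games`, its columns, and the single regular-expression pattern are assumptions for illustration; the hypothetical `baseballseason()` season filter from the comment is left out because SQLite has no such function.

```python
import calendar
import re
import sqlite3

# Map "january".."december" to 1..12 (calendar.month_name[0] is an empty string).
MONTHS = {name.lower(): number for number, name in enumerate(calendar.month_name) if name}

# One hand-written question shape; single-word, case-sensitive team names only.
PATTERN = re.compile(r"how many games did the (\w+) play in (\w+)\?", re.IGNORECASE)


def to_sql(question):
    """Translate the one supported question shape into a parameterized SQL query."""
    m = PATTERN.fullmatch(question.strip())
    if m is None:
        raise ValueError("question not understood")
    team = m.group(1)
    month = MONTHS.get(m.group(2).lower())
    if month is None:
        raise ValueError("unknown month")
    sql = ("SELECT COUNT(*) FROM games "
           "WHERE (hometeam = ? OR awayteam = ?) "
           "AND CAST(strftime('%m', gamedate) AS INTEGER) = ?")
    return sql, (team, team, month)


# Usage with a tiny in-memory table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (hometeam TEXT, awayteam TEXT, gamedate TEXT)")
conn.executemany("INSERT INTO games VALUES (?, ?, ?)", [
    ("Orioles", "Yankees", "2007-06-01"),
    ("Red Sox", "Orioles", "2007-06-15"),
    ("Orioles", "Tigers", "2007-05-30"),
])
sql, params = to_sql("How many games did the Orioles play in June?")
count = conn.execute(sql, params).fetchone()[0]
print(count)  # 2
```

Binding the team and month as `?` parameters, rather than pasting them into the SQL string, keeps user text from being interpreted as SQL; a usable system would need a real grammar instead of one regex.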
"Semantic search" is a dumbed down version of that because it doesn't try to answer the question. It just tries to spew back material which appears to contain an answer to the question. It's like talking to a politician, sales rep, or Jesus freak. "Ask Jeeves" was about as close as we ever got in the WWW era.
The problem with semantic search is that standalone queries have to be stated with more clarity and precision than most users are likely to achieve. The original article suggested "What is palladium used for?" as a query. That's a completely different query from "What is the Palladium used for?". As a standalone query, the best answer is probably "Worship of the goddess Pallas Athene". Which is probably not what the user wanted. With location hints, one might guess that the user wanted information about some theater or nightclub named the Palladium. But that's a guess; sometimes it will be wrong.
This leads to systems that engage in dialogue with the user. Probably by asking the user multiple choice questions. That's quite feasible, but it usually just means funneling the user into some kind of "wizard"-like sequence of dialog boxes. Many sites have "product selectors" like that.
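A minimal sketch of the multiple-choice dialogue described above, assuming a hypothetical hand-built sense inventory (`SENSES`); a real system would need a far larger dictionary and a way to rank the options.

```python
# Hypothetical sense inventory; a real one would be vastly larger.
SENSES = {
    "palladium": [
        "the chemical element (Pd)",
        "a theatre or nightclub named the Palladium",
        "an image of the goddess Pallas Athene",
    ],
}


def clarifying_question(query):
    """Return (prompt, options) for the first ambiguous word, or None if there is none."""
    for word in query.lower().rstrip("?").split():
        options = SENSES.get(word, [])
        if len(options) > 1:
            return f'Which "{word}" do you mean?', options
    return None


def refine(query, options, choice):
    """Attach the user's 1-based multiple-choice answer to the query as extra context."""
    return f"{query} [{options[choice - 1]}]"


query = "What is the Palladium used for?"
asked = clarifying_question(query)
if asked:
    prompt, options = asked
    print(prompt)
    for i, option in enumerate(options, 1):
        print(f"{i}. {option}")
    print(refine(query, options, 2))  # pretend the user picked option 2
```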
Another approach, which seems to be where Google is going, is to collect vast amounts of information about the user's previous behavior, which can be used as additional context for search requests. That's likely to help, but it has downsides. If everybody gets a different answer when searching for something, you can't tell other people what to search for to find something. Asking the same question again, after doing other things, might get you a different answer. It's probably going to do the wrong thing some of the time. Given the model that "search is a box into which you type in what you want, more or less", that could drive users nuts.
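The history-based approach can be sketched as re-ranking the engine's results by word overlap with a user's past queries. This is a hypothetical illustration (the `rerank` helper and its scoring are assumptions, not Google's method); the usage shows the downside the comment points out, namely that two users get different orderings for the same query.

```python
from collections import Counter


def rerank(results, history):
    """Reorder (url, snippet) results by word overlap with a user's past queries.

    sorted() stays stable even with reverse=True, so ties keep the engine's original order.
    """
    profile = Counter(word for query in history for word in query.lower().split())

    def affinity(item):
        _, snippet = item
        return sum(profile[word] for word in set(snippet.lower().split()))

    return [url for url, _ in sorted(results, key=affinity, reverse=True)]


results = [
    ("a.example", "palladium metal prices"),
    ("b.example", "palladium theatre tickets london"),
]
print(rerank(results, ["platinum group metal", "metal catalyst"]))  # ['a.example', 'b.example']
print(rerank(results, ["london theatre listings"]))  # ['b.example', 'a.example']
```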
And none of this really applies to shopping-related searches, which aren't formal queries at all.
User generated meta data? (Score:1)
(http://www.getjive.com/)
Probably something more effective would be something a little more complex than just tags. Using the porn example, you wouldn't want articles talking about porn to be blocked (if you were blocking porn) because it actually wasn't porn. So you might have a couple different categories of tags. You might even put in a rating on the content (I'm thinking along the lines of PG, PG-13, R, etc). The validity of certain meta data could be based on the frequency of the reported meta data.
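Frequency-based validation of user-reported metadata could be sketched like this: a tag or rating is accepted for a page only once enough distinct users have reported it. The `TagStore` class and its thresholds (`min_reports`, `min_share`) are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


class TagStore:
    """Hypothetical crowd-sourced metadata: tags count only when enough raters agree."""

    def __init__(self, min_share=0.5, min_reports=3):
        self.min_share = min_share        # fraction of a page's raters who must agree
        self.min_reports = min_reports    # ignore pages with too few raters
        self.reports = defaultdict(dict)  # url -> {user: frozenset of tags}

    def report(self, url, user, tags):
        # One vote per user per page; a later report replaces the earlier one.
        self.reports[url][user] = frozenset(tags)

    def accepted_tags(self, url):
        votes = self.reports.get(url, {})
        if len(votes) < self.min_reports:
            return set()
        counts = Counter(tag for tags in votes.values() for tag in tags)
        return {tag for tag, c in counts.items() if c / len(votes) >= self.min_share}


store = TagStore()
store.report("example.org/article", "u1", {"news", "PG"})
store.report("example.org/article", "u2", {"news", "PG-13"})
store.report("example.org/article", "u3", {"news", "PG"})
print(sorted(store.accepted_tags("example.org/article")))  # ['PG', 'news']
```

Keying votes by user limits ballot-stuffing by a single account, though it does nothing against many fake accounts; that abuse problem would remain.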
Essentially, it's like a wiki meta-data system. You could make a great search engine out of it. You could make good content-control systems with it. If you made the data available through a web service, you could put control over its use in the hands of the user. The meta-data rating software wouldn't be for the average Joe, but you could motivate people to rate using systems like the Google Image Labeler http://images.google.com/imagelabeler/ [google.com]. Or you could require users to rate a page to "pay" for each search they do. People could also submit their own sites to be rated.
It would probably be hard to get wide participation, but it would be cool if it could be done.
-br
The Quantum Bookkeepers (Score:1)
Charlotte's Semantic Web (Score:1)
(http://www.privatejetsalesandrental.com/)
Re:Nonsense (Score:2, Funny)
(http://silmaril.ie/cgi-bin/blog)
Re:metadata worst idea ever (Score:1)
Re:metadata worst idea ever (Score:5, Informative)
Re:metadata worst idea ever (Score:2, Interesting)
(http://teethgrinder.co.uk/open-flash-chart/)
Semantic Web = the promise that never quite delivers
Such a good idea in theory, but where does trust come from? Who can we trust to mark anything?
And by the time any of this is solved, Google will have evolved so that it understands plain text better than markup. How do you mark up something as ambiguous? Unsure? A rumor? It's pretty easy in plain English:
"I hear Joe is living in Cornwall". There you go, easy to use and no angle brackets.
monk.e.boy
Re:Who actually asks search engines questions? (Score:1)
(http://www.ronpaul2008.com/)
For example, google "USA", vs. "Where are the USA?" and you will get different results. If you really wanted to know where the USA are, the second query will be far more useful, giving you the desired information in the first link.
The "according to" links seem to be more sensitive to natural speech.