Semantic Search Points To Better Relevancy
Posted by kdawson on Wed May 30, 2007 04:44 AM
from the retrieve-what-I-mean dept.
ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.
Related Stories
Technology: Why the Semantic Web Will Fail 179 comments
Jack Action writes "A researcher at Canada's National Research Council has a provocative post on his personal blog predicting that the Semantic Web will fail. The researcher notes the rising problems with Web 2.0 — MySpace blocking outside widgets, Yahoo ending Flickr identities, rumors Google will turn off its search API — and predicts these will also cripple Web 3.0." From the post: "The Semantic Web will never work because it depends on businesses working together, on them cooperating. There is no way they: (1) would agree on web standards (hah!) (2) would adopt a common vocabulary (you don't say) (3) would reliably expose their APIs so anyone could use them (as if)."
Technology: Tim Berners-Lee Discusses the Future of the Web 112 comments
maximus1 writes "In an interview with IT World, Tim Berners-Lee explains his vision of the Semantic Web. He says: 'The Semantic Web is going to take off particularly when we see people using it for data processing, when we see people using it in more and more things, adding personal data, adding files to government data.' His position on net neutrality: 'We've seen cable companies trying to prevent using the Internet for Internet phones. I am concerned about this, and am working, with many other committed people, to keep it from happening. I think it's very important to keep an open Internet for whoever you are. This is called Net neutrality. It's very important to preserve Net neutrality for the future.' And a fun tidbit — He mentions his 1989 memo to his boss at CERN that described his vision for the Web."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
So what does he offer? (Score:5, Interesting)
"There are so many ways of doing it improperly, and only one way of doing it right."
But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.
Re: (Score:1, Insightful)
But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.
Why, what did you expect, a link to their full source code? The article's about the direction the engines are taking and the way that shows up in userland. If you asked Google about the specifics of their algorithm, they'd go quiet all of a sudden too.
Re: (Score:2)
Would someone please cut and paste here... (Score:2)
Re:Would someone please cut and paste here... (Score:4, Interesting)
Re: (Score:3, Insightful)
If it weren't for my wife, my media consumption would consist entirely of science fiction and WWI/II movies; thanks to my wife, I've been exposed to a much broader swath of media genres -- some of which has been painful, and some of which I've regretted... but in the balance, I think I'm a better person for it. But, then, I possess an abunda
Re: (Score:2)
The semantic web is still a Good Thing (Score:4, Interesting)
Just my 2 cents, anyway.
Re:The semantic web is still a Good Thing (Score:5, Insightful)
Honestly, if some Marxist state from the 60s produced propaganda like that, everyone would laugh:
"The People's Revolution is about more than nationalism! New communal agricultural techniques will enable a standard of living of a completely different nature than today! Manufacturing and distributing goods for the Workers could be taken to a whole new level!"
It's the same fallacy: "If only everyone spontaneously got together and did what I think they should, all problems would go away!"
Yet just because the fictional utopia in question is the 'Semantic Web' rather than the 'Workers Paradise', everybody takes it really seriously. And nobody mocks it at all. Nope, nobody ever laughs at the Semantic Web.
Ok, ok, I'm just being mean, I should go and do something useful.
Re: (Score:3, Insightful)
No. Actually, you're being accurate. Unless folks can solve the multiple taxonomy problem (and, no, deciding on a common taxonomy and taxonomy translation approaches have not worked in the past) and the metadata cheating problem, the "Semantic Web" is BS promulgated by someone who probably doesn't know the history of epistemology, taxology, or why hard AI problems really are hard, even if he has been knighted. And the people who think that
Re: (Score:2)
So it's Web only in the sense that we're sharing data over port 80; it's not any sort of add-on to HTML. As for Semantic... well, we can debate FOL vs DL vs whatever you want in a differ
Man promotes own company (Score:3, Insightful)
Semantics don't work on a global scale (Score:4, Insightful)
Re: (Score:1, Funny)
Okay, so let's mash up this semantic idea with the whole social news concept to create an innovative, synergized social semantic system.
Re: (Score:2)
Re: (Score:2)
The first part is not true, the second part is. Of course, one of the key applications for semantic technology is "web of trust" kind of systems that provide the infrastructure for dealing with the question "who is a trusted source and to what degree?"
There is no requirement that semantic tags from different sources be treated equally (and the distinction isn't just between "trust" and "ignore", you can do a lot more
Re: (Score:3, Insightful)
This society goes to great lengths to cultivate learned helplessness. Attitudes toward brands are a good example. Many people wish to simplify their decision making by forming an emotional bond with their favorite brands, rather than exercising rational judgement, which involves wading into the frustrations involved in finding information you can trust about the products you wish to purchase.
I have no time for Sanger, either, who is busy trying to brand knowledge with the warm glow of credentialed expertise.
If
That's good (Score:5, Interesting)
Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.
Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.
You can't ask people who can't tell the difference between typing a query into a search engine and typing a URL into the address bar to add proper meta information to their blogs. Not to mention the abuse potential.
I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages", but that's just arrogant. Technology should destroy barriers, not create them; whichever technology implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.
BTW, Google already extracts semantic information from both the site and the query, but this is quite primitive compared to the potential mentioned in the article. Google looks at term context, meaning context, synonyms, related words, etc. I hope Hakia.com and businesses like it take this idea further, so there's finally some innovation happening in search (something that has seen only gradual and minuscule improvements for the last 9 years, since Google introduced PageRank).
Re: (Score:3, Interesting)
In the Google model, advertising pays the bill, the masses use it, the service is built on sound statistical principles, and profit is driven by focusing on making the process relatively simple and cheap. The web is crawled, links ar
Re: (Score:2)
The kind of "semantic search" laid out in the paper is at least as much of a "shiny dream" as the Semantic Web pretty much by definition. The kind of "semantic search" laid out in this paper requires an extreme version of exactly the same technology that would be used by a "semantic factory" that would take user-created content and add semantic markup automatically, the only difference is that instead of
Re: (Score:3, Interesting)
There are two things here: actually there isn't a "business" behind every page. This is like saying we should all have proper automated phone answering systems on our phones, as this gives us an edge in our business: but phones are used for more than business, and I certainly don't need all those fancy things on
in the defense of meta-data (Score:5, Interesting)
Re: (Score:2)
Yup.. this is how microformats work. Something which a lot of the top companies seem to be interested in (including Microsoft).
Looking for Mr/Ms Right (Score:1)
Tiresome and wrong (Score:5, Insightful)
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM conversations where a term or subject previously unknown to the reader is brought up.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?
To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered upon without a framework that is user-centric rather than centralized in the current Google fashion. Desktop search is interesting to some extent as a way of tying our personal space to the dataspace outside of our local control, but that is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above is related to interpersonal communication, it becomes obvious that what is necessary is data structures that learn from ongoing conversation, e.g. the intersection of Person A and Person B is described in a way that can give us guidance as to what the appropriate (or most likely) interpretation of the term used is.
There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.
Re: (Score:3, Insightful)
There are two parts to this problem. The UI, or how a user will interact with the system to describe the context within which a search is to be performed, and the web crawler, which must extract semantics from web pages based on either metadata, linking algorithms (ala Goog
Librarians are slow (Score:2)
Call up a reference librarian and ask for information on "Palladium". Odds are s/he will reply with one or more questions.
That's because librarians are slow. Those questions can lead to saving a great deal of time off the top.
A search engine is fast, effectively providing a ton of answers in seconds or fractions of seconds. The problem then is that we are slow. We can't go through all the hits as fast as the search engine spits them out.
What would be helpful would be if the search engine clustered results as if in response to the sorts of questions our hypothetical librarian might ask. The Clusty search engine attempts
Re: (Score:2, Insightful)
Re: (Score:2)
Re: (Score:2)
There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM conversations where a term or subject previously unknown to the reader is brought up.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?
the main point of the article is that semantics offers improved searching over link statistics, and i'm not sure how any of this relates to that since none of this context is being supplied or used by a search engine like Google either. but like you said it's interesting that a lot of the questions above could be answered with a semantic search of the user's own hard drive, which would be probably more useful than a link statistics index of that same device.
but doing a search on your own machine and pro
Re: (Score:2)
Re: (Score:2, Interesting)
Semantic Search Points To Better Relevancy (Score:5, Funny)
Availability? (Score:2, Funny)
BWAAAHAHAHAAAAHAAA
Missing the target (Score:2)
Look, I'm not interested in the academic niceties of semantic searching and metadata. What I want, what would actually be useful, is a way to separate out the low-value sites from those of relevance FOR THE SEARCH I'M DOING.
If I'm looking for a review of a product, I don't want 50 shops trying to sell me one. If I'm looking for an explanation/definition of a term, I don't want a page that may mention the word, even if it does have high page rank. If I'm looking for a site that gives me lots of links to
Re: (Score:2, Interesting)
If you are interested in a real solution for semantic web markup that works (and is being used) right now, you might want to check out the Microformats website [microformats.org]. There is a growing following that is working on getting the semantic web working properly. The Firefox and Songbird guys are looking at using Microformats to make browsing the web a much richer experience - NOW, not 10 years from now.
There are currently Microformats for marking up people, places, events, geographic locations, music, and many other wi
doesn't sound like TBL's semantic web to me.. (Score:2)
Semantic Web (Score:2)
In which respect it does not contrast with the Semantic Web, which doesn't require that any more than the regular Web required every computer attached to the internet to start running a web server. Since this article wasn't about the semantic web to start with, was an inaccurate gratuitous attack on the Semantic
Semantics derivable from web corpus statistics (Score:2, Informative)
or "machine comprehension" of text on the web can be gleaned by doing statistical analysis of the relationships between words and phrases across the entire web. Aggregating across a large corpus eliminates "noise" in usage and draws out the semantic "signal" about how people relate the concepts to each other.
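To make the corpus-statistics idea concrete, here is a minimal sketch, assuming document-level co-occurrence and pointwise mutual information (PMI) as the measure of how strongly two words are related. The function name `cooccurrence_pmi` and the four-line toy corpus are made up for illustration; a real system would work over vastly more text and would almost certainly use something more refined.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(documents):
    """Score word pairs by PMI, counting a pair once per document it appears in."""
    word_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        words = sorted(set(text.lower().split()))
        word_counts.update(words)
        # combinations() over a sorted list yields each pair once, as (a, b) with a < b
        pair_counts.update(combinations(words, 2))

    n = len(documents)
    scores = {}
    for (a, b), joint in pair_counts.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) ), using document frequencies
        p_joint = joint / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        scores[(a, b)] = math.log(p_joint / (p_a * p_b))
    return scores


# Hypothetical toy corpus: "palladium" shows up in both a metal and a DRM context.
toy_corpus = [
    "palladium metal catalyst",
    "palladium metal price",
    "palladium drm microsoft",
    "microsoft drm windows",
]

scores = cooccurrence_pmi(toy_corpus)
for pair in [("drm", "microsoft"), ("metal", "palladium"), ("drm", "palladium")]:
    print(pair, round(scores[pair], 3))
```

Pairs that never share a document (such as "drm" and "metal" here) simply get no score, which is the kind of signal that would let an engine keep the two senses of "palladium" apart.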
Dogma (Score:2)
That is a fallacy - you can not know that there is 'only one way of doing it right', if you don't know what that 'right' way is to begin with - you are dealing with unknowns. In truth there are very few systems that collapse to a solution set of one. The phrase 'there is more than one way to skin a cat' comes to mind. This one statement tells me the person making the statement is more interested in controlling the method,
Question-answering systems (Score:2)
"Semantic search" is actually a dumbed-down version of what, in AI, used to be called "natural language question answering systems". The first one that was sort of useful was Bobrow's "Baseball" [atariarchives.org], which, unlike Eliza, actually did something useful. "Baseball" had a small database of baseball statistics, and could answer questions like "How many games did the Orioles play in June?". I'm surprised that someone doesn't have a natural language query system for sports statistics on the web today. It's not ou
Re: (Score:2, Funny)
Re: (Score:1)
Re:metadata worst idea ever (Score:5, Informative)
Re: (Score:2, Informative)
If I have a turd, and I add metadata to it that says it's pure gold, it's still a turd; you have to trust me to trust my metadata. That's what the op is talking about, not the container.
Re: (Score:2)
My understanding is that you aren't tagging an item with metadata, rather the search engine is tagging your item with metadata on its end based on the linguistic context of the page. Meaning, based on context, it would understand that there is a difference between the word "server" on a page about restaurants vs. the word "server" on a page about office equipment, so you won't get links to Hooter's and Jimmy's Seafood Hut mixed in with your results for equipment. Ideally, any metadata tags you throw will be
Re: (Score:2, Interesting)
Semantic Web = the promise that never quite delivers
Such a good idea in theory, but where does trust come from? Who can we trust to mark anything?
And by the time any of this is solved Google will have evolved so it can understand plain text better than markup. How do you mark up something as ambiguous? Unsure? Rumor? It's pretty easy in plain English:
"I hear Joe is living in Cornwall". There you go, easy to use and no angle brackets.
monk.e.boy
Re: (Score:2, Interesting)
The SW project exists *because* machines are too dumb to read English. Or Chinese. And will probably stay that way for the foreseeable future.
So W3C's RDF is positioned half-way between the world of dumb computers and smart people. It structures data in terms of classes and properties, and allows different groups to define sets of class and property names that can be freely mixed together without the need for heavyweight
Re: (Score:2)
Trust comes from the user's decision to trust a particular source of information, the same as anywhere else.
Who can you trust to tell you anything?
"Computers will understand natural language so that specialized vocabularies for interacting with them are no longer necessary or beneficial" has been the le
Re: (Score:2)
You mean, you could have something represented like:
-
subject: http://www.somewebsite.com/ [somewebsite.com]
predicate: contains
object: pr0n
-
subject: http://www.otherwebsite.com/ [otherwebsite.com]
predicate: discusses
object: pr0n
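For what it's worth, the two statements above can be sketched in plain Python as subject/predicate/object tuples with a wildcard lookup. This is only an illustration of the triple idea, not RDF syntax or any real RDF library's API; the `Triple` type and the `match` helper are hypothetical names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The two statements from the comment above, stored as plain tuples.
triples = [
    Triple("http://www.somewebsite.com/", "contains", "pr0n"),
    Triple("http://www.otherwebsite.com/", "discusses", "pr0n"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching every given field; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Sites that contain it, as opposed to sites that merely discuss it.
containing = match(triples, predicate="contains", obj="pr0n")
for t in containing:
    print(t.subject)
```

The point of keeping the predicate as its own field is that a filter can tell "contains" from "discusses", which a plain keyword match on the page text cannot.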