The Internet

Semantic Search Points To Better Relevancy

ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.

  • by javilon ( 99157 ) on Wednesday May 30, 2007 @05:03AM (#19319361) Homepage
    From TFA:

    "There are so many ways of doing it improperly, and only one way of doing it right."

    But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.
    • Re: (Score:1, Insightful)

      by suv4x4 ( 956391 )

      But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.


      Why, what did you expect, a link to their full source code? The article's about the direction the engines are taking, the way those appear in userland. If you asked Google about specifics of their algorithm, they'd also go quite silent all of a sudden.
      • by Rei ( 128717 )
        What we really need is for Wikipedia to move over to Semantic MediaWiki [ontoworld.org]; it should be a painless transition. I really think that it would be widely used -- once people see it in use in some articles, they're more likely to use it in other articles, in the same way that people learn most of Wikipedia's formatting. With wide use of semantic tags (esp. if an ontology was used as well), the entire knowledge base of Wikipedia could be intelligently queried. Want to know all trees that can grow to more than 60
    • I'm not so sure this guy is bright enough to come up with a right answer.

      How does this guy know there is only one solution?

      It may be that there are an infinite number of right solutions. Or it may be that there are a dozen right solutions. It's very rare to find a problem in the universe that has one and only one right solution. It could even be that there is no right solution. In which case, mathematics can come to the rescue, yet again, and provide us with a very large number of solutions approaching
  • ...the best example/s they know of a definition (or better still a demonstration) of "social search." Thanks much.
    • by regular_gonzalez ( 926606 ) on Wednesday May 30, 2007 @06:17AM (#19319675)
      MovieLens [umn.edu] is perhaps kind of similar-but-different. You go there and rate movies. Based on similarities to how other people rated movies, it then suggests movies for you and your likely rating of them. It's pretty neat actually -- my wife and I both have accounts there, and you can cross-reference with other people. So now when we go to the video store, instead of each of us picking one movie we like and potentially forcing the other person to suffer through it, we can find a movie that (in theory) we will both like. Seems fairly accurate so far.
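
      A rough sketch of that "similar raters" idea, assuming plain cosine similarity over the movies two people have both rated. This is an illustration only, not MovieLens's actual algorithm, and every user name, title, and rating below is made up.

```python
from math import sqrt

# Hypothetical ratings (1-5 stars); all names and numbers are invented.
ratings = {
    "alice": {"Alien": 5, "Amelie": 2, "Das Boot": 4},
    "bob":   {"Alien": 4, "Amelie": 1, "Das Boot": 5, "Brazil": 5},
    "carol": {"Alien": 1, "Amelie": 5, "Das Boot": 2, "Chocolat": 5},
}

def similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in shared)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    pairs = [(similarity(user, other), seen[movie])
             for other, seen in ratings.items()
             if other != user and movie in seen]
    total = sum(weight for weight, _ in pairs)
    if not total:
        return None  # nobody comparable has rated it
    return sum(weight * rating for weight, rating in pairs) / total
```

      For the "movie we will both like" case, one option would be to rank unseen titles by the lower of the two predicted ratings, so neither person is forced to suffer through the pick.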
      • Re: (Score:3, Insightful)

        by srussell ( 39342 )

        instead of each of us picking one movie we like and potentially forcing the other person to suffer through it, we can find a movie that (in theory) we will both like.

        If it weren't for my wife, my media consumption would consist entirely of science fiction and WWI/II movies; thanks to my wife, I've been exposed to a much broader swath of media genres -- some of which has been painful, and some of which I've regretted... but in the balance, I think I'm a better person for it. But, then, I possess an abunda

    • by pffft ( 1103639 )
      www.pandora.com

      Along the same lines, except for music. You rate songs and then get recommendations based on the characteristics of the music you rate highly. An interesting idea, but not 100% effective in my opinion.

  • by Max Romantschuk ( 132276 ) <max@romantschuk.fi> on Wednesday May 30, 2007 @05:11AM (#19319389) Homepage
    The semantic web is about more than search. Rich semantics will enable applications of a completely different nature than today. Aggregating and mashing up data could be taken to a whole new level. Just because someone comes up with better indexing we shouldn't give up on the semantic web.

    Just my 2 cents, anyway.
    • by kahei ( 466208 ) on Wednesday May 30, 2007 @05:54AM (#19319571) Homepage

      Honestly, if some Marxist state from the 60s produced propaganda like that, everyone would laugh:

      "The People's Revolution is about more than nationalism! New communal agricultural techniques will enable a standard of living of a completely different nature than today! Manufacturing and distributing goods for the Workers could be taken to a whole new level!"

      It's the same fallacy: "If only everyone spontaneously got together and did what I think they should, all problems would go away!"

      Yet just because the fictional utopia in question is the 'Semantic Web' rather than the 'Workers Paradise', everybody takes it really seriously. And nobody mocks it at all. Nope, nobody ever laughs at the Semantic Web.

      Ok, ok, I'm just being mean, I should go and do something useful.

      • Re: (Score:3, Insightful)

        Ok, ok, I'm just being mean, I should go and do something useful.

        No. Actually, you're being accurate. Unless folks can solve the multiple taxonomy problem (and, no, deciding on a common taxonomy and taxonomy translation approaches have not worked in the past) and the metadata cheating problem, the "Semantic Web" is BS promulgated by someone who probably doesn't know the history of epistemology, taxology, or why hard AI problems really are hard, even if he has been knighted. And the people who think that

    • by jfengel ( 409917 )
      In fact, Semantic Web isn't even vaguely about search. Semantic Web doesn't index text. It's much closer to a database, with a stronger ability to define relationships between fields than you can do with data schemas. (It's the sort of work you used to have to do with SQL, and some capabilities you couldn't do with SQL.)

      So it's Web only in the sense that we're sharing data over port 80; it's not any sort of add-on to HTML. As for Semantic... well, we can debate FOL vs DL vs whatever you want in a differ
  • by DrSkwid ( 118965 ) on Wednesday May 30, 2007 @05:19AM (#19319431) Homepage Journal
    Hear the outlandish claims ladies and gentlemen, of how the brave doctor wants us just to have better searches.
  • by FredDC ( 1048502 ) on Wednesday May 30, 2007 @05:31AM (#19319493)
    IMHO semantics don't work on a global scale, it does work if you only check trusted sources. If everyone can create data and place semantics on it, it becomes useless. You can't trust everyone to place correct semantics on it, either they don't have the knowledge to place correct semantics on data, or they maliciously place the wrong semantics on it.
    • Re: (Score:1, Funny)

      by Anonymous Coward
      Let's see how many buzzwords I can cram into a single sentence...

      Okay, so let's mash up this semantic idea with the whole social news concept to create an innovative, synergized social semantic system.
    • by PPH ( 736903 )
      They do if you are searching for pr0n. Any key word you can think of will lead you to an XXX site.
      • They do if you are searching for pr0n. Any word you can think of will lead you to an XXX site.

        fixed that for you...
    • IMHO semantics don't work on a global scale, it does work if you only check trusted sources.

      The first part is not true, the second part is. Of course, one of the key applications for semantic technology is "web of trust" kind of systems that provide the infrastructure for dealing with the question "who is a trusted source and to what degree?"

      There is no requirement that semantic tags from different sources be treated equally (and the distinction isn't just between "trust" and "ignore", you can do a lot more

    • Re: (Score:3, Insightful)

      by epine ( 68316 )

      This society goes to great lengths to cultivate learned helplessness. Attitudes toward brands are a good example. Many people wish to simplify their decision making by forming an emotional bond with their favorite brands, rather than exercising rational judgement, which involves wading into the frustrations involved in finding information you can trust about the products you wish to purchase.

      I have no time for Sanger, either, who is busy trying to brand knowledge with the warm glow of credentialed expertise.

      If
  • That's good (Score:5, Interesting)

    by suv4x4 ( 956391 ) on Wednesday May 30, 2007 @05:52AM (#19319565)
    While this is not strictly a PR piece for Hakia.com, it mentions the site (and some others) and I just had to try it. I gotta be honest, it does produce more interesting (i.e. more accurate) results than Google in some cases, while in others it produces worse results. But the company's young.

    Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.

    Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.

    You can't ask people who can't tell the difference between typing a query in the search engine and typing a URL in the address bar to add proper meta information to their blogs. Not to mention the abuse potential.

    I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant. Technology should destroy barriers, not create them; the technology that implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.

    BTW, Google already extracts semantic information from both the site and the query, but this is quite primitive compared to the potential mentioned in the article. Google looks for term context, meaning context, synonyms, related words etc. I hope Hakia.com and businesses like them take this idea further, so there's finally some innovation happening in search (something that has only enjoyed gradual and minuscule improvements for the last 9 years, since Google introduced PageRank).
    • by Yoozer ( 1055188 )

      I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant.

      No, it's overdone, but certainly not arrogant. Not knowing everything by heart is not a problem when there's a reference nearby, as long as you consistently follow it (which is why it helps to know stuff by heart, so you don't have to look it up). Even then, in the end it makes your job easier instead of harder; CSS saves you a lot of headaches with consistency in design

      • Re: (Score:3, Interesting)

        by suv4x4 ( 956391 )
        Adhering to standards and accessibility may give you the edge in the business while letting a hundred monkeys bang away in Frontpage '97 won't. It's probably more arrogant to say you don't need that edge.

        There are two things here: actually there isn't a "business" behind every page. This is like saying we should all have proper automated phone answer systems on our phones, as this gives us edge in our business: but phones are used for more than business, and I certainly don't need all those fancy things on
        • by Yoozer ( 1055188 )

          There are two things here: actually there isn't a "business" behind every page.

          Agreed - there isn't. But what's the goal of the business on the web? To get attention. That is exactly the same goal of that large number of people who make everything themselves - the difference being that they aren't designers, SEO specialists, server-side scripters or what-have-you and have to become a jack of all trades in the time it takes to browse through a Teach-Yourself-X-in-Y-minutes. You want your little place to be

    • Re: (Score:3, Interesting)

      by fermion ( 181285 )
      The points are valid within a certain context, but we have to define what that context is. First, who is going to pay for the service? Second, who is going to use the service? Third, how is the service actually going to be built? Fourth, how is the profit going to be derived?

      In the Google model, advertising pays the bill, the masses use it, the service is built on sound statistical principles, and profit is driven by focusing on making the process relatively simple and cheap. The web is crawled, links ar

    • by hoojus ( 935220 )

      You can't ask people who can't tell the difference between typing a query in the search engine and typing a URL in the address bar to add proper meta information to their blogs.
      But that is a good thing as the semantic search won't return their blog... and really do you want to read the blog of a person who types queries in the address bar?

    • Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.

      The kind of "semantic search" laid out in the paper is at least as much of a "shiny dream" as the Semantic Web pretty much by definition. The kind of "semantic search" laid out in this paper requires an extreme version of exactly the same technology that would be used by a "semantic factory" that would take user-created content and add semantic markup automatically, the only difference is that instead of

  • by spectrokid ( 660550 ) on Wednesday May 30, 2007 @05:54AM (#19319573) Homepage
    Yes, people will abuse it in any way they can, mostly to try to get higher up in the search engines. But this does not mean it is by definition useless. It is useless for ranking, but once you (the search engine) have decided to list a site, you could use the metadata for semantic web stuff. How about allowing for a physical address, phone number, opening hours (for brick & mortar)... This would e.g. allow for a "copy address to contacts" button. Make an easy (web-based) program to generate the HTML so mom & pop shops can include it in their website, and refrain from using it for ranking purposes, and you should be ok.
    • by suv4x4 ( 956391 )
      but once you (the search engine) have decided to list a site, you could use the metadata for semantic web-stuff

      Yup.. this is how microformats work. Something which a lot of the top companies seem to be interested in (including Microsoft).
    • by maxume ( 22995 )
      As people figure out that adding metadata makes their data more useful for themselves, they will add more and more of it. If it's out there, it will get used.
  • [..] but it's not a sure thing that the researchers now developing the idea will get it right.
    Well, is there anyone in their Friend Of A Friend RDFweb [rdfweb.org] that might know how to get it right?
  • Tiresome and wrong (Score:5, Insightful)

    by dread ( 3500 ) on Wednesday May 30, 2007 @06:07AM (#19319631)
    There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
    1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
    2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM conversations where a term or subject previously unknown to the reader is brought forward.
    3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?

    To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered upon without a framework that is user-centric rather than centralized in the current Google fashion. Desktop search is interesting to some extent as a way of tying our personal space to the dataspace outside of our local control, but that is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above is related to interpersonal communication, it becomes obvious that what is necessary is data structures that learn from ongoing conversation, e.g. the intersection of Person A and Person B is described in a way that can give us guidance as to what the appropriate (or most likely) interpretation of the term used is.

    There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.
    • Re: (Score:1, Interesting)

      by Anonymous Coward
      I got my master's at one of the schools getting the bulk of the research money, and we made that same argument there, to deaf ears. Namely that students and professors were solving the easy "peripheral" problems related to semantic web, and just ignoring the 13,125,732-lb gorillas in the room.

      • Re: (Score:2, Interesting)

        by illaqueate ( 416118 )
        Yeah, pretty much. I set out to make a data assistant program in high school (c 1996-1999) and was thinking about how to get a correspondence between what I was thinking and how data would be retrieved and figured it would have to be so generic to be worthless. And then I read Hilary Putnam's Representation and Reality and felt sick about the entire thing. But now that I think back on it I did have a lot of fun testing out different kinds of data retrieval on structured and unstructured data (and thinking u
    • Re: (Score:3, Insightful)

      by PPH ( 736903 )
      Well, humans don't understand intent. They have to ask. Call up a reference librarian and ask for information on "Palladium". Odds are s/he will reply with one or more questions. I don't expect a semantic search engine to do any better.

      There are two parts to this problem. The UI, or how a user will interact with the system to describe the context within which a search is to be performed, and the web crawler, which must extract semantics from web pages based on either metadata, linking algorithms (ala Goog

      • Call up a reference librarian and ask for information on "Palladium". Odds are s/he will reply with one or more questions.

        That's because librarians are slow. Those questions can lead to saving a great deal of time off the top.

        A search engine is fast, effectively providing a ton of answers in seconds or fractions of seconds. The problem then is that we are slow. We can't go through all the hits as fast as the search engine spits them out.

        What would be helpful would be if the search engine clustered results as if in response to the sorts of questions our hypothetical librarian might ask. The Clusty search engine attempts

      • Re: (Score:2, Insightful)

        by dread ( 3500 )
        Humans certainly understand intent. They will - as you point out - ask if they don't know the intent. You always know what you intend. If someone you know asks you a question, chances are you will have enough commonality, so to speak, to intuitively grasp the intent (or context). Your example with the librarian is interesting but pointless since you are talking about another centralised knowledge solution whereas I am talking about a decentralised model that starts with the user and - if you will - a "conte
        • by PPH ( 736903 )
          Assuming the pre-existence of a shared 'context model' is cheating, sort of. The reference librarian example is valid from the point of view of having to establish this context model upon contacting the librarian 'cold' so to speak. The exchange that must occur when you contact this librarian, or any other human may seem trivial. But for an API to a semantic database, centralized or otherwise, this exchange must be formalized. Once that's done, semantic processing isn't terribly difficult. It has been a so
    • There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
      1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
      2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM conversations where a term or subject previously unknown to the reader is brought forward.
      3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?

      the main point of the article is that semantics offers improved searching over link statistics, and i'm not sure how any of this relates to that since none of this context is being supplied or used by a search engine like Google either. but like you said it's interesting that a lot of the questions above could be answered with a semantic search of the user's own hard drive, which would be probably more useful than a link statistics index of that same device.

      but doing a search on your own machine and pro

    • by nuzak ( 959558 )
      How does this differ from humans? If you asked me what I think of Palladium, I'd say they made some pretty fun RPGs. A semantic web search will at least be able to separate the distinct definitions from each other, which is something you don't get with current lexical searches.
  • by robably ( 1044462 ) on Wednesday May 30, 2007 @06:28AM (#19319711) Journal
    Quick! Tag this story as "Goldfish" and "Hairdressing".
  • by Anonymous Coward
    Is 'Semantic Web' already included in Web 2.0? Or will that be the 3.0 version?

    BWAAAHAHAHAAAAHAAA
  • Look, I'm not interested in the academic niceties of semantic searching and metadata. What I want, what would actually be useful, would be a way to separate out the low-value sites from those of relevance FOR THE SEARCH I'M DOING.

    If I'm looking for a review of a product, I don't want 50 shops trying to sell me one. If I'm looking for an explanation/definition of a term, I don't want a page that may mention the word, even if it does have high page rank. If I'm looking for a site that gives me lots of links to

    • Re: (Score:2, Interesting)

      by msporny ( 653636 ) *

      If you are interested in a real solution to semantic web markup that works (and is being used) right now, you might want to check out the Microformats website [microformats.org]. There is a growing following that is working on getting the semantic web working properly. The Firefox and Songbird guys are looking at using Microformats to make browsing the web a much richer experience - NOW, not 10 years from now.

      There are currently Microformats for marking up people, places, events, geographic locations, music, and many other wi

  • This sounds like yet another company doing something like Latent Semantic Indexing [wikipedia.org] or some sort of context processing on the text rather than using RDF markup to decide the semantics. To me, this isn't the semantic web..just another fancy search company trying to jump on the bandwagon.
  • Is there a relevance to the display of ignorance in the title?
  • Together with a friend from Caltech, I've helped create a social content network for food information which supports semantic search for food information. For example, you can go to efoodi.com and search for 'meat', 'vegetable', or 'Mediterranean' to get a glimpse of the concepts it understands. It also supports social search and tag-based browsing. These technologies are powerful and it's surprising they're not more commonplace on the web.
  • This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata;

    In which respect it does not contrast with the Semantic Web, which doesn't require that any more than the regular Web required every computer attached to the internet to start running a web server. Since this article wasn't about the semantic web to start with, was an inaccurate gratuitous attack on the Semantic

  • Some (if not all) of the concept relation semantics needed for doing "semantic search"
    or "machine comprehension" of text on the web can be gleaned by
    doing statistical analysis of the relationships between words and phrases
    across the entire web. Aggregating across a large corpus eliminates "noise"
    in usage and draws out the semantic "signal" about how people relate the
    concepts to each other.
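
    A minimal sketch of that kind of aggregation, assuming simple document-level co-occurrence counts. The four-sentence corpus is a made-up stand-in for "the entire web", and a real system would at least filter stop words and normalize for word frequency.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; the sentences are invented for illustration.
docs = [
    "palladium is a metal used in catalytic converters",
    "platinum is a metal used in jewelry",
    "palladium drm from microsoft locks media",
    "microsoft drm restricts media playback",
]

def cooccurrence(documents):
    """Count how often each unordered word pair shares a document."""
    counts = Counter()
    for doc in documents:
        words = sorted(set(doc.split()))
        # Sorted input means every pair is stored as (smaller, larger).
        counts.update(combinations(words, 2))
    return counts

def related(word, counts, top=3):
    """Words that most often appear in the same document as `word`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return scores.most_common(top)

counts = cooccurrence(docs)
```

    Even on this tiny sample, "drm" pairs with "microsoft" twice and never with "metal", which is the sort of signal that could separate the two senses of "palladium" given enough text.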
  • "There are so many ways of doing it improperly, and only one way of doing it right."

    That is a fallacy - you can not know that there is 'only one way of doing it right', if you don't know what that 'right' way is to begin with - you are dealing with unknowns. In truth there are very few systems that collapse to a solution set of one. The phrase 'there is more than one way to skin a cat' comes to mind. This one statement tells me the person making the statement is more interested in controlling the method,
  • "Semantic search" is actually a dumbed-down version of what, in AI, used to be called "natural language question answering systems". The first one that was sort of useful was Bobrow's "Baseball" [atariarchives.org], which, unlike Eliza, actually did something useful. "Baseball" had a small database of baseball statistics, and could answer questions like "How many games did the Orioles play in June?". I'm surprised that someone doesn't have a natural language query system for sports statistics on the web today. It's not ou

    • by msbmsb ( 871828 )
      "Used to be called 'natural language question answering systems'"? NLP and Question Answering are still very very active fields of research with many conferences, workshops and evaluations going on - not only in the US but also internationally - encompassing multi-lingual QA and reasoning-based QA. Ask Jeeves was not real QA, it was based more on manual annotations than open-domain NLP.
  • I think that maybe the community could come up with some kind of user-generated metadata system. For example, someone could create a site similar to StumbleUpon, but have it be just a general metadata service. So when you visit a page, if you feel like it, you can tag it with certain metadata. This could be helpful, for example, in blocking AND finding porn.

    Probably something more effective would be something a little more complex than just tags. Using the porn example, you wouldn't want articles talkin
    • Probably something more effective would be something a little more complex than just tags. Using the porn example, you wouldn't want articles talking about porn to be blocked (if you were blocking porn) because it actually wasn't porn. So you might have a couple different categories of tags.

      You mean, you could have something represented like:
      -
      subject: http://www.somewebsite.com/ [somewebsite.com]
      predicate: contains
      object: pr0n

      -
      subject: http://www.otherwebsite.com/ [otherwebsite.com]
      predicate: discusses
      object: pr0n
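
      As a hedged sketch, those two statements could be held as plain subject/predicate/object tuples and queried by pattern. This uses no real RDF library; `Triple`, `match`, and `blocked_sites` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# The two statements from the comment above.
triples = [
    Triple("http://www.somewebsite.com/", "contains", "pr0n"),
    Triple("http://www.otherwebsite.com/", "discusses", "pr0n"),
]

def match(store, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that agree with every field that is not None."""
    return [t for t in store
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]

def blocked_sites(store, topic):
    """Sites that contain the topic; sites that only discuss it stay visible."""
    return {t.subject for t in match(store, predicate="contains", obj=topic)}
```

      A filter built on `blocked_sites(triples, "pr0n")` would block only the first site, while a search for anything about the topic could use `match(triples, obj="pr0n")` and get both.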

      • Thanks, you're absolutely right. I will admit, I'm not all that familiar with proposals surrounding the semantic web (I just looked up RDF [wikipedia.org]) and it looks like RDF is what I'm looking for.

        In terms of the idea, I was also thinking that even better than just getting input from users, you could do a hybrid, which is have the content provider provide the meta data and have it verified by the users. It would probably get adopted faster. Also, the search engine could even penalize sites if their self-provided meta
  • Chapter 2007 Ingrid 7.3.01 Graphics processing is based on a linear database kernel re-engineered from Patrick Slater's psychological repertory grid subroutine of the same name. Ingrid v7.3 will hopefully lay semantic long-tail search plans to put a dynamically flexible, graphically acoustic, externally scheduled version of the RadioChomsky4pp.exe into a global grid computer. This and the instructions to get the latest Ingrid On Winamp software are ready for download now at http://ingridx.dyndns.org/do [dyndns.org]
  • A search on Charlotte's semantic web turned up "SOME PIG", whose real name was Wilbur, a sweet little porker who the locals grew very fond of, especially as he brought fame (and a bit of fortune) to their little town. Thanks to Wilbur's great and true friend Charlotte, Wilbur's essence of character was boiled down to one short phrase, making the search results highly relevant and easily accessible by all of God's creatures -- including spiders, of course. E.B White's creative mind gave us a fascinating ch
