
Seven Search Engine Evolutions for '07

Posted by CowboyNeal on Sat Dec 16, 2006 10:13 AM
from the new-and-improved dept.
eldavojohn writes "I found a short but interesting list of predicted evolutions of search engines that will most likely be implemented in 2007. While some are vague and obvious like a better human interactive experience, there are others that are worth looking into like alternative means of indexing and using semantics — not keywords — for matching documents. The author of this list is Dr. Riza Berkan, also the author of 'Fuzzy Systems Design Principles.'"
This discussion has been archived. No new comments can be posted.
  • The most relevant search results are the ones I've seen before, else it's called "exploring" not searching.

    I'm tired of having to sift through hundreds of SEO-rigged garbage sites and/or blogs to find what I'm looking for. When I'm looking for something I'd like to find it. Not some new shittier variation of it.

    Good times.

  • can't be wrong (Score:2, Funny)

    by bedonnant (958404) on Saturday December 16 2006, @10:23AM (#17268620)
    (http://www.etrangementmoelleux.info/)
    An article beginning with a Bill Gates quote can't be wrong.
  • Slashvertisement (Score:3, Insightful)

    by TodMinuit (1026042) <todminuit@noSPaM.gmail.com> on Saturday December 16 2006, @10:24AM (#17268626)
    Most of the things listed seem to make results more ambiguous, not narrow them down.

    Even if you could radically change the way a search engine works, you then face an even bigger task: Forcing users to radically change their searching habits to fit your search engine.

    And what the hell is "QDEXing"? Google reveals nothing, therefore we can conclude it does not, in fact, exist.
  • In 2007? (Score:4, Informative)

    by cp.tar (871488) <cp.tar.bz2@gmail.com> on Saturday December 16 2006, @10:24AM (#17268628)

    Ah, let me just tag this article 'semanticweb'... there, much better now...

    As early as 2007? Now I don't really believe that.

    It may get partially implemented, and probably only in English.
    Maybe Chinese as well.

    Most of the other languages will have to wait for quite a while beforehand...

    Not to say semantic search is a bad idea or anything... I, for one, would like to see some image-, audio- and video-search based on some kind of semantics, not tags and names... but that'll just have to wait.

    • Re:In 2007? by Rakshasa Taisab (Score:2) Saturday December 16 2006, @10:36AM
      • Re:In 2007? by cp.tar (Score:2) Saturday December 16 2006, @10:48AM
    • Re:In 2007? by patro (Score:1) Saturday December 16 2006, @01:25PM
    • Research topic for decades by Beryllium Sphere(tm) (Score:2) Saturday December 16 2006, @03:17PM
  • Putting this kind of emphasis on search is wise. MS knows this and put vastly improved searching into Vista. Some say it's better than other desktop searches. This is similar to having a good memory in a human being, quite useful in practice.
  • by Overzeetop (214511) on Saturday December 16 2006, @10:42AM (#17268758)
    (Last Journal: Thursday December 09 2004, @09:25AM)
    ...that I was unimpressed when "fussy logic" was a buzzword a decade ago. I do not look forward to its resurgence in the marketing lexicon.
  • Seven Spam Evolutions for '07.. (Score:1, Informative)

    by Anonymous Coward on Saturday December 16 2006, @10:43AM (#17268768)
    Hmmm.... and these predictions are in a press release from someone with a new search engine which (surprise!) opens shop in '07.

    Can someone tag the article "spam"?
  • by littleark (1040696) on Saturday December 16 2006, @10:56AM (#17268848)
    Yoople! [yoople.net] has already introduced a more engaging human-like search experience and lets people collaborate in order to create better indexing.

    OK, someone could say it's the perfect way to permit abuse, and a lot of work still has to be done, but it's a smart proposal to start from. Don't you think?

    http://www.yoople.net/ [yoople.net]
  • Get your own house in order (Score:5, Interesting)

    by RealSurreal (620564) * on Saturday December 16 2006, @11:01AM (#17268878)
    Let's hope it's better than the author's search engine, hakia.com. Just used it to search for "nike stores in the uk". First result is an etailer in the US, all the others are spam sites. Looks like we've got a long way to go before search engines actually understand what I'm looking for.
  • Semantic Searches? (Score:3, Funny)

    by eebra82 (907996) on Saturday December 16 2006, @11:07AM (#17268904)
    (http://www.insidebet.com/)
    Can somebody please explain what a semantic search would look like? I'm not sure if I understand the meaning.
  • Honestly, I can't see myself NOT using Google in the years ahead. It's become too ingrained in my lifestyle. If I don't know what something means, I google it. In fact, in the rare times that Google is down, I find myself lost and constantly clicking "home" (www.google.com) only to find it doesn't work.
  • The way forward (Score:1)

    by rumplet (1034332) on Saturday December 16 2006, @11:12AM (#17268930)
    (http://www.britwood.co.uk/)
    Clearly the way forward for search is to make an algorithmic search engine, and have it scrape information from a dead human edited directory.

    Google directory. Bringing you the future today.
  • by creimer (824291) on Saturday December 16 2006, @11:12AM (#17268934)
    (http://www.creimer.ws/ | Last Journal: Friday January 26 2007, @12:40PM)
    Why not some "intelligent design" when it comes to search engines? "Spore" looks like a nice game but I wouldn't base a search engine on it since the result set would be too Darwinian. :P
  • by jorghis (1000092) on Saturday December 16 2006, @11:17AM (#17268964)
    "The first time a search engine will let users evaluate answers on the spot by displaying uninterrupted and coherent text snippets, often letting searchers forgo having to click through to links and saving time."

    Doesn't ask.com give you this functionality already?
  • That was a pretty good article, even though most of the stuff on there was pretty obvious (for most of us /.'ers) to begin with.

    I think it was only inevitable that internet searching focuses more on the "type as you speak" initiative rather than the older term-by-term searching of the past. This would be great for us, but I really see that the benefits would cater more to the average man/woman who already has a difficult time searching because they are using "the wrong terms."

    I really think that Google will be the first search engine to implement most of these changes, since their user base and R&D are already through the roof. I think that Microsoft will also implement this soon with Live, since a sizable portion of their research teams are testing searching based on semantics as well.

  • A lot of this is available now (Score:4, Informative)

    by saddino (183491) on Saturday December 16 2006, @11:34AM (#17269050)
    4. The first time that a single query will bring a gallery of
       results equivalent to running multiple queries about the
       meaningful variations of the same topic.

    5. The first time a search engine will let users evaluate answers
       on the spot by displaying uninterrupted and coherent text
       snippets, often letting searchers forgo having to click through
       to links and saving time.


    Both of these have been available for a couple of years: e.g. searching on the single query "semantic web" using CQ web [q-phrase.com], reveals clusters such as these:


    fuzzy sets
    fuzzy systems
    neural networks
    set theory
    soft computing
    artificial intelligence
    control systems
    expert systems


    Each one is linked to a specific page of results using sentences instead of snippets, e.g. for artificial intelligence:


    1. This paper will present the foundations of fuzzy systems...noteworthy objections to its use with examples drawn from current research in the field of artificial intelligence.
    Fuzzy Systems - A Tutorial [austinlinks.com]
    2. The most obvious implementation for the fuzzy logic is the field of artificial intelligence.
    Fuzzy Logic [ufl.edu]
    3. Ultimately it will be demonstrated...fuzzy systems makes a viable addition to the field of artificial intelligence and perhaps more generally to formal mathematics.
    Fuzzy Systems - A Tutorial [austinlinks.com]
    4. The paper gives examples of the fuzzy logic applications with emphasis on the field of artificial intelligence.
    Fuzzy Logic [ufl.edu]
    5. A collection of articles and other technical resources for artificial intelligence.
    PC AI - Fuzzy Logic [pcai.com]


  • Some of these things (1, 6) sound pretty specific to technology that the author's company, Hakia, is promising to produce this year. Some of these items (2, 4, 5) are already being performed by major search engines, but are done behind the scenes and are not immediately obvious to the user. #2 will continue to be perfected over the next 20 years, not the next 12 months. #3 sounds like a reasonable extension to the traditional practice of bolding keywords. I'd like to see this implemented, though I think it will only come with good progress in the area of #2. #7 is actually a pretty good insight into the way top engines will use their computing power after they've already crawled and indexed, in the standard fashion, most of the 15B good pages on the current web.
  • by whitroth (9367) on Saturday December 16 2006, @12:32PM (#17269540)
    (http://home.cfl.rr.com/diehardanddragon/)
    I know from an ex-wife, a librarian, that librarians have been doing searches for fifteen or twenty years using such constructions as NEAR. None of the popular search engines, from Google on down, do this. It would *certainly* make my life easier, and result in relevant hits, rather than 100k hits because some asshole advertisers have thrown a laundry list into their META tag.

                    mark
  • Weighted sorting is all I want (Score:5, Interesting)

    by foniksonik (573572) on Saturday December 16 2006, @12:37PM (#17269580)
    (http://www.emenoh.com/ | Last Journal: Monday April 17 2006, @10:08PM)
    I just want to tell the engine that keyword 1 is 5 times as important as keyword 2

    Give me a slider control that instantly filters the results... ie: have the first 100 results waiting for me with 20 showing, then let me adjust the weight of my keywords until I get the list I am looking for with individual items falling off or being added to the list as I adjust the controls.
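
    The slider idea above amounts to re-ranking an already-fetched candidate list on the client side. A minimal sketch, assuming a simple term-count score (the function names and the scoring rule are illustrative assumptions, not how any real engine ranks results):

```python
from collections import Counter

def weighted_score(text, weights):
    """Sum of (term count * weight) over the weighted keywords (assumed scoring rule)."""
    counts = Counter(text.lower().split())
    return sum(counts[term] * w for term, w in weights.items())

def rerank(docs, weights, top_n=20):
    """Sort cached candidates by weighted score; zero-score items fall off the list."""
    scored = [(weighted_score(d, weights), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]

# Hypothetical cached results standing in for "the first 100 results".
docs = [
    "fuzzy logic tutorial for control systems",
    "semantic search engines and fuzzy matching",
    "search search search engine optimization",
]

# "fuzzy" is 5 times as important as "search".
print(rerank(docs, {"fuzzy": 5.0, "search": 1.0}))
# Slider for "fuzzy" dragged to zero: the first document drops out.
print(rerank(docs, {"fuzzy": 0.0, "search": 1.0}))
```

    Because only the cached candidates are re-scored, moving a slider needs no new round trip to the engine, which is what would make the filtering feel instant.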

  • It Is About Time (Score:4, Insightful)

    by BoRegardless (721219) on Saturday December 16 2006, @12:43PM (#17269620)
    I am sick of getting 100,000 irrelevant hits & then doing dozens of narrowing searches, only to find that the word & phrase hits are all in different paragraphs or even unrelated articles on the same page.

    Bring it on NOW.
  • Press Release (Score:2)

    by BobGregg (89162) on Saturday December 16 2006, @12:53PM (#17269738)
    (http://www.bobgregg.com/)
    Good grief... it's not a scientific paper, it's not a journalist's article, it isn't any meaningful content at all - it's a press release. Right off of BusinessWire. What's next, Ron Popeil's predictions for 2007?
  • by victorvodka (597971) on Saturday December 16 2006, @01:02PM (#17269834)
    (http://asecular.com/)
    When I go to do a search on my computer, I'm comforted by that little doggy. I wish Google would follow Microsoft's example and replace the little box to type in a search query with an animated animal, something with more limbs for going out on those internets and finding stuff.
  • by (Robo_Bro) (1009507) on Saturday December 16 2006, @01:17PM (#17269998)
    People have been futzing with the concept through data-mining for years...it's about time it went into a search engine. And what an engine! I tried it out, and compared it to Google (I had to re-enter the search query in a different syntax to get relevant results) - and I found it more useful! Who knew? Two bones to pick: 1) "7 search evolutions" seems like one idea rehashed into seven bullet points. A little redundant. 2) FTFA:

    "However, heavy users of search already understand that the average search takes 11 minutes and 50 percent of searches are abandoned."
    Um... no, I didn't know that. Where did you collect the data? How did you calculate that percentage and average?
  • Well, I tried some keywords, and Hakia returned objectively less relevant sites than Google did.
    Don't advertise on Slashdot when you have nothing new to offer, guys.
  • by porneL (674499) on Saturday December 16 2006, @04:08PM (#17271332)
    (http://pornel.net/)
    There's a problem with über-smart semantic search engines: if they provide an answer right away (by understanding semantics and/or choosing a very relevant snippet), there will be little incentive for users to visit the web sites that provided the information. This means that search engines will steal ad revenue from content providers, and content providers will revolt against such engines.

    This is already a problem to some extent - Nielsen wrote about this in 2k4 [useit.com].

  • Predictions are always difficult. Here are some comments from somebody
    working in the field:

    > 1. The first time a search engine will have an alternative to
    > indexing; new technology like QDEXing will be developed.

    Indexing is a pre-requisite for fast access of retrieval results.
    Even distributed peer-to-peer indices that are a very attractive
    idea suffer from bad performance due to the absence of a monolithic
    index owned by an organization with huge bandwidth.

    > 2. The first time ontological semantics will be used that will
    > enable a search engine to perceive concepts beyond words and
    > retrieve results with meaningful equivalents.

    The problem with applying ontology based search is the disambiguation,
    i.e. the mapping from natural language words (terms) to the unambiguous
    nodes of the ontology (concepts). Automatic disambiguation needs to be
    pretty good in order to help in search, but this is unfortunately still
    an open research problem.

    > 3. The first time that search results will include highlighted best
    > sentences as a result of semantic analysis rather than bolded
    > keywords as a result of finding incidences.

    This prediction seems to mix presentational issues (bold layout) with
    processing issues. The problem with the former is that flagging a whole
    sentence bold perhaps isn't well liked, as it could already have been used
    with current technology. The problem with the latter is what exactly is
    meant by "semantic analysis" - "deep" automatic natural language processing is
    still a very costly operation and may not be an option as early as 2007 to
    be applied to a whole Web index.

    > 4. The first time that a single query will bring a gallery of
    > results equivalent to running multiple queries about the
    > meaningful variations of the same topic.

    We would not notice this, since it would be carried out internally.
    However, this processing-intensive step could (preferably) be replaced
    by a result-equivalent change in the ranking algorithm.

    > 5. The first time a search engine will let users evaluate answers
    > on the spot by displaying uninterrupted and coherent text
    > snippets, often letting searchers forgo having to click through
    > to links and saving time.

    Giving answers is certainly an emerging trend, cf.
    http://www.infonortics.com/searchengines/sh05/slides/leidner.pdf [infonortics.com]
    But it may take longer than one year to become pervasive.

    The repeated mention of snippets seems to suggest that the author of
    this set of predictions has found fault with snippets and considers
    this a priority, whereas most people - at least on desktop PCs - seem
    to be okay with the way results are summarized today.

    > 6. The first time a search engine will have a dialogue utility that
    > will help point out best answers or suggest a Gallery for a more
    > engaging human-like search experience.

    Further work in interactive search is certainly ongoing, in some sense
    a dialog feature is already operative, as real search engine logs show
    that users keep re-phrasing and refining their searches all the time
    to converge to the results they desire.

    > 7. The first time a search engine's data will grow by detection of
    > new knowledge rather than by detection of new pages. Search
    > engine growth by knowledge will be the new direction for the
    > industry for 2007.

    This depends on a universally accepted notion of knowledge, and how to
    measure/acquire it automatically. Perhaps one of the strengths of modern
    search engines is that NO commitment to any kind of theory of knowledge
    has to be made, it works - for better or worse - because all it needs
    are strings.
  • Dragging Results (Score:1)

    by Siker (851331) on Sunday December 17 2006, @03:32PM (#17279432)
    (http://www.norwinter.com/)
    I think human input will definitely come into play in the future of search. Ultimately you can make machines very good at recognizing spam content, but how can you possibly identify what people really want to see without asking them?

    The way forward is to allow people to reorder their results and to delete spam results. This way we'll have a search engine that actually learns what people want and acts appropriately. Sites like Digg and Reddit are on to something in this sense. They use 'swarm' technologies to determine what is most relevant in a certain narrow category at a certain time.

    As another commenter mentioned, there is already something like this: Yoople [yoople.net]. A couple of months back I wrote that Google's Searchmash [playingwithwire.com] was secretly playing around with something like that too.
  • like "cat*" to get cat AND cats!
  • The smallest unit of data indexed by most search engines is the "word" not the "character." Thus, it isn't possible to create regular expressions that work at the character level, although something approximating "regex" on strings of words has often been implemented and there is no reason why full regex couldn't be supported fairly easily -- as long as you accept that "words" are the base symbols in the regex, not characters. For instance, "phrase search" and proximity search operators like "near," "within," "before," "after," "same sentence," have been implemented in many search engines and provide some of the capabilities of regex -- with words as the base symbol.
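
    To make the "words as the base symbol" point concrete, here is a minimal sketch of a positional word index with a NEAR-style operator. The names `build_positional_index` and `near` are hypothetical, and whitespace tokenization is an assumption made for brevity:

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def near(index, a, b, k):
    """Return doc ids where words a and b occur within k words of each other."""
    postings_a = index.get(a, {})
    postings_b = index.get(b, {})
    hits = set()
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= k
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical two-document corpus.
docs = {
    1: "fuzzy logic in artificial intelligence",
    2: "fuzzy sets were introduced long before modern artificial intelligence research",
}
index = build_positional_index(docs)
print(near(index, "fuzzy", "intelligence", 4))   # {1}
print(near(index, "fuzzy", "intelligence", 10))  # {1, 2}
```

    The index never looks inside a word, so the operator works on word positions only, which is the limitation described above.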

    Many search engines implement "stemming" as a means of covering the most common requirement for sub-word query predicates. Stemming allows both "cat" and "cats" to be treated as the same word. A system that provides stemming will often use "stems" as the smallest unit of data indexed rather than words. Thus, in the index, there would only be one entry for "cat" which represents both the words "cat" and "cats."
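
    A minimal sketch of indexing by stem rather than by word. The `naive_stem` function below is a deliberately crude plural stripper written for illustration; production systems typically use a real stemming algorithm such as Porter's:

```python
from collections import defaultdict

def naive_stem(word):
    """Crude plural stripping, for illustration only (not a real stemmer)."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def build_stem_index(docs):
    """Map each stem to the set of doc ids containing any word with that stem."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[naive_stem(word)].add(doc_id)
    return index

# Hypothetical corpus.
docs = {1: "my cat sleeps", 2: "two cats and a dog", 3: "dogs chase cats"}
index = build_stem_index(docs)

# The query term is stemmed the same way, so "cats" and "cat" hit one entry.
print(sorted(index[naive_stem("cats")]))  # [1, 2, 3]
print("cats" in index)                    # False: only the stem "cat" is stored
```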

    Of course, character-based search exists today for small corpora -- grep is the best-known system, I think. But character-based search on a large corpus is a very difficult problem that, using today's algorithms, would require massive computing capacity to provide to even a small number of users. However, in some specialized applications, this can be justified. One technique for reducing the resource cost of character-based regex on large collections is to index bigrams, trigrams, quads, etc. (i.e. use multi-character sequences as the base "symbol" or smallest unit indexed). However, such n-gram based systems still don't deliver the serving capacity that one would expect from something like a Google or Yahoo!.
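
    The n-gram idea can be sketched with a trigram index used to narrow substring search: the index produces candidate documents, and each candidate is then verified with a real scan. This is a toy illustration under assumed names (`trigrams`, `build_trigram_index`, `substring_search`); it handles plain substrings only, not full regular expressions:

```python
from collections import defaultdict

def trigrams(s):
    """All overlapping 3-character sequences of s, lowercased."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_trigram_index(docs):
    """Map each trigram to the set of doc ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index

def substring_search(docs, index, needle):
    """Intersect trigram postings to get candidates, then verify each one."""
    grams = trigrams(needle)
    if not grams:
        # Needle shorter than 3 characters: the index cannot help, scan everything.
        candidates = set(docs)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(d for d in candidates if needle.lower() in docs[d].lower())

# Hypothetical corpus.
docs = {1: "fuzzy systems design", 2: "search engine design", 3: "systematic indexing"}
index = build_trigram_index(docs)
print(substring_search(docs, index, "system"))  # [1, 3]
print(substring_search(docs, index, "sign"))    # [1, 2]
```

    The verification pass is needed because sharing all trigrams does not guarantee they appear contiguously. The index also holds one entry per distinct character triple per document, which hints at why the storage and serving cost grows so quickly on a web-scale corpus.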

    One "mixed" strategy that has sometimes been employed is to support a more complex query that has a "first part" which is word-based (as in traditional search engines). This first-part is used to select a subset of all the documents in a collection. Once the candidate-result-set is produced, one would run a character-based regex (similar to grep) over the results. Of course, you can probably see that you can easily hack up such a two-phase search yourself by sending a word-based query to a search engine and then doing the character-based regex on the result pages.
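
    The two-phase approach described above can be sketched as follows. The `word_filter` function is a hypothetical stand-in for the word-based query sent to a search engine; here it just checks an in-memory dictionary:

```python
import re

def word_filter(docs, required_words):
    """Phase 1: keep docs containing every required word (stand-in for a search engine)."""
    required = {w.lower() for w in required_words}
    return {doc_id: text for doc_id, text in docs.items()
            if required <= set(text.lower().split())}

def two_phase_search(docs, required_words, pattern):
    """Phase 2: run a character-level regex over the phase-1 candidates only."""
    regex = re.compile(pattern)
    candidates = word_filter(docs, required_words)
    return sorted(doc_id for doc_id, text in candidates.items()
                  if regex.search(text))

# Hypothetical corpus.
docs = {
    1: "the cat sat on the mat",
    2: "the cats concatenate files",
    3: "a dog sat on the cat",
}
print(two_phase_search(docs, ["the", "sat"], r"cat$"))      # [3]
print(two_phase_search(docs, ["the", "sat"], r"[cm]at\b"))  # [1, 3]
```

    The regex cost is paid only on the documents that survive the word query, so the more selective the first phase, the cheaper the second.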

    I fear we are stuck with word-based search for the foreseeable future. While folk often ask for character based regex, the reality is that it is simply too hard to implement today. Also, the size of the market that has a real need, as opposed to a desire, for this capability is much smaller than one might think....

    bob wyman