Web Searches For What Lies Beneath
fat_hot writes: "The New York Times has an article [here] (registration required) about specialized search engines which try to drill into the submerged mass of the Internet iceberg by limiting searches to particular subjects (and hopefully thereby increasing coverage within that limited scope)." Considering that a Google search for friends' web sites and other good stuff usually turns up more dirt than paydirt, it's pleasant to contemplate more relevance in search engines.
Good relevancy results (Score:1)
Northern Light [nlsearch.com] strikes me as doing the best job of returning relevant results, going so far as to thoroughly categorize the results by topic. It also has a greater portion of the web indexed than any other engine. The downside is that there is a bit of lag time in adding new domains to the bot's indexing runs...
Google [google.com] is pretty good at giving relevant results, but it misses a lot of sites. AltaVista [altavista.com] is rather thorough, but not very good at relevancy ranking.
These observations are simply based upon my own experiences with these engines, so your mileage may vary. When performing intensive searches, I generally use all three, but I'll often start with Ask Jeeves [ask.com], which is easily the best meta-search engine out there...
The PDF problem (Score:1)
None of the major search engines (even Google) crawl the contents of PDF files.
They may take you to the document (often even that is an issue; note the Google URL), but they do not crawl through the item itself.
Try throwing this or any other PDF URL into Google or any other search tool: http://www.census.gov/prod/ec97/97cfd2.pdf
Even the searchpdf.adobe.com engine only searches summaries, and it is not that large a database.
This is not a technical issue. Most crawlers can handle PDF files.
An interesting treatise on this very problem (Score:2)
Re:Off-Topic, for those of you who haven't heard. (Score:2)
Here [wired.com] is a story from Wired about it.
42... (Score:3)
However, I suspect that whatever the answer to the search engine problem actually turns out to be, it will have the following characteristics:
Google Bashing? (Score:1)
That's not to say that it couldn't be improved. I'd love to be able to get exactly what I wanted in the top three or four results "for sure", but often I'm searching for something a bit obscure that can only be described with common words (alas, I can't think of what was vexing me in that department last week).
But, I think my point is still valid even if this super-search engine comes around: The search is only as good as the searcher allows it to be.
Re:Schools should teach Searching technique (Score:2)
Huh? Where'd I say that? Clearly your school isn't doing its part in teaching research & attribution.
On the contrary, I believe many schools *are* doing their part. No, not all, but many. Tragically, school libraries and school librarians have been tremendously short-changed in the past few decades, ironically often in order to fund sexy things like computer labs.
The truth is that the skills one needs to use a library are even more critical now than they were in the past. As you correctly pointed out, card catalogues are dead; I can't think of any post-high-school system that still seriously maintains one. Unfortunately, the helpful reference librarians willing to walk a random person around and re-teach them the ropes have also been budget-cut out of existence. With the information explosion and the information economy, the ability to search, prioritize, and compile material has become even more critical (not to mention the ability to comprehend the material).
Corporate knowledge-bases, electronic paperwork, web-based 'employee handbooks', online job searches & apartment rentals; these all require the ability to search for information in an efficient and comprehensive way. Search-engine cluelessness is simply a symptom of a wider problem.
That said, again, I believe schools are doing a reasonably good job. I know my old elementary and high schools are teaching kids how to use search engines, as is my old university library. My concern is for those out of the educational system.
Reading the directions doesn't seem onerous to me. If one is performing searches and coming up empty or with useless material, then figuring out how to fine-tune one's searching doesn't seem to require any great intuitive leap. Yes, it would be wonderful to live in a world as trivially comprehensible as the doorbell, but lacking that, most folks have learnt to READ THE DIRECTIONS.
Generally search engines do a great job of explaining how to use them. There are even search engines that try to out-think the user and parse their natural-language requests into regular search expressions. Google isn't one of these engines; it's a high-powered bare-to-the-metal engine that requires a certain amount of understanding by its users to use. On the other hand there are literally dozens of other engines that *do* walk a person through performing a decent search. The fact that folks pick the wrong tool for the job (a tool they neither know how to operate nor are willing to invest the 1-screen/2-minutes to learn) and then complain about their results seems to be just idiocy on the part of the user (or in this case an article author.)
Yes, the original article clearly set up a straw man in order to promote these dedicated search engines; on the other hand, there are legions of folks who continue to use search engines every day with poor results and do complain about them.
The solution? I dunno - sell them more lottery tickets?
Re:blame the luser again? (Score:2)
That point aside, I'm trying to figure out the rest of your posting. You don't like the fact that different search engines use different formats? Well, pick one and just use it. You prefer a GUI instead of a command-line type one? There are lots of those. You'd prefer a walk-through format? There are lots of those too.
I think you've got a point somewhere, but I can't find it. I suppose my only comment would be that folks should, again, pick tools suited for the job. If it's not worth it to them to learn a search syntax, then they shouldn't use a search engine that relies on one (DUH!). Google requires a syntax, many others don't; use one of them.
As to search engines getting tricked into returning misleading hits, yeah, that's a problem, but not a big one. So 5% or even 10% of the hits are come-ons to porn sites; there are still going to be ~30% good hits (the rest are misses of varying degrees), and that's enough to be productive with.
Finally, don't tell someone not to be "smug and negative". I could insert some comments here about the apparent tone of your posting, but that wouldn't be productive; let's just say I don't see those in my posting and drop it.
Searching Technique (Score:5)
Most of us recall being brought into the school library and shown how to use the card catalog, given a few assignments, etc. Unfortunately, for those of us out of school, there's no equivalent set of skills in place to help with searching.
Boolean searches, keywords, partial words, phrases, etc. are all supported by most search engines, but few folks understand how to use them.
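To make the Boolean idea concrete, here is a minimal sketch of how AND and NOT queries can be answered from an inverted index (a map from each word to the documents that contain it). The three documents and the function names are hypothetical; real engines layer stemming, phrase positions and ranking on top of this.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_search(index, all_ids, must=(), must_not=()):
    """AND together every 'must' term, then drop documents with any 'must_not' term."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term.lower(), set())
    for term in must_not:
        result -= index.get(term.lower(), set())
    return result

docs = {
    1: "Linda Chavez labor secretary nominee",
    2: "Cesar Chavez labor leader",
    3: "Linda Ronstadt concert tour",
}
index = build_index(docs)
print(sorted(boolean_search(index, docs, must=["chavez", "labor"], must_not=["cesar"])))  # → [1]
```

Each AND narrows the candidate set and each NOT removes from it, which is why a couple of well-chosen terms can cut a huge result list down quickly.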
What's really surprising to me is that folks who use search engines regularly, indeed even rely upon them (journalists, I mean you!), seem to be some of the most poorly prepared. There are lots of resources for learning how to do a good search, many from the search engines themselves and many more from third parties, yet we still get these perennial "I can't find ..." stories.
Honestly, I'm not into blaming the victim, but how difficult is it to learn how to perform a good search? One screen of directions? Two minutes of time?
Yes, there's a place for specialized engines handling unique or limited content, but most of the larger, more general-purpose engines do nearly as well if properly used. Again, it's dependent on the user to learn how to define what they want; all of the tools in the world are no good if they're not taken advantage of.
What Google and Yahoo! are missing..... (Score:2)
That's where specialty search engines like Moreover [moreover.com] come in. Eventually, sites like this will let you search those bits of the Web that change often (news sources, weblogs, discussion groups, sites like Slashdot, message boards, financial news, etc.), allowing people to keep up with things as they happen.
Existing search engines are great at finding things that are archived on the Web, but poor at keeping up with what's currently happening. Looking for all the articles on the latest Shuttle mission, as well as what people are saying about it? You might find one or two things about it on Yahoo! or Google, but a search engine like Moreover will find the fluff article on CNN, the more in-depth article on Space.com, and a discussion about the mission on Slashdot. That's pretty powerful.
Re:Directories and specific searches plethorous (Score:1)
By the way, the NYTimes story mentions moreover.com, which is a great service. But since their search feature only searches headlines, allow me to mention my own project, NewsBlip.com, which performs full-text searches. Give it a try, thanks!
There's a way (Score:2)
Or a review system. There is a way to do it, although it might be a pain in the ass. Basically, what you need is a web of trust and digital signatures.
For example, suppose I have a list of keywords. I submit my page to a reviewer, and they judge whether or not my keywords are a reasonably good match for my page. If I pass the test, they PGP-sign my page.
Then you just have a modified search engine that only returns pages that have a valid signature by someone who is on a list of authorities that the searcher trusts.
This type of thing could be used for a more general web page rating or reviewing system. It's just that perhaps some reviewers might judge pages solely on the criterion of meta tags matching the content.
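A minimal sketch of the signature-filtering step follows. It assumes each indexed page carries its keywords, the name of the reviewer who vouched for them, and a signature over the URL plus keywords. HMAC from Python's standard library stands in for the PGP signatures described above, which is a simplification: HMAC uses shared secret keys, so it is not a real public-key web of trust. The reviewer names, keys and URLs are made up.

```python
import hashlib
import hmac

# Hypothetical reviewer keys. HMAC is a stand-in for real public-key (e.g. PGP)
# signatures; with HMAC the verifier must share the reviewer's secret key.
REVIEWER_KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}

def _message(url, keywords):
    # Sort keywords so the signature does not depend on their order.
    return (url + "|" + ",".join(sorted(keywords))).encode()

def sign(reviewer, url, keywords):
    return hmac.new(REVIEWER_KEYS[reviewer], _message(url, keywords), hashlib.sha256).hexdigest()

def verify(reviewer, url, keywords, signature):
    key = REVIEWER_KEYS.get(reviewer)
    if key is None:
        return False
    expected = hmac.new(key, _message(url, keywords), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def trusted_results(pages, query, trusted):
    """Return URLs whose keywords contain the query and whose keyword list
    carries a valid signature from a reviewer the searcher trusts."""
    hits = []
    for page in pages:
        if query not in page["keywords"] or page["reviewer"] not in trusted:
            continue
        if verify(page["reviewer"], page["url"], page["keywords"], page["sig"]):
            hits.append(page["url"])
    return hits

good = {"url": "http://example.com/hamsters", "keywords": ["hamsters", "pets"], "reviewer": "alice"}
good["sig"] = sign("alice", good["url"], good["keywords"])

# Keywords were changed after signing, so the signature no longer matches.
forged = {"url": "http://example.net/spam", "keywords": ["hamsters", "sex"], "reviewer": "alice"}
forged["sig"] = sign("alice", forged["url"], ["sex"])

# Validly signed, but by a reviewer this searcher does not trust.
untrusted = {"url": "http://example.org/hamsters", "keywords": ["hamsters"], "reviewer": "bob"}
untrusted["sig"] = sign("bob", untrusted["url"], untrusted["keywords"])

print(trusted_results([good, forged, untrusted], "hamsters", trusted={"alice"}))
# → ['http://example.com/hamsters']
```

The searcher's `trusted` set is what makes this a personal filter: two people searching the same index can get different results depending on whose reviews they accept.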
Re:Off-Topic, for those of you who haven't heard. (Score:1)
http://www.google.com/search?q=cache:www.georgewb
This is Google's cache of http://www.georgewbushstore.com/.
Google's cache is the snapshot that we took of the page as we crawled the web.
The page may have changed since that time. Click here for the current page without highlighting.
Google is not affiliated with the authors of this page nor responsible for its content.
These terms only appear in links pointing to this page: dumb motherf******
Very obviously, Google uses words from OTHER websites that link to a page when matching that page in searches. I'm not sure I like this. It looks as though someone could link to my website with bad words in the link text, and I could be affected by it. Is anyone able to comment on this?
Re:Off-Topic, for those of you who haven't heard. (Score:1)
For instance, if you do a search for pornography on Google, you can oftentimes get a link to Disney.com. The reason for this is that many porn sites take you to www.disney.com if you click the "I AM NOT OVER 18" link.
This is both good and bad about the way Google indexes.
I have to agree with Arkaein here: it is very odd that someone was able to fool Google into thinking that the GW store was a top linked site. It would be nice if Google were to show you where the reference came from.
Searching.... (Score:2)
I think part of the problem lies in the fact that they match words all over the website. For example, if I type in "hot green hamsters", the words Hot, Green, and Hamsters can appear anywhere on the page; even if I put them in quotes, the search programs don't always group them together. So a page talking about hot peppers, green peppers, and how hamsters eat the pepper gardens in Mexico would come up in the results, even though it has nothing to do with what I was looking for.
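The difference described here is between matching all the words anywhere on a page and matching them as an adjacent phrase. A minimal sketch of both checks, assuming simple lowercase tokenization (the function names and sample pages are hypothetical):

```python
import re

def tokens(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def all_words_match(text, query):
    """True if every query word appears somewhere on the page, in any order."""
    words = set(tokens(text))
    return all(word in words for word in tokens(query))

def phrase_match(text, query):
    """True only if the query words appear next to each other, in order."""
    words, phrase = tokens(text), tokens(query)
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

peppers = "Hot peppers, green peppers, and how hamsters eat the pepper gardens in Mexico"
shop = "Our shop sells hot green hamsters"
print(all_words_match(peppers, "hot green hamsters"))  # → True
print(phrase_match(peppers, "hot green hamsters"))     # → False
print(phrase_match(shop, "hot green hamsters"))        # → True
```

An engine that only does the first check will happily return the pepper-garden page; honoring quotes means doing the second.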
Re:Searching Technique (Score:1)
Google doesn't take into account... (Score:2)
IIRC, Google uses an algorithm that, based on a combination of HTML tag size and logged click-throughs would sort the links. Neato-keen.
Well, about a year ago, when Google was still young and fresh, you could type in your search strings, hit "I'm Feeling Lucky", and get EXACTLY what you wanted. It blew me away time and time again with its strange accuracy.
But as more of that click-through data got integrated into the sifting, I got more and more of the crap that the sheep (i.e., normal mom-and-pop AOLer types) wanted to look up. What the hell, man. Don't get me wrong, I still use Google, but now I have to scan three pages deep before relevant pages come up.
Dirk
How to fool Google (Score:1)
7) The number of times the linking page is linked.
This is not hard to manipulate. Put lots of links [slashdot.org] in your pages instead of "keywords".
Once again the porn industry leads the way on the web.
Instead of pages that look like this:
sex sex sex sex sex
sex sex sex sex sex
sex sex sex sex sex
sex sex sex sex sex
They now look like this:
sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org]
sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org]
sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org]
sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org] sex [slashdot.org]
Also, I think Google looks at the URL to see if it matches the search word; this is not difficult to manipulate either:
www.foo.com/sex.html has 20 links to www.foo1.com/sex.html etc etc
Re:Searching Technique (Score:1)
Re:Directories and specific searches plethorous (Score:1)
Will we end up there?
Re:Searching Technique (Score:4)
Re:Stupid premise (Score:1)
Almost a year ago (at the beginning of April, to be more exact) they announced this breakthrough technology. I'm not sure why it isn't in use now...
Don't you love April Fools'?
Re:Stupid premise (Score:1)
True, Google can't read minds, but if you type "Can Google read minds?" and click the I'm feeling lucky button you'll find out about a nifty feature that's almost as good as being able to read your mind.
Re:42... (Score:1)
> moderatable by users, to eliminate pages
> designed to beat the system and improve the
> ranking of pages that are useful.
I like 'em all but that last one. I don't want to search for linux and get a bunch of rootkit sites modded up to (Score: +5, l33t).
In all honesty, though, you'd have to put so much abuse protection into a moderation system for search engines that it would blow your mind. I don't want the web rendered useless by 5cr1p7 k1dd13z who cracked the mod system. Oh well, people suck.
Justin Dubs
Off-Topic, for those of you who haven't heard. (Score:2)
Punch in "Dumb Motherfucker".
Click "I'm feeling lucky".
Its Called Niche Searching (Score:2)
And it's been around for a while as a concept. I used to work for SpaceRef [spaceref.com], who maintain an excellent niche search engine devoted to space exploration.
I maintain Omphalos [omphalos.net] which is a niche search engine devoted to the modern alternative religions (Paganism, Wicca, etc) and related subjects.
All it really requires is a reliable collection of websites focusing on a specific range of subjects and good search engine software to index their pages. The results are often much more relevant than those from the major search engines - although Google is generally an excellent choice IMHO.
Sounds like Natural Language Processing... (Score:1)
Unlike method xxx, our method yyy does something completely different, unrelated and totally offtopic.
Although you could envision ways of sorting through this example, real-world examples can be far more abstract and disjointed.
-Moondog
Stupid premise (Score:2)
What did they expect? Google can't read minds yet.
Bunch of mojacks.
Tony
I searched (Score:1)
blame the luser again? (Score:2)
Yes, you are blaming the victim. The basic concepts of searching take less time to learn than fancy terms like "boolean". Ideals are nice, but the devil is in the details. Search engine sites perform a difficult task and some do a first-rate job. For that they should be thanked, but nothing is perfect.
What confounds the user most are all the syntaxes used to express those concepts. They are different for every site and take some getting used to. It would be neat to see a search engine with more than one line for input: you could have a box for exact phrases, one for any-word matches, an exclusion box... It's not that command-line syntax is ugly; it's that most people have better things to memorize.
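A multi-box form like the one suggested could simply assemble the engine's query syntax on the user's behalf. The sketch below assumes Google-style conventions (quotes around a phrase, OR between alternatives, a leading minus to exclude a word); other engines use different syntax, which is exactly the complaint here. The function and field names are hypothetical.

```python
def build_query(all_words="", exact_phrase="", any_words="", exclude=""):
    """Turn separate form boxes into one query string (assumed Google-style syntax)."""
    parts = all_words.split()                            # every one of these words
    if exact_phrase.strip():
        parts.append('"' + exact_phrase.strip() + '"')   # adjacent words, in order
    alternatives = any_words.split()
    if alternatives:
        parts.append(" OR ".join(alternatives))          # at least one of these words
    parts.extend("-" + word for word in exclude.split())  # none of these words
    return " ".join(parts)

print(build_query(all_words="chavez", exact_phrase="labor secretary",
                  any_words="nominee nomination", exclude="cesar"))
# → chavez "labor secretary" nominee OR nomination -cesar
```

Supporting a second engine would mean swapping only the few lines that emit syntax, while the form the user sees stays the same.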
Another thing that confronts the user is the efficiency of the search itself. Very clever people constantly seek to fool search engines, and occasionally succeed. The result is garbage to wade through until the search engine can recover. I remember a time when all search.com would retrieve was porn sites. Even Google has been beaten a few times.
Let's not be so smug and negative. Look for the opportunities presented by user confusion. Be happy that these new search engines are coming.
Re:Searching.... (Score:1)
Some search engines do this if you enter the phrase in quotation marks, too.
Re:Sounds like Natural Language Processing... (Score:1)
Re: Wrong! (Score:1)
No, the problem is that Google (and every other major search engine) takes forever (weeks to months) to spider new pages. So just after Chavez was nominated, none of the news articles about her had been indexed. By the time the spiders hit them, she'd already been dumped.
The basic problem is that HTML spidering is a horribly inefficient way of indexing information that is often (especially in the case of news articles) stored in a nice, neat database.
Re: Wrong! (Score:1)
Re:Its Called Niche Searching (Score:1)
Examples:
http://www.Allmusic.Com
Nuclear Explosions Dbase
http://www.ausseis.gov.au/information/structure
Finally, one could also ask: even if this material were crawled, would pulling it out of a massive database (Google, AV, Excite, etc.) be effective without an interface and search capability tailored to that data (specific sorts, etc.)?
oh come on (Score:3)
A Google search for 'dumb motherfucker' will yield George W. Bush's website, how inaccurate could Google possibly be?
"a Google search on "chavez" led to several encyclopedia entries on Cesar Chavez" Would it have fucking killed them to type in "Linda Chavez labor secretary"? And this was very recent news, exactly how quickly do you expect Google to scan the entire internet for updates? How quickly could these 'iceberg drilling' search engines possibly scan the net? It's a deep web right now, what's invisible will bubble to the surface if it's relevant... Maybe they have a point on using the search engines to only scan specific areas, but I think websites which specialize in these areas should license the Google engine instead of Excite's... (you know what I'm talking about right? Every big site has some article you want to find, you go to look for it, you get the worst search interface possible that doesn't return any useful links...)
--
Peace,
Lord Omlette
ICQ# 77863057
Not bloody likely (Score:3)
Yeah, and it would be great if nobody stole money and gave to charity, too. It just isn't going to happen. Any system that is A) valuable and B) depends on everyone behaving honestly is doomed to failure. You're never going to get people to stop cheating the search engines as long as doing so is both possible and beneficial to the cheaters. The plain fact is that manipulating the system works, and people are going to keep doing it as long as it keeps working. The only solution is to develop a system that is not easily manipulated.
Perhaps you should try looking at Google [google.com], a search engine that actually uses these in a clever way as the key part of its ranking system. It's remarkably effective at finding relevant information and at avoiding the kinds of simple manipulation you complain about. Other ranking schemes (like GoTo.com [goto.com]'s straight pay for placement system) are also relatively resistant to manipulation. I think that the long term solution is going to be natural selection; search engines that are easy to manipulate to give lousy results will go out of business and leave behind the ones that are actually useful.
Good luck. The latest versions of Google include over 1 billion pages. Manual sifting for poorly labeled ones just plain isn't an option if your primary goal is comprehensiveness.
Too difficult...Search Engines not clever enough.. (Score:2)
Directories and specific searches plethorous (Score:4)
Why does one need cheesy [financialfind.com] dotcoms [fuckedcompany.com] to tell us what a directory [yahoo.com] is?
A directory search [google.com] limited to U.S. newspapers immediately brings up, say, an explanation [washtimes.com] by Linda Chavez about her relationship with the illegal alien in question.
If one wants political news, one can go to a political news source [yahoo.com]. If one wants information on Linda Chavez, one can do a more specific search [google.com]. If one wants political news about Linda Chavez, one can (this must be getting very complex for your average dotcom founder) search a news archive [yahoo.com].
WANTED: Information (Score:1)
Re:Search Engines: misuse? (Score:1)
There are spaces in the middle of the last two links. Delete them and they will work.
Search Engines: misuse? (Score:2)
In the example given by the article a "linda chavez" or "linda chavez labor secretary" query would be much better than the ordinary "linda".
Moreover, there is the problem of determining the category of what is being searched. One trend is the use of AI and ontologies by search engines to determine what is really relevant in a page and classify it during the indexing phase into different categories (economy, medicine, technology, entertainment, etc.).
What the article talks about are the knowledge based agents. A quite interesting article can be found at: http://www.cs.technion.ac.il/~cs236512/www-search
Another interesting link:
- CMU World Wide Knowledge Base (Web->KB) project:
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/th
channel.nytimes.com etc... (Score:1)
http://channel.nytimes.com/2001/01/25/technology/
That'll get you in without registration.
Re:Off-Topic, for those of you who haven't heard. (Score:1)
http://www.google.com/search?sourceid=navclient&q=link:http://www.georgewbushstore.com/
DUH!!! (Score:1)
Seriously, folks. The article is just saying to use the right tool for the right job. It's a no-brainer. If you want news stories, you search cnn.com or another news site. If you are looking for financial information, search The Wall Street Journal [wsj.com] or another financial page. Search engines like Google (get the toolbar, it is great) or AJ are for general searches, to get you started on a topic so you can refine your search from there. Duh.
What we all want in a search engine... (Score:3)
Examples:
Searching for "John Smith" should return my friend John Smith and no one else.
Searching for "C++ implementation of Knuth algorithms" should return exactly that, and leave out references to C++, Knuth, or algorithms.
At the very least, large search results should immediately separate the mass of results into categories - i.e. "Jessica Alba" - up at the top should be pr0n - fan sites - commercial sites - etc. Yahoo does this, but there are way too many categories. Really, the web has maybe 10-12 different broad types of sites - commercial, homepages, academic sites, pr0n, multimedia, weblog - you get the point, the list isn't that long. We should be able to filter entire broad categories out of our searches. Altavista does a fairly good job with multimedia searches - unfortunately there still is way too much manual searching - it still doesn't read our minds enough within the broad category search.
Google uses PageRank to determine the order of results, but does it track the sites its users click on after performing a search? No, but it should. Further, it should track users individually and be able to customize its results based on each person's individual personality. The more you use a search engine, the better it should work for you.
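A minimal sketch of the per-user re-ranking proposed above, assuming the engine logs which result URLs each user clicks. This is a hypothetical design, not something Google is known to do (a later reply in this thread reports that Google avoids click-through analysis for privacy reasons). Results the user has clicked more often move up; ties keep the engine's original order because Python's sort is stable.

```python
from collections import Counter, defaultdict

class ClickLog:
    """Hypothetical per-user click history used to re-rank results."""

    def __init__(self):
        self.clicks = defaultdict(Counter)  # user -> Counter of clicked URLs

    def record(self, user, url):
        self.clicks[user][url] += 1

    def rerank(self, user, results):
        counts = self.clicks[user]  # a Counter returns 0 for URLs never clicked
        # sorted() is stable, so equally clicked URLs keep the engine's order.
        return sorted(results, key=lambda url: -counts[url])

log = ClickLog()
log.record("me", "c.example")
log.record("me", "c.example")
log.record("me", "b.example")
print(log.rerank("me", ["a.example", "b.example", "c.example"]))
# → ['c.example', 'b.example', 'a.example']
print(log.rerank("you", ["a.example", "b.example", "c.example"]))
# → ['a.example', 'b.example', 'c.example']
```

Keeping the counts per user is what separates this from global click-through ranking, which, as another poster complains, tends to push everyone toward what the majority clicks.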
I can't stress this enough: A search engine needs to be able to read our minds.
Re:Off-Topic, for those of you who haven't heard. (Score:1)
Re:oh come on (Score:1)
Searched the web for Linda Chavez labor secretary. Results 1 - 10 of about 1,390. Search took 0.08 seconds
--ricardo
Re:Off-Topic, for those of you who haven't heard. (Score:3)
(Note: If you have arrived at this site through inappropriate references via a search engine, please be assured that we did not utilize this language in our site, our HTML, nor in our internet promotion of this site. What happened was the result of a malicious act and we are pursuing remedies through the efforts of our staff and attorneys.)
I hope I am not liable in Spain for using those words. Please don't tell them where Spain is.
--ricardo
Re:Off-Topic, for those of you who haven't heard. (Score:1)
Re:For Better Search Results (Score:1)
Re:Just Ask Jeeves! (Score:1)
Just Ask Jeeves! (Score:3)
Thanks for nothing you bastard butler!
Disagree (Score:2)
I disagree. I continually find close matches using Google, much better than anything I used previously (Hotbot was good for a while).
When Yahoo started using them I rejoiced. It was the best of all possible worlds (good search engine, web of content like the calendar, and hand-picked sites when all else failed).
90% of Everything (Score:2)
90% of everything is junk
In truth, it may be more than that.
So we come to the needle in the haystack, and how the databases that the search engines consult give priority to different terms, how they index the various sites, and how long it takes.
Of course, for the person truly expert in these things, these are trivial details. They are as obvious as a traffic jam. For the rest of us, it is more a matter of "where did all these cars come from?"
Unlike our own computers, the web has no central index of its full content. Indexing is done continuously at a surface level, and takes a month or two or three.
In that context, of course last night's news will not get indexed while we wait.
Just like the tradition of game installation, search engines have been designed to be used by people who have a clue.
Sometimes I swear that until we get a system designed by geniuses to be used by idiots, we will need some sort of internet user license. Otherwise, it is simply a matter of designing systems that can obey the command:
"Do what I want, not what I say."
This is an interesting problem in programming, is it not?
Re:Off-Topic, for those of you who haven't heard. (Score:1)
Re:What we all want in a search engine... (Score:2)
Re:The problem isn't the searches (Score:1)
Username and Password for NYTimes (Score:1)
Password: cyph3rpunk
Enjoy the article.
Search poisoning and web-based pr0n/warez spam (Score:1)
The concept isn't new, it's just the sheer volume that made Google freak out. The reason behind it is that Google counts the number of links leading to one page as an indicator of that page's actual popularity. So the spammers simply created hundreds, thousands of dummy pages with single, prominently-placed links which fooled Google's crawler.
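To see why dummy pages help, here is a textbook simplified PageRank computed by power iteration (damping factor 0.85; pages with no out-links spread their score evenly). It illustrates link-based ranking in general, not Google's production algorithm, and the tiny graph is invented. Adding twenty dummy pages that each link to "spam" triples that page's score without any real page linking to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Each page splits its current score among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no out-links spreads its score over every page.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

web = {"a": ["b"], "b": ["c", "a"], "c": ["a"], "spam": ["a"]}

# The same web plus a "link farm": 20 dummy pages that each link to "spam".
farm = dict(web)
for i in range(20):
    farm[f"dummy{i}"] = ["spam"]

before = pagerank(web)["spam"]
after = pagerank(farm)["spam"]
print(round(before, 4), round(after, 4))  # → 0.0375 0.1125
```

Weighting each vote by the voter's own score limits the damage (each dummy has only the minimum score to pass on), but as the sketch shows, enough dummies still add up.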
The temporary solution, as always, will be to come up with a new crawling method that can filter out these poison pages, but of course it will only be a matter of time before someone "cracks" the new crawler. History repeating.
Poor Example (Score:2)
I love google but they are mean in one respect: (Score:1)
Upon an unsuccessful search they do not offer you the choice.
Obviously, they have no responsibility to offer it, but it's kind of slimy that the time you want the option the most, it's not offered.
Also exact string searches are a little weird, particularly if you forget the +s for common words like "the."
Believe me (Score:1)
Re:Searching.... (Score:1)
This is
People who regard this as useful information should not under any circumstances be reading
Ian
Natural Language Processing... (Score:1)
I know there are companies out there that have the technology to "put it all in one", so to speak. I have worked a little with Autonomy [autonomy.com], and I gotta say, I am deeply impressed by what it does. They employ a technique called Bayesian inference (after Thomas Bayes). It has to do with "calculating the probabilistic relationship between multiple variables and determining the extent to which one variable impacts on another". Sounds wild, eh? Well, it is. Together with this, their core engine, called DRE (Dynamic Reasoning Engine), relies on the theory of Claude Shannon, which states that "the less frequently a unit of communication (for example a word or phrase) occurs, the more information it conveys".
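The Shannon idea quoted above fits in a few lines: the self-information of a word is -log2 of its relative frequency, so rarer words carry more bits. This sketch only illustrates that principle on a toy word list; it says nothing about how Autonomy's DRE actually uses it.

```python
import math
from collections import Counter

def self_information(words):
    """Bits of information per word: -log2(relative frequency of the word)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: -math.log2(count / total) for word, count in counts.items()}

words = "the cat sat on the mat the end".split()
info = self_information(words)
print(info["cat"])            # → 3.0  (appears 1 time in 8 words)
print(round(info["the"], 3))  # → 1.415  (appears 3 times in 8 words)
```

This is why a query word like "cat" says far more about what a document is about than "the" does.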
The more input you give it, the more accurate it will be. Oh, and it's actually for all kinds of unstructured information - also e-mail.
I ramble. You should check it out.
Autonomy also makes Kenjin [kenjin.com], which is a piece of software that you install that will understand what you are looking at, and help you search for similar stuff. Kinda kool.
Yes blame the luser (and public education) (Score:1)
E.g., three people I work with were trying to find the name of the Abbott & Costello movie with the voodoo-doll-making witch in it (don't ask). They were searching and searching for made-up names, years, actors' names, ad nauseam, when 'Abbott Costello witch voodoo' (click I'm Feeling Lucky) brought it right up.
The trick is visualizing and then boiling down your desired target text to specific unique words and then searching for those words. Sounds obvious but most people still expect technology to have animate, responsive, understanding qualities like it does in the movies.
Corruption in, Corruption out. (Score:1)
I have suggested a "fix" for those who give a crud. SEE This [ezboard.com]
Schools should teach Searching technique (Score:1)
I personally am self-taught in the 'art' of searching the net. This includes using Boolean operators and, as previously mentioned, the use of quotes around phrases. Why can't schools teach these useful skills?
Re:Not bloody likely (Score:1)
"Good luck. The latest versions of Google include over 1 billion pages. Manual sifting for poorly labeled ones just plain isn't an option if your primary goal is comprehensiveness"
Well, the idea wouldn't be to look at ALL the pages, but rather, the main sites themselves. Porn is easy enough to find. By looking at the frontpage of sites, you'll be in better shape, even if you are just removing several million cheater domains.
Heck, you could hire high school students to determine if a site is cheating or not. Not that difficult.
Re:Stupid premise (Score:1)
The problem isn't the searches (Score:3)
Why doesn't everyone use metatags properly? What about specifying good (descriptive) title tags?
Plus, don't you think it would be much easier if people actually didn't try to cheat search engines?
In actuality, there would then be some very easy ways to score pages for relevance:
1) The number of times a particular word shows up in the keywords, and description of the page.
2) If the word actually appeared in the title of the page.
3) The number of times the word appears in the body of the text
4) The length of the supposedly searched word
5) The number of times a particular page is linked to.
6) The words used in the link
7) The number of times the linking page is linked.
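A naive scorer combining the seven signals above might look like the sketch below. The weights, the `Page` fields, and the reading of item 4 (longer query words count as more specific) are all assumptions made for illustration. Note that items 1 and 3 are exactly what keyword stuffing inflates, which is the weakness the replies to this post point out.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    title: str
    meta: str   # keywords and description meta tags, joined
    body: str
    # Hypothetical: one (anchor_text, inbound_links_of_linking_page) pair per link.
    inbound: list = field(default_factory=list)

def score(page, term):
    term = term.lower()
    s = 2.0 * page.meta.lower().split().count(term)   # 1) meta keywords/description
    s += 5.0 * (term in page.title.lower().split())   # 2) word appears in the title
    s += 1.0 * page.body.lower().split().count(term)  # 3) occurrences in the body
    s *= 1 + len(term) / 10                           # 4) assumed: longer words are more specific
    s += 0.5 * len(page.inbound)                      # 5) number of inbound links
    for anchor, linker_inbound in page.inbound:
        if term in anchor.lower().split():            # 6) words used in the link
            s += 1.0 + 0.1 * linker_inbound           # 7) how well linked the linker is
    return s

hamster_page = Page(title="Hamster care", meta="hamster pets care",
                    body="Feed your hamster fresh greens daily",
                    inbound=[("hamster care guide", 40)])
pepper_page = Page(title="Pepper gardens", meta="peppers garden",
                   body="Hamster raids on hot peppers")
print(score(hamster_page, "hamster") > score(pepper_page, "hamster"))  # → True
```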
Wouldn't the world be happier? Personally, I think it would be great if there were an editing team that would simply delete misrepresented pages.
Anyway. That's my two cents.
Google works -- and web evolution (Score:1)
I disagree with the line about searching for stuff on google turning up dirt. If you know how to format a search properly, and which words are key, nearly anything can be found on google.
OTOH, it is always nice to see search technology getting better. There are some simple ideas which would aid searching, such as voluntary self-classification of web sites into general categories (I'm sure this could easily be worked into one or more of the emerging document standards, if it hasn't been already). This would effectively divide the internet into a large number of overlapping sub-nets, as far as searching is concerned: you could search everything, or just websites pertaining to 'games', etc. I think that a solution along these lines (although probably a better/more complex version) will be necessary before truly powerful searching becomes easily available.
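A minimal sketch of searching within self-declared categories, assuming each page carries a hypothetical list of categories chosen by its author (no particular document standard is implied). Because a page may list several categories, the resulting subsets overlap, as suggested above.

```python
def search(pages, term, categories=None):
    """Return URLs of pages containing `term`, optionally restricted to pages
    that declared at least one of the requested categories."""
    term = term.lower()
    wanted = set(categories) if categories is not None else None
    hits = []
    for page in pages:
        if wanted is not None and not wanted & set(page["categories"]):
            continue  # page did not declare any requested category
        if term in page["text"].lower().split():
            hits.append(page["url"])
    return hits

pages = [
    {"url": "http://example.com/chess", "categories": ["games"],
     "text": "Chess openings for beginners"},
    {"url": "http://example.com/chess-history", "categories": ["games", "history"],
     "text": "A history of chess in Europe"},
    {"url": "http://example.org/chess-sets", "categories": ["commercial"],
     "text": "Buy chess sets online"},
]
print(search(pages, "chess", categories=["games"]))
# → ['http://example.com/chess', 'http://example.com/chess-history']
```

Since the categories are self-declared, this inherits the same honesty problem as meta keywords; it only narrows the haystack for sites that label themselves truthfully.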
I can't envision some complex algorithm and/or a team of people classifying stuff ever being a strong solution without the aid of enhanced standards for the web.
-Robert Thornburg
Re:Off-Topic, for those of you who haven't heard. (Score:1)
It is pretty strange, though, that anyone was able to fool Google into making it the top site returned for the query. Google gives links from highly visited sites more relevance, so some little two-bit web site won't have a great deal of influence.
DON'T WASTE MY TIME WITH SUBSCRIPTION NY Times (Score:1)
Search AI (Score:1)
If google worked that way... (Score:1)
It may be true that as google indexes more newbie home pages, the average quality of the links it sees is going down, but that's another issue.
Integrate Meta data with domain registration? (Score:2)
Re:Searching Technique (Score:1)
Defining search criteria in such a way as to guarantee that your desired target shows up among the first page of hits is a bit like trying to find a jet that will take you to a specific street address.
I usually search on a phrase that is likely to take me into the approximate quadrant of the haystack containing the needle, then take a scenic ramble through the neighborhood until I link my way to my destination. If the destination is not well linked-to, you won't likely find it with a search engine anyway.
My apologies for the gruesome mixed metaphors!
Re:Google doesn't take into account... (Score:1)
In any case, the point is that I specifically asked the guy (he was actually one of their engineers, responsible for Google's SafeSearch, IIRC) if they did any click-through analysis to try and improve the relevance of their results. He responded with an emphatic "no". They believe it's just too much of a privacy concern.
Re:Just Ask Jeeves! (Score:1)