Google's Bigger Index
WebGangsta writes "Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."
Here's hoping (Score:5, Interesting)
Re:Here's hoping (Score:5, Interesting)
First, the reindex that happened a few months ago removed all cross-reference with accents.
(where google would find the same number of links for both the word and the unaccented word... right now: soupçon: 9,750 - soupcon: 88,500)
Then, when searching for anything regarding ras error messages, I get 30 links from spammer and then the real stuff.
Example: 711 error yields multiple links for similar pages...
"Your one stop resource for all things error 711 remote access connection management related."
Vintage Google.. in Net years, that's 15-16 months ago, right?
Re:Here's hoping (Score:4, Interesting)
Google search: 711 error [google.com]
Come on, Google. Stop reading slashdot and fix the problems.
Re:Here's hoping (Score:3, Informative)
It could be much smaller ;-) (Score:5, Funny)
Re:It could be much smaller ;-) (Score:5, Funny)
And if they'd just stop indexing blogs, the entire Internet would fit onto a CD.
Re:It could be much smaller ;-) (Score:5, Funny)
Re:It could be much smaller ;-) (Score:4, Funny)
Re:It could be much smaller ;-) (Score:5, Funny)
You could fit the blogs on a CD as well. Just store a template blog and include a program to generate random variations, e.g. "my dog has fluffy fur today" vs "my cat has fluffy fur today".
Technically, this would be "lossy compression" (since some data is deprecated but no one will notice the difference). Though on the other hand, it could even be argued that removing blogs entirely would be a form of "lossless compression".
Re:It could be much smaller ;-) (Score:5, Funny)
Work has already been done on this. Have you seen the Markov blogger [perl.org] on use Perl? Soon all bloggers will be replaced with a Perl script.
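For anyone wondering how a Markov text generator works, here is a minimal sketch in Python. It is not the Perl script linked above; the function names and the tiny sample corpus (borrowed from the fluffy-fur joke upthread) are made up for illustration.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng):
    """Walk the chain from `start`, picking a random successor at each step."""
    word = start
    output = [word]
    for _ in range(length - 1):
        successors = chain.get(word)
        if not successors:  # dead end: nothing ever followed this word
            break
        word = rng.choice(successors)
        output.append(word)
    return " ".join(output)

# Hypothetical sample corpus, not real blog data.
corpus = "my dog has fluffy fur today my cat has fluffy fur today"
chain = build_chain(corpus)
sample = generate(chain, "my", 6, random.Random(0))
print(sample)
```

With a corpus this small the output can only be one of the two template sentences; a real generator would be fed a much larger body of text.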
Re:Already been done, sort of (Score:3, Funny)
Re:It could be much smaller ;-) (Score:5, Funny)
Re:It could be much smaller ;-) (Score:4, Funny)
how many? (Score:4, Interesting)
Re:how many? (Score:5, Funny)
Re:how many? (Score:4, Funny)
Re:how many? (Score:5, Informative)
Maybe if more people used Google's Search Quality feedback form [google.com], it would help weed them out.
Better way to tell Google of bad results (Score:3, Insightful)
At the bottom of the page, under the second search box, is a phrase "Dissatisfied with your search results? Help us improve. [google.com]" - Follow it and the form will ask you to:
Mailing lists (Score:5, Interesting)
Heh (Score:5, Interesting)
Re:Heh (Score:5, Funny)
Re:Heh (Score:5, Insightful)
I'd find it funnier if every man woman and child on earth at least had unrestricted access to Google and everything it links to.
Re:Heh (Score:5, Funny)
With my luck, I bet my one item is a page with prescription drugs and weight-loss supplements at bargain prices.
I hope your item is better.
Re:Heh (Score:5, Funny)
Re:Heh (Score:3, Funny)
Re:Heh (Score:5, Insightful)
In other words, no, can't say that I do.
Not only is it an entirely artificial milestone devoid of meaning even in the sense of interesting coincidence, it's an artificially created "milestone" for the purpose of pointing it out.
Any marketing department can churn out such by the barrel full.
KFG
Re:Heh (Score:5, Funny)
Re:Heh (Score:5, Interesting)
Re:Heh (Score:3, Interesting)
If a few hundred million people can generate more than 6 billion pages, just imagine what number all of humanity can produce?
Most press-release like post ever (Score:5, Insightful)
Re:Most press-release like post ever (Score:5, Insightful)
* "...to 6bn" : From what number before?
And I still can't find what I'm looking for! (pun definitely not intended)
Re:Most press-release like post ever (Score:3, Interesting)
I would have liked to see some information about the underlying technology that allowed this bigger index, especially if it allowed the broader coverage without a reduction in search result quality.
Google thumping its chest? (Score:5, Interesting)
Is Google setting up for its IPO and therefore becoming less like the Google we know and love?
The real question (Score:3, Interesting)
Google, over 6 billion served. (Score:5, Funny)
but... (Score:5, Funny)
Milestone (Score:3, Funny)
One for every man, woman, and child. Sounds exactly like the thinking of a machine to me.
Related? (Score:5, Funny)
Little boys across the globe will have sore arms tomorrow.
It's only a matter of time.. (Score:5, Interesting)
Re:It's only a matter of time.. (Score:5, Insightful)
As far as I know, image search in the way you want it is still only a dream. But. Approx 2 years ago I attended a conference focused (mainly) on theoretical computer science. I saw some researchers (I think they were from Italy, not sure) present an early implementation of their algorithm to look for similar images to the one you select.
The idea behind it: for a computer, it's not easy to tell what exactly an image contains. E.g. take all those "type the word you see above inside this box to prove you are not a bot" registration forms. If there are no working algorithms to tell "this image contains the word SLASHDOT written in yellow and blue stripes on a pink-dotted black background", the chances of creating an algorithm to tell "this is a game of tennis, it is probably played in the afternoon somewhere in England" are really low.
However, by using various approaches from CG (comp. graphics), you MAY be able to tell whether two images are similar or not -- as simple examples consider edge detection, color spectrum, etc. As I already mentioned, such algorithms have already been implemented and their success ratio is already reasonably high. I expect that it won't take long until we see them on google.
Note that using the ideas above you CAN search for an image with a given subject -- it just requires two stages. Suppose you want an image of a sun setting somewhere in the mountains. Stage 1. You enter "sunset" into google's present search engine. You get lots of sunsets, several dogs named Sunset, a Chinese girl Sun Set, etc. Then you select one of the sunsets most resembling the image you want and you tell google (or some other engine) to find all similar images. Et voilà.
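To make the "find similar images" stage concrete, here is a rough Python sketch of one of the simple approaches mentioned, comparing colour distributions. It is only an illustration under stated assumptions: the images are toy lists of (r, g, b) pixels, the function names are hypothetical, and histogram intersection is just one possible similarity measure, not necessarily what the researchers used.

```python
def color_histogram(pixels, bins=4):
    """Count pixels per coarse RGB bucket and normalise to fractions."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]

def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

def rank_similar(query, candidates):
    """Stage 2: order candidate images by similarity to the one the user picked."""
    query_hist = color_histogram(query)
    scored = [(similarity(query_hist, color_histogram(pixels)), name)
              for name, pixels in candidates.items()]
    return [name for score, name in sorted(scored, reverse=True)]

# Hypothetical toy "images": flat lists of (r, g, b) pixels.
sunset = [(250, 120, 30)] * 6 + [(40, 20, 60)] * 4
other_sunset = [(240, 110, 20)] * 5 + [(50, 30, 60)] * 5
dog_named_sunset = [(120, 90, 60)] * 8 + [(200, 200, 200)] * 2

ranking = rank_similar(sunset, {"other_sunset": other_sunset,
                                "dog_named_sunset": dog_named_sunset})
print(ranking)
```

Stage 1 (the keyword search) supplies the candidates; the colour comparison only reorders them, so it never has to understand what a sunset is.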
A company spokesman added... (Score:5, Funny)
4.28 billion web pages... (Score:3, Interesting)
Does it sound to anybody else like the rumours of Google hitting a dead end in the number of index positions for web search are true? Especially given that it has been more than a year since they announced 4 billion.
Apparently PageRank assigns an unsigned int to every page as its ID, and their index is so huge they cannot convert it to a 64-bit number. (You wonder why they didn't think of that 2 billion pages ago, when a UTF-8-like solution would still have been possible.)
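For reference, the arithmetic behind the rumour, as a small Python sketch. Whether Google really uses 32-bit page IDs is unconfirmed; `assign_id` is a hypothetical helper that only imitates how a fixed-width unsigned counter wraps around.

```python
UINT32_ID_SPACE = 2 ** 32  # distinct values a 32-bit unsigned int can hold

def assign_id(counter, bits=32):
    """Return the ID a fixed-width unsigned counter would produce (wraps on overflow)."""
    return counter % (2 ** bits)

reported_pages = 4_280_000_000  # the "4.28 billion" figure from the subject line
headroom = UINT32_ID_SPACE - reported_pages

print(UINT32_ID_SPACE)                      # 4294967296
print(headroom)                             # 14967296
print(assign_id(UINT32_ID_SPACE))           # 0, the counter has wrapped
print(assign_id(UINT32_ID_SPACE, bits=64))  # 4294967296, no wrap with 64 bits
```

So a 32-bit ID space would leave room for only about 15 million more pages beyond the reported count, which is why the number looks suspicious.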
Re:4.28 billion web pages... (Score:5, Funny)
PHB: We've run out of accounting codes! We can't do anything without one!
Dilbert: Why not upgrade the system to accept larger codes?
PHB: To do that we'd need a budget and an accounting code
Dilbert: Why can't we reuse a code from an old finished project?
PHB: Strangely enough, we've never finished a project.
oh, come on (Score:3, Insightful)
What I want to know... (Score:5, Interesting)
Re:What I want to know... (Score:5, Informative)
Even better way to report (Score:5, Informative)
http://www.google.com/contact/spamreport.html [google.com]
This will give you options of reporting cloaked pages, doorway pages, deceptive redirects, misleading or repeated words, hidden text, etc. You have to be more specific than the "help us improve" link at the bottom of search results. Using this form I've seen abusive sites disappear from Google's index in less than 12 hours.
Re:What I want to know... (Score:4, Informative)
+1
There are things that you just can't use Google for any more because these googlespam sites score so well... it's like being back in the days before google... [archive.org]
Re:What I want to know... (Score:5, Insightful)
"...represents a milestone..." (Score:5, Insightful)
It's expected that as the web grows, so will the search engines.
This isn't exactly a man-on-the-moon accomplishment.
Re:"...represents a milestone..." (Score:3, Insightful)
6 billion items is just that, a milestone.
is it just me? (Score:5, Interesting)
Their search has apparently improved as well ! (Score:5, Informative)
That seems to have changed!
I just tried a search on television antennas [google.com] and for once the results seem relevant.
Hooray!! Google is back!!
Sunny Dubey
Faked URLs (Score:3, Interesting)
</curious>
Still nok (Score:5, Interesting)
I do, however, find my posts when googling for words they contain.
How can one explicitly forbid Google from indexing a site?
Sorry, I'll keep using Altavista [av.com].
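On the question of keeping a crawler out: the usual mechanism is a robots.txt file at the site root. Below is a small Python sketch that checks such a file with the standard library's `urllib.robotparser`; the example.com URL and the second bot name are placeholders, and note that robots.txt is a request that well-behaved crawlers honour, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks Googlebot to stay out of the entire site.
robots_txt = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Placeholder URL; substitute the site in question.
url = "http://example.com/journal/"
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_bot_allowed = parser.can_fetch("SomeOtherBot", url)

print(googlebot_allowed)  # False
print(other_bot_allowed)  # True, no rule applies to this agent
```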
Re:Still nok (Score:3, Informative)
Re:Still nok (Score:3, Informative)
It can take a very long time for a site to be spidered after it is submitted via the "add a url" form.
No Good... (Score:5, Interesting)
Re:No Good... (Score:5, Informative)
Google alternatives: Gigablast (Score:3, Informative)
It's still smaller than most other search engines, but it's quite fast, has good relevance and it indexes stuff in real time.
Besides, if you don't find what you are looking for, you can do the same search with 5 other search engines just by clicking on links at the bottom of the results page.
But what I like with Gigablast is that it's always getting better and I feel like part of something that has potential.
They said 6 billion items, not webpages. (Score:5, Informative)
To find the rest, we need to use Google's other services. The image search is claiming "Searching 880,000,000 images". Google Groups says it's "Searching 845,000,000 messages". Add those to the count and you get 6,010,199,744 items total.
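The breakdown can be checked with a few lines of Python. The image and message counts and the total are the ones quoted above; the web page count is not quoted directly here, so the sketch derives it from the other three numbers.

```python
total_items = 6_010_199_744   # total quoted above
images = 880_000_000          # "Searching 880,000,000 images"
group_messages = 845_000_000  # "Searching 845,000,000 messages"

# Whatever is left over must be the web page count.
web_pages = total_items - images - group_messages

print(f"{web_pages:,}")  # 4,285,199,744
```

That implied figure matches the roughly 4.28 billion web pages discussed elsewhere in the thread.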
Re:They said 6 billion items, not webpages. (Score:3, Interesting)
Sort out their indexing problems first (Score:5, Interesting)
Many people said that Google were using deliberate tactics to encourage small e-commerce websites to spend more on adwords, but I believe this wasn't deliberate - their index is so big that they simply can't tell what the results of their changes are going to do to the search orders for all the search options that people are going to use - and they simply didn't realise in advance the problems they were going to cause. And google have made efforts to minimise the damage since then, but they still need to do more.
Jolyon
Since when did bigger == innovation? (Score:5, Insightful)
Thanks (Score:5, Funny)
Run out of indexing space? (Score:5, Interesting)
Re:Run out of indexing space? (Score:4, Interesting)
So I would guess that they already use more than 32 bits per item with everything in a single item ID space, or they use 32 bits plus some code indicating the ID space, or perhaps a variable-length code depending on the item type, e.g. like UTF-8. In any case, they should have exceeded 32 bits long ago.
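As an illustration of what a variable-length ID code could look like, here is a Python sketch of a base-128 "varint": 7 payload bits per byte, with the high bit flagging that more bytes follow. This is not UTF-8 itself and there is no claim that Google uses it; it is just one well-known scheme in the same spirit, with hypothetical function names.

```python
def encode_varint(n):
    """Encode a non-negative integer using 7 payload bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def decode_varint(data):
    """Decode a single varint from the start of a byte string."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

small_id = encode_varint(300)       # fits in 2 bytes
large_id = encode_varint(2 ** 32)   # one past the 32-bit limit, 5 bytes
print(len(small_id), len(large_id))
print(decode_varint(large_id))
```

Small IDs stay short, and nothing special happens at the 32-bit boundary; the encoding simply grows by a byte when it needs to.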
Re:Run out of indexing space? (Score:3, Informative)
That is suspiciously close (Score:3, Informative)
Good for Google...but: (Score:5, Interesting)
For example, if one searches for "TCP/IP tutorials", it would return many unrelated links like posts in newsgroups, college lectures, etc.
Re:Good for Google...but: (Score:3, Informative)
I saw some research recently at a conference that used complex vocabulary matching algorithms to automatically extract topics and organise large numbers of documents into topic hierarchies and present summary reports, but I think that might be a bit too processor intensive and cutting edge, even for google.
Google Print (Score:5, Informative)
I was interested that they mentioned Google Print [google.com], which is Google's answer to Amazon's Search Inside [amazon.com] feature, but hasn't got much press, and is pretty well hidden in Google itself.
You can check it out by limiting results to site print.google.com, e.g. searchterm site:print.google.com [google.ie]. (Not quite at Amazon-type numbers yet.)
Caveat Emptor (Score:5, Insightful)
Happy Trails!
Erick
How much space do they use for caching? (Score:5, Interesting)
The hard disk and RAID folks must LOVE Google....
Re:How much space do they use for caching? (Score:5, Interesting)
Not!
Hell, even doing 2x or 3x this amount for server-class drives still leaves us talking lame amounts. Just one Hitachi/Sun 9980 Fiber Channel drive costs several times more than this.
Seriously, everything I've heard indicates that google's methods hinge on a lot of white boxes, each one covering a subset of the google data. Put another way, drivespace per server isn't the limiting factor. A distributed system with several hundred white box servers can't HELP but have tens of terabytes of storage, given drive capacities of tens and hundreds of gigs each.
A client just bought a Hitachi 9980. As sweet as the Hitachi arrays are, I thought it was the most horrendous waste of cash I'd ever seen, considering this client's more modest needs. THOSE are the customers that raid/drive makers love... all it takes is one IT guy with hardware lust who has the trust of a Fortune-500 firm.
Re:How much space do they use for caching? (Score:3, Informative)
The trick is how to back it all up in shortening backup windows. Things like TrueCopy work, but take twice the disk space.
SPEED is the answer (Score:3, Insightful)
Whee, it's a press release (Score:3, Insightful)
A press release complete with corporate speak!
"This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information.".
This is just google doing what they are already well known for doing best. There's nothing new or 'innovative' here. While it's a fine accomplishment, and I'm pleased google has indexed that much stuff, it's hardly innovative for them.
Is /. pro Google? (Score:5, Informative)
That's a quote from the NYtimes [nytimes.com] (free req. yada yada) also posted as is here [cfpm.org]
If any other site were to track the stuff Google does,
Please note, this isn't a troll, and I'm not wearing a tin-foil hat (maybe I should?). Imagine the following scenario: a bomb goes off in the US. By tracing searches for "anarchist cookbook" to zipcodes within the area of the bomb blast, the FBI could have access to information that makes TIA look like a better alternative.
Maybe this isn't such a good feature after all...
Re:Is /. pro Google? (Score:3, Interesting)
Google pulled us out of "The Dark Ages" (Score:5, Interesting)
There is an interesting article in Wash Post Search For Tomorrow [washingtonpost.com] on Google, and possible AI in search.
Some excerpts:
I really don't agree with that article (Score:4, Insightful)
I encourage all of you who are in high school or have college papers to write to look beyond Google the next time you have to research something. You will find about fifty times as much information by looking in published volumes. Here's the technique I always use: visit a University library. Use the electronic card catalog to find a couple of titles that seem to match your topic. They will likely all have similar call numbers. Then, go browse the stacks around those call numbers. That will give you access to all the books available that are related to your topic, and on the next shelf over, are books that are tangentially related. Every time I do that, I find some fascinating angle on the subject matter I never even knew existed. The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.
If you have trouble, go ask one of the friendly research librarians. They do a lot more than go around and "shhh!" you.
Google is a useful tool, but if you want real depth, from people who aren't tech savvy enough to put their full academic works online, the library is the only place to find it. Put in the time!
It's worth mentioning... (Score:5, Informative)
One should have a look at Google-Watch [google-watch.org] (tinfoil? maybe...) but they have some good points:
According to DEA, Google is breaking the law [google-watch.org]
Google Evil cookie [google-watch.org]
We got your number! [google-watch.org]
And so on...
Not to troll but rather a thought. Mod as you wish.
Re:It's worth mentioning... (Score:3, Informative)
which states
big but far from complete. (Score:5, Informative)
Size and Criteria are good, but... (Score:5, Insightful)
Too bad the article doesn't mention how google is trying to fight gaming the PageRank system [google-watch.org] or any of the other problems like commercials in the results. Still a great search tool though.
Image search: What are your experiences? (Score:5, Interesting)
What are your experiences?
Of course, none of these services search in the image data itself. They search filenames, special features (like image size), and the content of the pages they are found in.
What is the state of searching in images today? Facial recognition systems have existed for a while, but they are made for a specific purpose.
How long before we can take a picture of that piece of your IKEA furniture and find the same model in pictures of celebrity houses, Babylon 5 sets and crime scenes? Or taking a picture of that familiar-looking person walking down the street, searching for her, and remembering that she was in that "reality" series two years ago.
Mac users' image search (Score:5, Informative)
For Mac users, I recommend using Beholder [mesadynamics.com] to power your Google image search. Google's minimal UI changes notwithstanding.
(Mod +1 Self-Promotive)
META Tags (Score:3, Insightful)
Guess I better call the whaaaaambulance :-(
BTW - can you believe that a large number of visitors we get come from people who do a search on "goofball.com" [google.com]. Wow.
Number One (Score:3, Interesting)
The upgrade has been quite good to me! Before the upgrade a search for my name would rank my website [michael-forman.com] many pages down and then only secondary links not the root site. Now I rank number one! It looks like all my slashdot posting has finally paid off.
Ahh. The small victories of the computer geek.
Michael. [michael-forman.com]
The real innovation is... (Score:5, Funny)
PNG! (Score:5, Interesting)
They didn't mention PNG [libpng.org], the turbo-studly image format which Google Image Search does indeed support.
It seems they used to have very few PNGs in their database, but now a search for +a filetype:png [google.com] returns 700,000 results!
sco fell in "litigious bastard" search. (Score:3, Informative)
adsense is making sense (Score:4, Interesting)
Google's AdSense service (https://www.google.com/adsense/overview) is certainly a winner.
The ads presented are similar to the paid ads shown on a standard google search but using the keywords of the page displayed and also tailored to the country of the viewer via their ip address.
In this way webmasters can maximize the global potential of their website.
We have some very highly ranked pages (i.e. top 10) but for UK only content. Now our visitors who find us via search engines and discover we aren't quite what they want are presented with a relevant exit strategy and we get a commission!
We're getting an average 1.7% click through rate which is translating into a nice tidy sum.
go google! keep kicking MSN's dirty butt
You searched for (Score:3, Funny)
God, google sucks nowadays.
Worried about reliability (Score:3, Interesting)
For those of you who were wondering/complaining... (Score:3, Informative)
It's also interesting to note that both have a copyright date of 2004, which would imply that Google has found just under 1 billion websites in a month and a half, which seems like an interesting fact.
Re:Marching In Step (Score:5, Interesting)
And the press release doesn't say that they're indexing over 6B pages, so anyone who's saying that here is mistaken.
Re:And let's hope it stays that way! (Score:4, Funny)
Whatever happened to The Onion?
Re:Going Public & Pay Per Search (Score:3, Interesting)
I highly doubt that. I'd no longer use Google, and I bet a lot of others wouldn't either. Free is pretty addictive, even if they do have a lot of stuff indexed.