Google's Search Appliance
An anonymous reader noted that Google is working on a Search Engine
that you can install behind your corporate firewall for indexing
your internal documents. It's a bit thin on information, but it
looks like for as little (cough) as $20k, you can have your own
google box. Not for everyone obviously ;)
Oh now come on (Score:5, Funny)
Possibly very good... (Score:5, Interesting)
Aside from anything else, it gives Google a revenue stream so they can continue to provide their services (web, image and usenet searches) for free; they need to find a valid business model, and hopefully this can contribute.
Re:Possibly very good... (Score:2, Interesting)
The selling point for them:
As a governmental organization, regulations stipulate they must be able to provide online content to the RCMP upon request, so it must be hosted on-site. As I'm sure most corporations have similar guidelines, this could be a big cash cow for google at some point.
Google's top notch search technology, now on-site? Sign me up!
uberman
Re:Possibly very good... (Score:2)
Google's "sponsored links" seem like a valid business model to me. Search on something generic like computers [google.com] and you'll see pastel links pop up with advertisements. I imagine people pay a nice chunk of change for those.
Re:Possibly very good... (Score:5, Insightful)
Google runs on two business models: the Sponsored Links model (and the Google Sponsored Links are much more effective than any other online advertising out there) and the sale of search services (to Yahoo!, Washington Post, et al).
Fact is, Google's already profitable. Why? Because they didn't make the moronic mistakes that the other dot-coms did. Have you seen a Google Super Bowl ad? Have you seen a Google ad anywhere? Exactly. The Google model is, quite simply, you run a lean and mean ship that gets the job done well, and you make money.
Re:Possibly very good... (Score:2)
I guess you have some data to back that up? Why are Google's ads better than others? Because they annoy you less? When's the last time _you_ clicked on a Google sponsor because of their compelling attraction?
> Fact is, Google's already profitable.
I guess you know that from their public financial statements, right? (sarcasm) Or maybe because you're on the board? Hmm, didn't think so.
So, aside from being a Google fan-boy (of which I am one myself), where do you get these wonderfully objective facts?
Re:Possibly very good... (Score:5, Interesting)
Re:Possibly very good... (Score:3, Insightful)
Google's ads tend to be relevant to what I'm searching for, so I click on them often.
Last summer I looked up filk music after seeing something about a "space-themed filk concert featuring Kathy Mar and..." at Stanford the day before the Mars Society convention. I searched for filk, and there was an ad to download some of Kathy Mar's music from mp3.com! [mp3s.com] I listened to what mp3.com had and then went to the concert. During the concert, I met Kathy and also met the guy who put the ad up.
Oh, did you mean "What was the last time I bought something through Google adwords"? I haven't yet, but I am now a filk fan and plan to buy Prometheus Music's Space CD [prometheus-music.com] when it comes out. (Kathy's CD, which I didn't buy, is also a Prometheus CD.)
I also ran $50 worth of ads for my non-revenue-generating bookmarklets site because I thought it would be a cool way to give Google money. I don't know how many people run ads without the intent of making money, though.
Re:Possibly very good... (Score:2)
Re:Possibly very good... (Score:2)
Re:Possibly very good... (Score:2)
IIRC
Just my worthless
Re:Possibly very good... (Score:2)
I want
Google enters this market at the right time (Score:4, Insightful)
Re:Google enters this market at the right time (Score:2, Interesting)
Three years ago I was involved in implementing a similar box, from Excalibur Technologies, for the company I was working for during my university gap year (it was there that I first started reading
Judging from the website, Google clearly have some fantastic technology, and they certainly have the reputation; they should do very well.
Re:Google enters this market at the right time (Score:2, Flamebait)
Re:Google enters this market at the right time (Score:3, Informative)
Google searches .doc files.
http://www.google.com/help/faq_filetypes.html [google.com]
1. What file types are returned in a Google search? There are 12 main file types searched by Google in addition to standard web formatted documents in HTML. The most common formats are PDF, PostScript, Microsoft Office formats:
Adobe Portable Document Format (pdf)
Adobe PostScript (ps)
Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
Lotus WordPro (lwp)
MacWrite (mw)
Microsoft Excel (xls)
Microsoft PowerPoint (ppt)
Microsoft Word (doc)
Microsoft Works (wks, wps, wdb)
Microsoft Write (wri)
Rich Text Format (rtf)
Text (ans, txt)
~jeff
Re:Google enters this market at the right time (Score:2)
Re:Google enters this market at the right time (Score:2, Informative)
Re:Google enters this market at the right time (Score:2)
Re:Google enters this market at the right time (Score:2)
Re:Google enters this market at the right time (Score:2)
Re:Google enters this market at the right time (Score:2)
quite a bit late actually (Score:2)
hmm. (Score:5, Funny)
Searched the intranet for 'herbal viagra'.
Results 1-10 of about 1,279,500. Search took 0.14 seconds.
Splendid! (Score:4, Interesting)
As hardware continues to get cheaper and software gets more expensive as it grows more complex, it makes sense to do this rather than trying to configure multiple applications all on the same server.
And good luck to Google making money on this so they can keep their search engine fast and free of annoying advertisements.
Re:Splendid! (Score:2)
Now, there have been a few notable exceptions, and these are only the ones where the value of the software far exceeds that of the hardware needed to run it. This googlebox sounds like one of them. Another PC-based Internet appliance that is almost worth the $$$$ is Cobalt's Qube and Raq products - I wouldn't buy one myself because I know how to set up all that stuff w/o a pretty web UI, but I've heard great things from people who have purchased them.
It's just too easy to get ripped off buying these appliances.
Looking for a good internal search engine (Score:3, Interesting)
I would like to find a search engine that will index:
Does anybody have any recommendations?
Re:Looking for a good internal search engine (Score:3, Informative)
Brilliant search engine. It has parsers for most file formats (you can use pdf2txt to index your PDF files). It even indexes your MP3s if you should happen to have some on your local net.
Free (at least as in beer) for Unix. Binaries for Windows cost between $99 and $699.
Re:Looking for a good internal search engine (Score:5, Informative)
Can too (Score:3, Insightful)
Re:Can too (Score:2)
Very true. However, try convincing the average corporate bean counter. So, instead install "htDig" and actually show that you can make $20K, with a search engine on the intranet. Once the people who use and need it are "hooked", you can proceed to getting Google (after all you should have supported software for "mission critical" functions, and you are much too important to administer htDig :-))
Re:Looking for a good internal search engine (Score:2, Informative)
"Not as good as Google,"
OK, fair enough. Have some suggestions for how to improve it? Unlike Google, you can tailor all the search weightings in ht://Dig.
Either general suggestions like "titles should be weighted more" or parameter changes would be quite welcome.
It's open source, it's yours. So don't you want to see it improve?
-Geoff
Re:Looking for a good internal search engine (Score:2, Informative)
It's a good solution for a small to medium sized website. If you run Linux, it might be on your install CD's, or might be installed already.
Re:Looking for a good internal search engine (Score:2)
I can imagine this wouldn't be a tough task if you created a modified 'locate' command in Perl, with an updated updatedb script that would check for text files (cat those and store the results in an SQL database), strip the tags from HTML docs (and store those results too), pdf2txt your PDF files, and just store the names of binaries. Heck, you could even run "strings" on binaries if you were so inclined and store the results.
Of course this would be much more disk and processor intensive than your typical updatedb, so you might only run it, say, once a month or once every two weeks. But it could be a real life saver. The best thing to do would be to have one SQL server with a CGI frontend, so you could just go to your webserver on your internal network, type in your query, and the engine would tell you on what machine and in what directory you could find the document. I'm actually considering writing this now unless someone else has already done it; please reply if you know of a similar or identical system.
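A minimal sketch of the idea above, in Python rather than Perl and with SQLite standing in for the SQL server (both are assumptions made for illustration). The names build_index and search, the docs table, and the list of text extensions are hypothetical. PDF conversion, HTML tag stripping, and running "strings" on binaries are left out.

```python
import os
import sqlite3

# Assumption: files with these extensions are read as text; others are indexed by name only.
TEXT_EXTENSIONS = {".txt", ".html", ".htm"}


def build_index(root, conn):
    """Walk `root` and store each file's path and, for text files, its contents."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (path TEXT PRIMARY KEY, body TEXT)")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            body = ""
            if os.path.splitext(name)[1].lower() in TEXT_EXTENSIONS:
                try:
                    with open(path, encoding="utf-8", errors="replace") as fh:
                        body = fh.read()
                except OSError:
                    pass  # unreadable file: fall back to indexing the name only
            conn.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (path, body))
    conn.commit()


def search(conn, term):
    """Return paths whose name or contents contain `term`.

    Uses SQL LIKE, so `%` and `_` in `term` act as wildcards.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT path FROM docs WHERE path LIKE ? OR body LIKE ? ORDER BY path",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Something like build_index("/home", sqlite3.connect("index.db")) could run from cron at whatever interval the disk load allows, with the CGI frontend calling search() against the same database file.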
Why Google Can Be So Expensive... (Score:5, Insightful)
Re:Why Google Can Be So Expensive... (Score:2, Funny)
Oh no! By declaring Google an "Internet-Success-Story" you doomed them! They're gonna go bankrupt in 3 months or less!
Expensive? Ha! (Score:2)
Check out prices for Inktomi [inktomi.com]. Of course the more documents you have, the lower the per-document cost, but still they charge $7500 for 10k documents.
The "average" price of a Verity K2 license is $200k (check this itworld.com link [itworld.com]).
Good content indexing is expensive. Google will be undercutting the competition with this release. $20k really is a bargain.
Re:Why Google Can Be So Expensive... (Score:4, Informative)
$20K Isn't really that much if you consider it. (Score:5, Insightful)
Re:$20K Isn't really that much if you consider it. (Score:3, Interesting)
Re:$20K Isn't really that much if you consider it. (Score:2)
Re:$20K Isn't really that much if you consider it. (Score:2)
The product comes in two versions; one that sells for $20,000 and scales to search up to 150,000 documents and a more powerful version for $250,000, which Google says can scan "millions and millions" of documents.
But that supposedly includes hardware so it still sounds like a good deal.
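To put those prices on a common footing, here is a rough per-document comparison. It uses only figures quoted in this thread ($20,000 for up to 150,000 documents, and Inktomi's $7,500 for 10k documents) and assumes each product is filled to the quoted document count. The function name is made up for illustration.

```python
def cost_per_document(price_usd, documents):
    """Price divided by the number of documents indexed."""
    return price_usd / documents


# Figures as quoted in the thread; real pricing tiers may differ.
google_small = cost_per_document(20_000, 150_000)
inktomi = cost_per_document(7_500, 10_000)

print(f"Google $20k box at capacity: ${google_small:.3f} per document")
print(f"Inktomi at 10k documents:    ${inktomi:.3f} per document")
```

The Google figure also includes hardware, so the two numbers are not strictly like for like.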
Re:$20K Isn't really that much if you consider it. (Score:2)
They already do. If you go to www.google.com [google.com] they'll let you search the web for free.
article from C|Net here: (Score:4, Informative)
It's a little more in-depth than the India Times article.
Is this new? (Score:2, Interesting)
Ouch. Try HTDIG. (Score:3, Informative)
Re:Ouch. Try HTDIG. (Score:3, Informative)
Granted, we can't implement Google's patented things, but that's not to say we don't come close.
Indexing the text of links to documents? Yes.
http://www.htdig.org/attrs.html#description_fac
Keeping track of the weight of links pointing to a document? Yes.
http://www.htdig.org/attrs.html#backlink_factor
Probably the big "missing link" is a proximity weighting. Interested? Help is always welcome!
-Geoff
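As a toy illustration of how weights like these can combine, here is a sketch of a scoring function that counts term hits in the body text, counts hits in the text of incoming links, and adds a bonus per backlink. This is not ht://Dig's actual algorithm or configuration; the Document class, the score function, and the default weights are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    text: str
    link_texts: list = field(default_factory=list)  # text of links pointing at this document
    backlinks: int = 0  # number of pages linking to this document


def score(doc, term, text_weight=1.0, link_text_weight=2.0, backlink_weight=0.5):
    """Weighted sum of body hits, link-text hits, and backlink count.

    Returns 0.0 when the term appears nowhere, so popular but
    irrelevant pages are not ranked.
    """
    term = term.lower()
    text_hits = doc.text.lower().split().count(term)
    link_hits = sum(t.lower().split().count(term) for t in doc.link_texts)
    if text_hits == 0 and link_hits == 0:
        return 0.0
    return (
        text_weight * text_hits
        + link_text_weight * link_hits
        + backlink_weight * doc.backlinks
    )
```

Raising link_text_weight favors what other pages say about a document over what it says about itself. A proximity weighting would need word positions, which this sketch does not track.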
Quick Indexing (Score:2, Insightful)
Corporate search engines (Score:2, Interesting)
Excite, Altavista, HotBot, and Lycos all at one time or another tried to sell to the corporate market with little success. So either things have changed since, or Google management is repeating an old mistake from other companies...
Moreover, companies such as Verity which specialize in corporate search engines have reported falling revenues as of late...
Re:Corporate search engines (Score:2)
We're using it here...it rocks! (Score:4, Informative)
Two thumbs up!!
Re:We're using it here...it rocks! (Score:2)
Re:We're using it here...it rocks! (Score:2)
Re:We're using it here...it rocks! (Score:2)
Cheaper to beef up... (Score:2, Interesting)
In this climate of IT layoffs, I reckon it would prove cheaper and better to hire a programmer to take the GPL'ed ht://dig code and hack in some Google-like improvements.
The major improvement needed is the ability to search on phrases, and to do boolean searches.
Such a beefed up search/indexing system would not be subject to licensing fees, and would be freely redistributable (say, to other company offices).
Heard of ht://Dig before? Any good? (Score:2)
Aside from the GNU license and association with SourceForge, I'm not sure what advantages ht://Dig has over the other free/commercial indexing products. Perhaps somebody has a comparison page?
Re:Cheaper to beef up... (Score:2, Informative)
Keep in mind, though, that ht://Dig already implements many "Google-like" features such as indexing the text of links to documents and keeping track of the backlink count.
http://www.htdig.org/attrs.html#backlink_factor
http://www.htdig.org/attrs.html#description_fac
A proximity weighting would be nice, but there's some work to be done before that.
-Geoff
Hey, maybe slashdot can get this... (Score:5, Funny)
The GPL (and Go Google!) (Score:4, Insightful)
Unless Google reimplemented their own operating system, or <shudder> ported it to Win2K, they have a very expensive product, that runs on Linux, that is not GPL.
More power to Google--I'm glad to see them finding a way to make money without trashing their search engine, like happened with the previously good search engines that came before (e.g. Altavista, Lycos).
Set your watches (Score:2)
Free is wonderful, but free doesn't scale when it comes to indexing the majority of the internet.
Re:Set your watches (Score:2)
So, if they're now profitable (actually, for the last 2 quarters), why should they charge money now? Where's the logic?
Another issue that someone mentioned here: yes, AltaVista and other companies did try to sell their search engines and have failed, but Google has 2 points in its favor:
1. They're number 1 in search on the net.
2. Dead easy setup - plug the machines, give IP, and open your browser - from there you just have to setup where to get the data from and let the machines do the job. Nothing more...
I wish good Luck for google - I always use it (gg: in konqueror)..
How will page rank work on a corp site? (Score:3, Interesting)
Why not? (Score:2)
Companies have private jets so the pres / vp can get wasted while traveling across the country - $20k is nothing.
Google roxxor!
Document management (Score:4, Interesting)
This has a LOT more business application than appears on the surface. And $20K for such a solution is comparable to paying $50 for Red Hat to run a server.
Back in my systems integration days, we had very many law firm clients who used document management to organize the truly prodigious quantity of information they had to deal with. Spending $50K on the solution was not unheard of even among small firms. In fact, they usually wound up spending $20K just on third party maintenance utilities to support their document management systems!
Didn't we know this all along? (Score:5, Interesting)
Isn't this just confirming what we already knew?
On top of that, depending on the size of your intranet and how efficient/inefficient indexing already has been, $20K may be a bargain.
Of course, how many companies are really going to have a use for it? For giggles, let's say the entire Fortune 500. That's 500 * $20K = $10,000K = 10 million dollars US. In the grand scheme of things, that's a lot of money, but not a LOT of money. Perhaps they'll add on pay-per-use functions for even ritzier search features?
Sigs? We don't need no goddamn sigs!
Re:Didn't we know this all along? (Score:3, Insightful)
Re:Didn't we know this all along? (Score:3, Informative)
If you read the entire article, you would know that there are two versions for sale: a small $20k box which can index up to 150,000 documents, and a "millions and millions" version which costs $250k.
If a large company puts out all the revisions of all their documents it will be quite a lot of documents :). $250k is still quite cheap for something that will index all electronic documents the company has ever produced.
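For a rough sense of scale, the Fortune 500 back-of-envelope figure can be run at both price points. This assumes, purely for illustration, that each of the 500 companies buys exactly one box. The function name is made up.

```python
def one_time_revenue(customers, price_usd):
    """Total revenue if every customer buys one unit at the given price."""
    return customers * price_usd


small = one_time_revenue(500, 20_000)
large = one_time_revenue(500, 250_000)

print(f"500 x $20k boxes:  ${small:,}")
print(f"500 x $250k boxes: ${large:,}")
```

The real mix would presumably fall somewhere between the two, and recurring support or upgrade fees are not modeled here.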
Like infoseek.... (Score:3, Interesting)
$20k isn't bad at all if you're talking about some serious indexing. We indexed the technical documents of five F500 companies at the time, before they were all in house; this was '97-'98. It was slick, and I often wondered what happened to that software package.
Anyone know what Google is written in? I decompiled a fair bit of Infoseek's just to see what was what, and because I could
Rather have a WayBack machine! (Score:3, Interesting)
Wouldn't it be great for when they say "your code doesn't meet the specification of what the product needs to do" and you can use it to say "let's look to the wayback machine to see when you changed the spec but didn't bother telling me"
:-)
Wish them all the best (Score:2)
Why does google get a slashdot-patent-pass? (Score:5, Interesting)
slashdot [slashdot.org] talked about this in 1999 when the patent came up. It's 2+ years later now. Google has mostly crushed the competing search engines because the results of their algorithm are preferred to other algorithms. Their revenue sources are not public, but I believe I read recently that half of their revenue is from advertisements and half from technology licensing.
So, the point for discussion...
The world's favorite search engine exists because of its software patent. This patent has caused great harm to the competing search engines. Is this ok because...
Re:Why does google get a slashdot-patent-pass? (Score:3, Insightful)
Philosophically, however, I'd imagine that parsing/indexing patents are far more legitimate in many people's eyes, than say, one click purchasing patents.
ostiguy
Re:Why does google get a slashdot-patent-pass? (Score:4, Interesting)
Tremendous excuse? I'd say it's a future model for all businesses.
Forget the tedious absolutism of the neosocialists -- that model will never be implemented anywhere (except at the barrel of a gun), and anyone who won't be happy until they get there will never be satisfied. However, a company that does a good job at what they do, and produces something that they can either give away or appear to give away without doing the annoying, evil, greedy things that other companies do, should be the benchmark.
For example, Mercedes Benz -- what if they still sold their really expensive cars to rich guys who would pay for them BUT they would also sell a car that went 200,000 miles without major service for $10k?
I think the list goes on -- subsidize basic, honest products and services with expensive stuff that others are willing and able to pay for. It makes you a saint. I don't see why so many other businesses hold onto the "rape everyone" philosophy.
Nobody's perfect (Score:3, Interesting)
As far as I know, Google has never filed for frivolous "IP" lawsuits, they respect web standards, they provide gratis, decent service, they don't fuck with your browser, and they tell you who paid for word placement as opposed to just putting paying advertisers on top without mention. They also happen to use free software and give it good press.
Re:Why does google get a slashdot-patent-pass? (Score:5, Insightful)
I agree with the "many are silly, but this one is worthwhile". Google's approach was non-obvious, innovative, and really advanced the state of the art. It wasn't just another "do what we did before, but with a computer this time" patent.
I'll admit that it helps that their site is non-painful to use, but that's just gravy. Google's search is so much better that even if their site was a pain, it would still be a worthwhile search tool.
Re:Why does google get a slashdot-patent-pass? (Score:2, Interesting)
If it was too complex for the average computer user to pull out the data they need, I doubt they could stay profitable. Currently it's the best, not only for the results, but for how the end user interacts with their system.
It's amazing how often the "I'm Feeling Lucky" button gets exactly what you're looking for.
Re:Why does google get a slashdot-patent-pass? (Score:3, Interesting)
Since the "state of the art" advances more quickly in CS than it does in most areas, should we expect Google to place its original patent in the public domain after several years? Or do you think that in several years, someone will invent a completely different algorithm that yields better search results, rendering Google's patent obsolete?
Re:Why does google get a slashdot-patent-pass? (Score:2, Insightful)
This is only alright for google, because the average joe slashdot user doesn't have to pay anything to use their services. (proving further that it's all about the "free beer").
Look at the
Re:Why does google get a slashdot-patent-pass? (Score:2)
The unwashed mass of portal-shopping-news-flowers-and-oh-yeah-searchin
cheers,
mike
Re:Why does google get a slashdot-patent-pass? (Score:2, Insightful)
Re:Excuse me.... (Score:2)
There is also the issue of patenting mathematics. That is not allowed. Many software patents are really patents on a machine wink wink that happens to produce the same results as a mathematical formula.
And I can't tell if you woke up in a socialist country or not. I woke up in one that is nominally capitalistic, but more socialist for the lowest castes.
$20k ought to be enough... (Score:2)
Re:$20k ought to be enough... (Score:2)
Why? (Score:2, Interesting)
There are a number of other existing indexing engines that are significantly cheaper and more mature. Google should stick to what it does best. I guess this shows they aren't very profitable and are looking for other sources of revenue.
We *seriously* need this. (Score:2)
$20k is nothing to shell out[1] for the capabilities that Google has.
[1] In corporate terms.
Corporate Intranet Index Engines? (Score:2)
You actually got results returned from your search server?
Lucky bastard. Our corporate Intranet search engine usually would just return 'Query Timed out'. Eventually they just took the search boxes off all the web pages.
I've since built a simple Harvest [ed.ac.uk] index for the Intranet.
It can be very interesting finding all of the 'cobweb' documents on intranet sites. Ancient documents relating to projects and managers long since vanished among other stuff that management would prefer to see forgotten...
There are some cool features that are unique to Google, but I'm not sure if 'Convert PDF to HTML' and 'highlight search terms' are worth $20K.
Open source, right? (Score:3, Interesting)
Right now Google tends to be among the bigger darlings of Slashdot, but will they remain that way if they release this product and it's not Open Source? 'Cause they're nuts if they're planning on charging $20K for it but making it Open Source. Are they traitors to the cause, or is it just another understandable case of "Money talks, bullshit walks" when it comes to Open Source and the Real World?
It will definitely work great, but ... (Score:2, Insightful)
Now, when you're indexing thousands of doc and pdf files on a company network, how many of those link to each other?
And how many companies have internal newsgroups that can be searched? (No, Exchange shared folders don't count - or can Google index those as well?)
Just use the Windows "Find Fast" feature! (Score:2)
*cough*
(Please think about it before you roast me.)
Theoretically, no... (Score:2)
The joke was about Find Fast though which, IMO, is the most crufty unfriendly piece of sh*t ever incorporated into MS Office. In Office 95, 97, and 2000 (haven't tried Office XP yet) it's something I systematically eradicate on every machine I see. It's known for firing up its re-indexing while the user is already using the machine, and it's also known for not being controllable by the user (i.e. the user can't tell it when to re-index).
google's cheap (Score:2, Insightful)
having something for $20000 or so is a godsend, especially if it comes with its own hardware (even though its hardware is probably not as nice as an e220)... throw in that they'll probably do the work when it breaks, and this is a no-brainer for anyone needing to index even as few as 25000 pages.
Could Save Significant Time, Effort (Score:2)
Re:Advertising?? (Score:2, Informative)
They use text banners (Score:2)
sidebar ads (Score:2)
I find it hard to believe the revenue from those is really significant, but who knows; I bet their clickthrough rates are much better than those damn popup ads.
Re:Advertising?? (Score:2, Insightful)
"Hmm...if somebody's searching for domain registration, let's offer text ads about domain registration. Then, they won't be pissed about downloading goofy banner/javascripts and they may actually click on the ad because it *is* useful."
Almost makes sense--but then you can't shoot the monkey.
Seriously though, I've clicked on Google ads numerous times because they're relevant.
It's called a sponsored link (Score:2)
This kind of directed advertising is valuable and a good application of their service.
Re:Search engine (Score:2, Funny)
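#!/usr/bin/perl
use CGI qw(:standard);   # import param() and header()
$query = param('q');
$document_root = "/home/";
print header();          # CGI output needs a Content-Type header
print "<html><body>";
# WARNING: $query reaches the shell unsanitized, so this is a command-injection hole
foreach (`grep -r $query $document_root`)
{
    print "<li>$_</li>\n";
}
print "</body></html>";
exit(0);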
Re:Search engine (Score:3, Informative)
No taint checking. (What happens if 'q' contains ";rm -rf /;"?)
No warnings.
No proper formatting of HTML on the output. If the grep matches "", then it's not going to display anything on Netscape. You need to either strip tags, or force tag matches.
Mod This Up (Score:2)
Using this script would be a great way to introduce a really NASTY security hole into your site.
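For contrast, here is a minimal sketch of the same idea without the problems raised above, written in Python rather than Perl. The query never reaches a shell, it is matched as a literal substring, and everything is HTML-escaped before output. The function names search_files and render_results are made up for illustration, and wiring this into a web server is left out.

```python
import html
import os


def search_files(query, document_root):
    """Yield (path, line) pairs for lines containing `query` as a literal substring."""
    for dirpath, _dirnames, filenames in os.walk(document_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for line in fh:
                        if query in line:
                            yield path, line.rstrip("\n")
            except OSError:
                continue  # skip files that cannot be read


def render_results(query, document_root):
    """Return an HTML page listing matches, with paths and lines escaped."""
    items = [
        f"<li>{html.escape(path)}: {html.escape(line)}</li>"
        for path, line in search_files(query, document_root)
    ]
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

A query such as ";rm -rf /;" is just a string to look for here, and a matching line containing markup shows up as visible text instead of being interpreted by the browser. An empty query matches every line, so a real frontend would want to reject it.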