A Search Engine For The Slower Net
Makarand writes "According to this BBC News article, researchers at MIT are developing a search engine for people using the web on slower net connections. The software will e-mail queries to a central server and receive the most relevant webpages from the search results by e-mail in a compressed form. Since the program is too big to download over a poor net connection, it will be mailed on CDs to libraries for people to borrow and install. They are also considering trying to persuade computer sellers in developing countries to install the program on machines."
Them Modem Linkers (Score:4, Funny)
ain't they kinda odd?
Goin' on the net,
with they little baud.
Look at all those Modem Linkers,
what a thing to see.
Web sites come up really slow,
gets lousy Voice/IP.
Internet at low bit rates,
what a dawgon mess.
Load a web site, take a break,
while 'pache mods compress.
How to be a Modem Linker,
don't need a ticket.
Get a local ISP,
dial up and link it.
Re:Them Modem Linkers (Score:2)
FOAD, troll. Take your goatse.cx link (verified with Lynx [browser.org]) and shove it where the sun don't shine.
Re:Them Modem Linkers (Score:3, Informative)
Personally, I'm tired of "In Soviet Russia
Yes, it is highly excessive. There's probably something wrong with me. However, I'll get bored of this very soon, and I'll move on to other methods of Karma gathering.
Slow people have rights too (Score:2, Funny)
Re:Slow people have rights too (Score:2)
Anyways.. We should pick on the people with slow sites that need to move their webserver off of their momma's 14.4 and get it on something a bit faster.
A better idea (Score:5, Funny)
Re:A better idea (Score:3, Funny)
What program? (Score:2, Interesting)
Re:What program? (Score:2)
Re:What program? (Score:2)
Re:What program? (Score:4, Informative)
There are several benefits of having a TEK Client program instead of just using email. But first off, the client isn't that big -- the JAR file with the TEK classes is 125 KB. When we package it up with third-party libraries and an installer, it comes to 2 MB, and with Java included, it's 10 MB. It would be interesting to try to prune down this distribution to the minimal size -- for the prototype version, we have focussed primarily on the software's functionality.
The TEK Client program is useful because it provides a seamless interface for browsing the downloaded pages. It operates as a web proxy: users adjust their browser to talk to TEK instead of the web, and then they can view pages just as if they were connected. The URLs appear as usual in the browser's "location" toolbar, and links on the page are functional. If a URL has been downloaded before, then it is loaded out of the local cache; if it has not yet been downloaded, then the user is asked whether to submit a request for that URL.
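The cache-or-request behaviour described above can be sketched roughly as follows. This is only an illustration, not TEK's actual code: the class name `CachingProxy`, its methods, and the `confirm` callback are all hypothetical names made up for this sketch.

```python
class CachingProxy:
    """Toy model of a client-side proxy backed by a local page cache.

    Hypothetical sketch; names and structure are assumptions, not TEK's API.
    """

    def __init__(self):
        self.cache = {}    # URL -> page body that was already downloaded
        self.pending = []  # URLs the user agreed to request later

    def store(self, url, body):
        """Record a page that arrived from the server."""
        self.cache[url] = body

    def get(self, url, confirm=lambda url: True):
        """Serve from the cache, or ask the user whether to queue a request."""
        if url in self.cache:
            return ("cached", self.cache[url])
        # Not downloaded yet: queue it only if the user confirms.
        if confirm(url) and url not in self.pending:
            self.pending.append(url)
        return ("not_cached", None)
```

A browser pointed at such a proxy would get the stored copy for known URLs and a "request this page?" prompt for everything else.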
The TEK Client includes a local search utility for searching the cache of downloaded pages. In this way, the user can build up a local library of information that is relevant to their community; for example, in a school setting, many searches could be satisfied using only the local cache due to overlapping interests of students.
Also, the TEK Client is useful for tracking searches. In settings where connectivity is intermittent, searches can be enqueued during the day and sent at night (or whenever a connection is available). The client also provides basic user management so that multiple people can share a public installation (perhaps using a single email address, which they might not own themselves) and still keep track of their own queries.
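As a rough sketch of that queue-now, send-later idea (again hypothetical, not taken from the TEK source; `QueryQueue` and the `send` callback are invented for illustration):

```python
from collections import deque


class QueryQueue:
    """Holds (user, query) pairs until a connection is available.

    Hypothetical sketch; TEK's real data structures are not described
    in this thread.
    """

    def __init__(self):
        self._queue = deque()

    def enqueue(self, user, query):
        """Remember a search along with the person who asked for it."""
        self._queue.append((user, query))

    def flush(self, send):
        """Send queued searches in order; return how many went out.

        `send` is any callable taking (user, query). If it raises, the
        failed item stays at the front of the queue for the next attempt.
        """
        sent = 0
        while self._queue:
            user, query = self._queue[0]
            send(user, query)
            self._queue.popleft()
            sent += 1
        return sent
```

Tagging each query with a user name is what lets several people share one installation, and one mailbox, while still finding their own results.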
In the future, we think there are a lot of features that could be added to the client. For instance, we could seed the client with other open-source resources, such as an atlas or encyclopedia, that could be used in conjunction with web searches. There could also be an "intelligent query builder" that helps construct Internet searches (for example, by checking spelling) before going through the time and expense of connecting and sending them off.
Many more details about TEK are available from the TEK Homepage [sourceforge.net]. We are currently moving our CVS source tree to SourceForge, so if you're interested in helping to improve the software, it'd be great to hear from you!
Perfect for those people... (Score:3, Funny)
Re:Perfect for those people... (Score:2)
Hmmm. (Score:5, Funny)
http://www.mshiltonj.com/sr/
Re:Hmmm. (Score:3, Insightful)
Now, take a good hard look at your cubemate. You know what they do when they get home... and it's not BF1942... probably gives you good insight into their test bench naming conventions...
Re:Hmmm. (Score:2)
You have to share a cube with someone? And it's most likely they spend their nights downloading porn? I really don't want to ever end up working wherever you work.
On a more seriously joking matter, where the fuck are these people? I've worked in quite a few places and I've only met a few people with the neon signs hov
Re:Hmmm. (Score:2)
Re:Hmmm. (Score:3, Funny)
Ah, the time when we got USENET access... All those hours sitting and waiting for the 'You have new mail.' message were over and porn could be reached almost instantly at alt.pictures.celebs[*].
--H
[*] Name may not be accurate.
Cached searches (Score:5, Interesting)
This would be a good way to preserve stuff that may be the subject of removal due to court order, like xenu.net and other similar de-Googlings.
Hey! (Score:3, Funny)
And don't say it isn't doable... If I had the time, I could do it, and I'm a mere high school graduate...
Re:Hey! (Score:2)
Yeah, but those Tivo'd American Idols aren't watching themselves, are they?
Re:Hey! (Score:2)
Re:Hey! (Score:2)
SHHH!!! Don't tell... I was planning on making millions! You damn near let the cat outta the bag... now how much do I have to pay you to keep you quiet?
Re:Hey! (Score:2)
Spoken like somebody who doesn't have a clue what he's talking about. I'm sure there'd be a fair chunk of money involved if somebody manages to deliver substantially better than 2:1 lossless compression on text, photos, and/or sound...hell, it'd probably be good for a Turing Award, if it's verifiable. It's something that the
Re:Hey! (Score:3, Informative)
As for video, animated PNG is a PNG compression of the diffs of the second frame to the first, third to the second, etc. In the case of video, compression ratios are on the order of 100:1 and audio is
Re:Hey! (Score:2)
Well, herein lies the problem. Compression on modems (now the V.90 and V.92 standards) must satisfy three things: a) low-latency; b) low-processor demand; c) linear. A modem can only look at a very little snippet of data (usually what's in the FIFO buffer) and compress that. Even when it compresses it, it has to do so quickly and serially. The reason compressi
The big problem (Score:2)
Great!! I so need that... (Score:3, Interesting)
Yes, I am dead serious... Let's just say Charter's cable Internet in my area lately really stinks. I would almost rather be on a 14.4k modem - no joke. I am not the only user... I get lag spikes of over 3000ms when not doing anything, and almost dropped connections. Good thing DSL recently became available in my area =D. One less Charter Pipeline subscriber.
In other news... (Score:2, Funny)
File transfers and weather forecasts are planned in 2006.
This will make a difference.
RTFA (Score:5, Informative)
The program doesn't e-mail back with a mere mirror of a google / yahoo results page. It actually filters through the individual results compressing the entire page. e.g. my search turns up a CNN page and a blurb on MSNBC and I get, e-mailed to me, compressed versions of those actual sites, not just links to them.
As for the "my 28.8 modem is just fast enough" crowd -- read the article! Some of the locations the software is being developed for don't even have access to a phone line on a regular basis. And the lines they do have access to are more likely than not to be noisy as hell and not able to support a 28.8 connection.
Copyright Problems (Score:2)
I guess I should first mention the obvious: putting a bunch of other people's copyrighted work on a CD-ROM is the type of thing that gets people hauled before courts.
As for compression, if you are using compression on the modem connection, then you don't really save any time by trying to compress the data again. You might save some space, but my experience is that a zip file
A Whole Country? (Score:2)
They are going to develop countries to install the program on people's machines?
Still in use today in the US. (Score:5, Insightful)
I am reminded of the Prepaid Legal system of doing business. You call up and ask a question, and the next day, an attorney familiar with the area you are asking about calls you back to answer your questions and advise you. So maybe this isn't all that outdated of an idea after
IN REGARD TO THE SYSTEM IN THE ARTICLE:
To have this capability back in 1973 would have been unbelievable. In 1983, to have this available to every library in the US would have been an unbelievable achievement. To have it now is so slow that I start to go google eyed even thinking about it.
BUT
This is great for countries that are 20-30 years behind in technology. It will revolutionize the search for information for areas that are not as connected as the US.
What it's really about (Score:3, Insightful)
Using the phone in a country like Malawi can be a real adventure. It's not like the US at all.
Searching over email?? (Score:2)
Also good to circumvent censorship (Score:5, Insightful)
and for those with no internet.... (Score:3, Funny)
Wouldn't be needed if... (Score:4, Insightful)
...only web designers had not collaborated to turn the web into the graphics orgy it is today. I mean, have these kids coming out of graphics school even browsed the relevant W3C specifications?
News Flash !
Re:Wouldn't be needed if... (Score:2)
Notice that a *lot* of sites out there that are made by professionals for themselves or for open projects follow your guidelines. It is the commercial sites that don't (and the kiddie sites, of course, where you can't see the site for all the flash animations and counters).
Sad thing is tha
Re:Wouldn't be needed if... (Score:2)
Try telling that to a manager. Just try it. Or, if you wish, I can tell you what the response will be - in the absolute best case scenario, perfect world utopia, you will get a blank stare. From there it is all downhill.
Exactly. When I was doing web design, for one customer I designed a beautiful, XHTML Strict site with incredibly low bandwidth requirements. The customer didn't think it was "snazzy" enough, and eventually had a graphic designer do a page design that was so agonizingly slow that I can't
Re:Wouldn't be needed if... (Score:2)
And WTF is the point of all this computing power if it's just to display boring text? For the love of GAWD even BOOKS have pictures!
BOOKS!
Re:Wouldn't be needed if... (Score:3, Insightful)
Why shouldn't the internet (by which, I assume, you mean specifically the World Wide Web) be for both information AND television?
Just because the markup language we call "HTML" was originally developed and is best suited for information-rich text documents such as academic papers a decade ago doesn't mean that we must not, or even should not, look beyond that type of content and find new uses for the system.
Compression, NO!!! (Score:2)
Re:Compression, NO!!! (Score:2)
Yes, it will, especially if it's application-specific compression. CSLIP compression isn't particularly good (for good reason, it's supposed to be simple and efficient).
Good point on uuencode, though. A compression program designed to produce 7-bit rather than 8-bit output to avoid this might be a win.
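To make the trade-off concrete, here is a small sketch that compresses a page and then wraps it in a 7-bit-safe encoding for mail. It uses zlib plus base64 from the Python standard library as stand-ins; it is not a purpose-built 7-bit compressor, and it uses base64 rather than uuencode (both expand binary data by roughly a third). The function names are made up for the example.

```python
import base64
import zlib


def pack_for_mail(text: str) -> bytes:
    """Compress text, then encode it so every byte is 7-bit ASCII."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return base64.b64encode(compressed)


def unpack_from_mail(payload: bytes) -> str:
    """Reverse pack_for_mail: decode the ASCII wrapper, then decompress."""
    return zlib.decompress(base64.b64decode(payload)).decode("utf-8")
```

Base64 turns every 3 bytes into 4 characters, so the mail-safe wrapper gives back some of what compression saved; on repetitive markup the net result is still usually much smaller than the raw page.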
Here's a name for it (Score:2)
uhhh...archie? (Score:2)
Re:uhhh...archie? (Score:2, Insightful)
What I wonder is why the *client* needs any software? Why not just make an email addy that people send queries to (like you did with "archie") and get the results back in whatever mailer you've got already?
Chris
Free emulator (Score:3, Funny)
Google Voice Search (Score:5, Interesting)
To try out this demo, please follow these simple steps:
1. Pick up the phone and call the automated voice search system at (650) 318-0165.
2. After the prompt "Say your Search Keywords", say your query to the system.
3. Click this link and a new window will open with your voice search results.
4. Say another query, and the new window with the search results will be updated with the new results.
Re:Google Voice Search (Score:2)
Other needs (Score:3, Insightful)
First, for those of you saying 'Google is fast enough even on a 14.4K' - think school with one phone line, perhaps not even available during the day. Or how about connections via satellite phone at $$/min? Suddenly you want super efficient, when you only earn 5 bucks a day.
As to what else this needs, the search engine needs to strip out all the crap before emailing a web page to you (Java, Flash, etc) - should focus on mostly text, small pictures only. Particularly since 486's would be a common platform for people using this, so the search engine better work well on one. You also should be able to strip out all pictures as an option to maximise text info download - remember turning off pictures in Netscape 2.x to speed up your browsing? If you need something it stripped out, you should be able to query just for the bits you need later.
Also the ability to share your cache between computers would be huge if they can't have a server to do that for them. At any rate, means of transferring those precious pages you downloaded to another computer - on a floppy, unless you have local email.
Re:Other needs (Score:2)
The ability to download text only already exists. We used lynx to look at pages and downloaded images only when we thought they were relevant - and killed the download as soon as we saw enough to either understand the picture or realize it was not what we wanted to see.
Also the ability to share your cache between computers would be huge if they can't have a server to do that for the
Re:Other needs (Score:4, Insightful)
No, Google is fast enough at 300 baud. Damn, but folks are young around here.
As to what else this needs, the search engine needs to strip out all the crap before emailing a web page to you (Java, Flash, etc) - should focus on mostly text, small pictures only.
Either configure your browser or proxy to do that. Easy.
Particularly since 486's would be a common platform for people using this, so the search engine better work well on one.
Give me a break. 486's are plenty powerful enough for web browsing. Even with pictures.
You also should be able to strip out all pictures as an option to maximise text info download - remember turning off pictures in Netscape 2.x to speed up your browsing? If you need something it stripped out, you should be able to query just for the bits you need later.
[sarcasm on] Really? [/sarcasm]
Also the ability to share your cache between computers would be huge if they can't have a server to do that for them. At any rate, means of transferring those precious pages you downloaded to another computer - on a floppy, unless you have local email
Give me a frickin' break. PPP over null modem serial.
This has got to be one of the worst ideas I've ever heard of. Hell, I knew of WWW via UUCP (that's email, kids) in the 90's - and that didn't require ANY "special search software."
I don't understand (Score:2)
already been done(kind of) (Score:2)
You could search yahoo by requesting a url like http://search.yahoo.com/bin/search?p=search+terms
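For what it's worth, building that kind of query URL takes a couple of lines. The endpoint below is simply the one quoted in the post and may well no longer respond; the helper name is made up.

```python
from urllib.parse import urlencode


def yahoo_search_url(terms: str) -> str:
    """Build a search URL in the form quoted above (endpoint may be dead)."""
    # urlencode escapes the query and turns spaces into '+'.
    return "http://search.yahoo.com/bin/search?" + urlencode({"p": terms})


print(yahoo_search_url("search terms"))
```

The resulting string is what you would put in the body of a message to a web-by-mail gateway.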
Re:already been done(kind of) (Score:4, Informative)
You're right that retrieving web pages over email has already been done. A present-day service that works as you describe is www4mail [www4mail.org], and I know people that use it regularly from low-connectivity regions.
However, the TEK system (which I'm involved in) offers several benefits over a purely email-based solution. By having a web proxy on the client side, users can use their favorite browser to view downloaded pages, complete with color and formatting, which is often absent in text-only systems. Moreover, the client keeps a local, searchable cache of all downloaded pages, and the server keeps track of which pages have been sent to avoid wasting bandwidth on duplicate content. Finally, with a web-like user interface, many users can share a single e-mail account in a public kiosk or school.
Many more details about the TEK system are available from the TEK Homepage [sourceforge.net]
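One way a server could avoid resending pages a client already has is to remember a digest of everything it has sent to each client. The sketch below is only a guess at the general idea; the post does not say how TEK tracks sent pages, and `SentPageTracker`, the use of SHA-256, and the per-client sets are all assumptions.

```python
import hashlib


class SentPageTracker:
    """Remembers which page contents each client has already received.

    Hypothetical sketch; not TEK's actual bookkeeping.
    """

    def __init__(self):
        self._sent = {}  # client id -> set of content digests

    def pages_to_send(self, client_id, pages):
        """Return only the pages (url -> body) this client has not seen."""
        seen = self._sent.setdefault(client_id, set())
        fresh = {}
        for url, body in pages.items():
            digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                fresh[url] = body
        return fresh
```

Keying on page content rather than URL means a page that has changed since the last request is sent again, while an unchanged one is skipped.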
The Slow Evolutionary Crawl of Archie (Score:2, Insightful)
So what if it scans webpages instead of FTP sites. It's not that big of a leap.
Why not go all the way? (Score:4, Funny)
Software downloads... (Score:2)
Having webpages mailed to me seems stupid because I have high-speed internet at work and if there is a bandwidth-intensive site I just load it at work the next day...
This just in... (Score:3, Funny)
"MIT Reinvents Archie service from the early 90's."
I'm On 28.8--This Makes No Sense (Score:2, Informative)
This doesn't make any sense to me. I'm on 28.8, and 20 results from Google still come up instantly. Bandwidth might be an issue for the linked pages, but certainly not the search results. Even when I was on 14.4, back when Yahoo! was the hot search engine, it was no problem.
So, what if these guys are on 300 baud and they get compressed search results via... e-mail??? The delay waiting for results to navigate e-mail systems probably negates the savings from the compression. Why not send compressed res
Good idea but... (Score:4, Interesting)
Where did all the sites go that you could use wget -r to grab overnight? How about the odd few that used to offer a
Content over presentation is a concept that needs to be reintroduced to the net, preferably with a stick.
Prior Art ;-) (Score:2, Informative)
Have fun.
We don't need no stinkin' slow net (Score:2)
paper mail? (Score:2)
Fine tuning (Score:2)
For me, I usually go through several revisions of my Google (or other) searches before I either hit on what I'm looking for or realize that it's not out there.
If I had to wait a day for each search query to come back, it'd take me a few weeks to do what I can accomplish in 10 minutes. Yeah, I know it's better than nothing, but "fine tuning" your query is a big part of what makes a search useful.
There's already something like this. (Score:2)
Why is it so large? (Score:2)
Already been done. (Score:2)
The Internet Oracle [indiana.edu] always provides the best answer(s) to your questions.
Internet by mail (Score:3, Informative)
The above document explains how to access:
FTP
ARCHIE (deprecated)
FTPSEARCH (deprecated)
GOPHER (deprecated)
VERONICA (deprecated)
JUGHEAD (deprecated)
USENET
WWW
WWW SEARCH (using standard search engine like altavista, yahoo or google)
FINGER
WHOIS
[...]
All these protocols can be accessed via email, according to the FAQ. The FAQ has been around for a long time, which explains why many (most) of the protocols involved are now deprecated. I used this FAQ in the early '90s and I don't know how it works now. At the time, it was great. The last update is 2002/04/16.
Why a program? (Score:2)
I mean, there are already http-over-email services, that do not need any special program, you just send a mail to the service's address with the links to the stuff you want and it sends it back and you could use it perfectly this way: just send a mail with a line like 'hedgehog asia' to mail@search.com and it would send you back a reply with all the pages re
Where's the money? (Score:3, Interesting)
It's a service that's only useful for poor third-world schools. Those organizations are probably running on a donated 486. They sure don't have money to pay, or even the money to pay to download ads. Charity-wise, "fund a search engine for poor third-worlders" is somewhat less compelling than "feed a starving child".
I see this idea living on research and enthusiasm for a year or two then dying a quiet, broke death.
Google = Media Control. (Score:2)
There are a Lot Of Sites which the lords of Google have cut from their search engine. And I'm not talking porn. (The proliferation of sex-obsession is actually encouraged due to its weakening effect upon individuals.) I'm talking about anything which holds any real weight and thereby pisses off the wrong people. --Or anybody who needs to be punished, are chopped from what has become the Internet's de-facto public eye. (Google.)
Easily done, too. Make a remarkable product. (Google was a
Re:Because... (Score:2, Insightful)
Re:Because... (Score:5, Informative)
Instead, this service would package together selected results of the search, for overnight download into the PC's cache. The user can then browse through the material at their leisure without needing to use the internet connection (which is the scarce resource).
Re:Because... (Score:5, Funny)
Re:Because... (Score:2)
So now, you're going to have all the pages downloaded to your PC for you, when it's quite likely that the very first link was the one you wanted anyway?
What about the bandwidth costs of doing that? And exactly how slow are these connections anyway? Google's search page is a few KBs - I can't imagine how downloading every possible hit (say top 20 hits) is feasible where downloading a single page of a few KBs is not.
Re:Because... (Score:2)
[_______________________________]
but I'll bite anyway. I still don't think you read the article or you would have noticed the part about compression.
"Someone using the software would e-mail a query to a central server in Boston. The program would search the net, choose the most suitable webpages, compress them and e-mail the results a day later."
It's very likely, that since the target is to use this for information, that the pages would be _h
Re:Because... (Score:3, Funny)
[_______________________________]
Ok, you are obviously someone incapable of making a point without resorting to childish insults. In my experience this usually correlates well with inferior mental capacity so I would encourage you to read the below slowly:
Downloading the contents of 20 pages when one page is the one you're looking for is vastly inefficient.
It's very likely, that since the target is to use this for information, that the page
Re:Because... (Score:2)
Re:Because... (Score:5, Insightful)
Why don't you just scream "HI I'M FROM 'WESTERN' CIVILIZATION AND HAVE NO IDEA HOW THESE THINGS WORK IN LESS PRIVILEGED PLACES"
Google is too slow when your school has one phone line that is used for _everything_, including net access. Not to mention the cost of using the phone anyway. This allows all the students to submit their searches to a teacher one day; the teacher then submits all the searches with only a couple of minutes of dialing up. He can retrieve the compressed results a few days later with only a few minutes of dialing up. Now go read the article. Someone needs to mod that post down; hopefully the poster can redeem themselves later in the thread with something insightful.
Re:Because... (Score:2)
Huhh?? How? If you download just the results page, then that is pretty useless since you then have to click on the links to see if it is the relevant link or not.
If on the other hand you download the actual contents of the top 20 pages then given how slow your connection is supposed to be, I don't see how you could do that in "only a few minutes".
Re:Because... (Score:2)
Re:Because... (Score:2)
You download a compressed version of the contents of the results of the search. HTML pages compress very, very well, so I'd hazard a guess that it's pretty efficient.
Go read the article. It explains a lot.
Re:Because... (Score:3, Insightful)
To "redeem myself," I'd like to make two points:
1. I was aiming for amusing with the Google thing. I decided to tack on the "real question" because I'm honest about my ignorance of the topic.
2. In what way will this search function highlight the control of relevanc
Re:Because... (Score:2, Informative)
In non-developed countries the lack of bandwidth is a serious problem.
A year ago I was in Moscow. After 6 days without internet I really wanted to check my e-mail (webmail).
That day we spent some time some kilometers outside Moscow, but still managed to find an internet cafe.
After waiting for 15 minutes (the place was crowded) I started "surfing".
Man that was slow.
25 computers, *sharing* a 64kb uplink.
Re:Because... (Score:2)
Do you think I spent 8 minutes downloading the images on the yahoo frontpage?
Stupid AC...
Re:Because... (Score:2)
In short, RTFA. Sorry.
Not that I'd wager that this is some kind of brilliant, revolutionary idea, but really, the article do
Re:Because... (Score:2)
Pop! (sound of a lightbulb turning on)
Re:Because... (Score:3, Informative)
Query the ftp server by email and get the directory list emailed back to you. Then you could send the command via another email which would result in the file being emailed back to you overnight ready for you to retrieve it.
And then there was "trickle" where files could be sent/refreshed to your uni's mainframe's ftp server overnight and would be there for you to play with the next morning and you would always
Re:How slow are their connections? (Score:2)
I'm sorry but google.com usually gives me what I want (to the T) on the first page of results.
This doesn't sound like something that would come out of the brains at MIT.
Re:RTFA (Score:2, Insightful)
Re:RTFA (Score:2)
Re:Uhh, google? (Score:2)
Re:Well... (Score:2)
But if one only has a 300 baud modem, what the hell are they doing on the internet anyways? That's like bringing a knife to a gun fight, or some other poor analogy.
Bah to that. A very big Bah! (Score:2)
Disable graphics and google loads in no time flat. Realistically, if you can't use google with your existing tools then you can't use any links a search engine would get you.
They are also considering trying to persuade computer sellers in developing countries to install the program on machines."
Hell with that, if any software should be pre-installed, it should be stuff the bulk of the cus
Re:Repackaging of simple tools under a pretty name (Score:2)
Re:Because Google is too slow? (Score:2)
I didn't think so.
Compression (Score:2)
I guess you could optimize your time a little better if you had a program that downloaded all of the pag
You're not getting it (Score:2)
Even if what you want is among thos