The Internet

A Search Engine For The Slower Net (309 comments)

Posted by timothy
from the most-everyone-most-everywhere dept.
Makarand writes "According to this BBC News article researchers at MIT are developing a search engine for people using the web on slower net connections. The software will e-mail queries to a central server and receive the most relevant webpages from the search results by e-mail in a compressed form. Since the program is too big to download over a poor net connection it will be mailed on CDs to libraries for people to borrow and install. They are also considering trying to persuade computer sellers in developing countries to install the program on machines."
This discussion has been archived. No new comments can be posted.

  • Re:Because... (Score:5, Informative)

    by TopShelf (92521) * on Wednesday July 16, 2003 @02:38PM (#6455303) Homepage Journal
    I had the same initial reaction, but after RTFA (I know, shame on me), it seems that the limitation isn't so much time, but continuous time hogging the phone line accessing Google, checking out pages, etc.

    Instead, this service would package together selected results of the search, for overnight download into the PC's cache. The user can then browse through the material at their leisure without needing to use the internet connection (which is the scarce resource).
  • RTFA (Score:5, Informative)

    by DrewMIT (98823) on Wednesday July 16, 2003 @02:42PM (#6455345)
    For those of you wondering why someone would do this, how about reading the damn article?

    The program doesn't e-mail back with a mere mirror of a google / yahoo results page. It actually filters through the individual results compressing the entire page. e.g. my search turns up a CNN page and a blurb on MSNBC and I get, e-mailed to me, compressed versions of those actual sites, not just links to them.

    As for the "my 28.8 modem is just fast enough" crowd -- read the article! Some of the locations this software is being developed for don't even have access to a phone line on a regular basis. And the lines they do have access to are more likely than not to be noisy as hell and unable to support a 28.8 connection.
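
A minimal sketch of the packaging idea described above: the server bundles the fetched pages themselves, not just links, into one compressed archive and mails it back. This is only an illustration, not the project's actual code; the function names, the sender address, and the archive layout (numbered files with the URL stored as the zip entry comment) are assumptions.

```python
import io
import zipfile
from email.message import EmailMessage


def build_reply(to_addr, query, pages):
    """Bundle fetched pages into one zip archive attached to an e-mail.

    `pages` maps a URL to the HTML text already fetched for it.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for index, (url, body) in enumerate(pages.items()):
            # Numbered file names; the original URL travels as the entry comment.
            info = zipfile.ZipInfo(f"page{index:02d}.html")
            info.compress_type = zipfile.ZIP_DEFLATED
            info.comment = url.encode("utf-8")
            zf.writestr(info, body)

    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = "search-server@example.org"  # hypothetical address
    msg["Subject"] = f"Results for: {query}"
    msg.set_content(f"{len(pages)} page(s) attached.")
    msg.add_attachment(buf.getvalue(), maintype="application",
                       subtype="zip", filename="results.zip")
    return msg


def unpack_reply(msg):
    """Return {url: html} from a message produced by build_reply."""
    pages = {}
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                pages[info.comment.decode("utf-8")] = zf.read(info).decode("utf-8")
    return pages
```

The client side only needs `unpack_reply` to rebuild the URL-to-page mapping once the mail has arrived, so browsing can happen entirely offline.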
  • Re:Because... (Score:2, Informative)

    by EinarH (583836) on Wednesday July 16, 2003 @02:54PM (#6455481) Journal
    Honestly, folks, someone please explain how this could substantially save someone time surfing the web?
    In non developed countries the lack of bandwith is a serious problem.

    A year ago I was in Moscow. After 6 days without internet I really wanted to check my e-mail (webmail).
    That day we spent some time some kilometers outside Moscow, but still managed to find an internet cafe.
    After waiting for 15 minutes (the place was crowded) I started "surfing".
    Man that was slow.
    25 computers, *sharing* a 64kb uplink. And all the locals (they had an arrangement: pay x number of rubles and "surf" as long as you want) were downloading with IRC, Kazaa, DC and ftp, which resulted in *heavy* packet loss.
    I spent 8 minutes getting the Yahoo.com frontpage. And it took me almost 20 minutes before I could read the first mail.

  • Re:Because... (Score:3, Informative)

    by advocate_one (662832) on Wednesday July 16, 2003 @03:07PM (#6455576)
    cor... this sounds oh so familiar... anyone remember ftp by email??? History repeating itself...

    Query the ftp server by email and get the directory list emailed back to you. Then you could send the command via another email which would result in the file being emailed back to you overnight ready for you to retrieve it.

    And then there was "trickle" where files could be sent/refreshed to your uni's mainframe's ftp server overnight and would be there for you to play with the next morning and you would always have the most recent version of the file as they'd have been synched via trickle
  • by istartedi (132515) on Wednesday July 16, 2003 @03:15PM (#6455649) Journal

    This doesn't make any sense to me. I'm on 28.8, and 20 results from Google still come up instantly. Bandwidth might be an issue for the linked pages, but certainly not the search results. Even when I was on 14.4, back when Yahoo! was the hot search engine, it was no problem.

    So, what if these guys are on 300 baud and they get compressed search results via... e-mail??? The delay waiting for results to navigate e-mail systems probably negates the savings from the compression. Why not send compressed results over HTTP using a web-browser like application? Of course then you are still faced with bandwidth issues on the links you follow.

    It just doesn't make sense to me, unless they write a server-side proxy that intelligently filters Flash, popups, Java, superfluous graphics, audio, and other useless stuff that "web designers" like to use. The proxy could present pages in such a way as to offer users the option of downloading blocked files when the AI fails. That just cries out for a Mozilla mod or some other kind of custom browser; certainly not an e-mail client.
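
A rough sketch of the filtering step such a proxy might perform, using only Python's standard `html.parser`. The list of blocked elements is an illustrative assumption, and this version simply drops them (along with comments and the doctype) rather than offering the "download anyway" option mentioned above.

```python
from html.parser import HTMLParser

# Elements dropped by this sketch; the list is an illustrative assumption.
BLOCKED = {"script", "object", "embed", "applet", "img", "audio", "video", "iframe"}
# Blocked elements that never have a closing tag.
VOID = {"embed", "img"}


class LightweightFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a blocked element

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED:
            if tag not in VOID:
                self.skip_depth += 1
            return
        if self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in BLOCKED and self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCKED:
            if tag not in VOID and self.skip_depth > 0:
                self.skip_depth -= 1
            return
        if self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")


def strip_heavy_content(page):
    """Return the page with blocked elements and their contents removed."""
    parser = LightweightFilter()
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

A fuller version could replace each dropped element with a short placeholder link so the user can ask for the blocked file explicitly.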

  • Prior Art ;-) (Score:2, Informative)

    by sICE (92132) on Wednesday July 16, 2003 @03:19PM (#6455682) Homepage
    I wonder how it is different from AGORA [google.com], Web-To-Email [google.com], Gopher [google.com], and similar services? If you don't know about them, you might want to check the Accessing The Internet By E-mail -- Guide to Offline Internet Access [google.com] and Fravia's "How to search the web [searchlores.org]" lesson 10 [anticrack.de].

    Have fun.
  • by inertia@yahoo.com (156602) * on Wednesday July 16, 2003 @03:23PM (#6455711) Homepage Journal
    This is in reference to "Them Poems" by Mason Williams [google.com], circa 1960's. In and of themselves, they have little if anything to do with geekness.

    Personally, I'm tired of "In Soviet Russia ..." form of humor. You might know I've even tried it. Always a let down in the karma department.

    Yes, it is highly excessive. There's probably something wrong with me. However, I'll get bored of this very soon, and I'll move on to other methods of Karma gathering.
  • Internet by mail (Score:3, Informative)

    by Smartcowboy (679871) on Wednesday July 16, 2003 @04:06PM (#6456083)
    This FAQ [faqs.org] explains how to access most of the internet using only a standard email client.

    The above document explains how to access:

    FTP
    ARCHIE (deprecated)
    FTPSEARCH (deprecated)
    GOPHER (deprecated)
    VERONICA (deprecated)
    JUGHEAD (deprecated)
    USENET
    WWW
    WWW SEARCH (using standard search engine like altavista, yahoo or google)
    FINGER
    WHOIS
    [...]

    All these protocols can be accessed via email, according to the FAQ. The FAQ has been around for a long time, which explains why many (most) of the protocols involved are now deprecated. I used this FAQ in the early '90s and I don't know how well it works now. At the time, it was great. The last update is 2002/04/16.
  • Re:Hey! (Score:3, Informative)

    by Natalie's Hot Grits (241348) on Wednesday July 16, 2003 @06:22PM (#6457209) Homepage
    Ever heard of PNG, GIF, animated PNG/GIF? These are formats for both images and video that are 100% lossless and achieve better than a 2:1 compression ratio on most video and images (high-resolution images with many colors don't compress as well, but in the average case 2:1 is very very very very doable).

    As for video, animated PNG is a PNG compression of the diffs of the second frame to the first, third to the second, etc. In the case of video, compression ratios are on the order of 100:1, and audio is usually around 2:1 for lossless (FLAC, shn).

    Huffyuv is a codec that video editors use that is lossless and better than 2:1 in the average case

    Text, for sure in the average case, is WAY WAY WAY over a 2:1 compression ratio (ever heard of gzip, zip, bz2?). Especially considering they are zipping up code, and not written English, it is very very compressible because of so much redundancy.

    I think someone forgot to do their compression homework before posting and insulting his parent (which he should respect, for he has insight)...
  • Re:What program? (Score:4, Informative)

    by BillThies (690098) on Wednesday July 16, 2003 @09:43PM (#6458270)
    Hi, I'm a graduate student working on the TEK project.

    There are several benefits of having a TEK Client program instead of just using email. But first off, the client isn't that big -- the JAR file with the TEK classes is 125 KB. When we package it up with third-party libraries and an installer, it comes to 2 MB, and with Java included, it's 10 MB. It would be interesting to try to prune down this distribution to the minimal size -- for the prototype version, we have focussed primarily on the software's functionality.

    The TEK Client program is useful because it provides a seamless interface to browsing the downloaded pages. It operates as a web proxy: users adjust their browser to talk to TEK instead of the web, and then they can view pages just as if they were connected. The URL's appear as usual in the browser's "location" toolbar, and links on the page are functional. If a URL has been downloaded before, then it is loaded out of the local cache; if it has not yet been downloaded, then the user is queried to submit a request for that URL.
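
The cache-or-ask behaviour described in the paragraph above can be sketched roughly as follows. This is a hypothetical illustration, not TEK's code (TEK itself is written in Java); the class name, method names, and the prompt page are all assumptions.

```python
import html
from dataclasses import dataclass, field


@dataclass
class OfflineCache:
    pages: dict = field(default_factory=dict)    # URL -> page body already downloaded
    pending: list = field(default_factory=list)  # URLs the user has asked to fetch

    def store(self, url, body):
        """Record a page that arrived from the server."""
        self.pages[url] = body
        if url in self.pending:
            self.pending.remove(url)

    def lookup(self, url):
        """Return ('hit', body) for cached pages, else ('miss', prompt page)."""
        if url in self.pages:
            return "hit", self.pages[url]
        prompt = (f"<html><body><p>{html.escape(url)} has not been downloaded yet.</p>"
                  "<p>Submit a request for it?</p></body></html>")
        return "miss", prompt

    def request(self, url):
        """Called when the user confirms that a missing page should be fetched."""
        if url not in self.pages and url not in self.pending:
            self.pending.append(url)
```

In a real proxy, `lookup` would sit behind the HTTP handler the browser talks to, so the browser's location bar and links keep working as usual.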

    The TEK Client includes a local search utility for searching the cache of downloaded pages. In this way, the user can build up a local library of information that is relevant to their community; for example, in a school setting, many searches could be satisfied using only the local cache due to overlapping interests of students.

    Also, the TEK Client is useful for tracking searches. In settings where connectivity is intermittent, searches can be enqueued during the day and sent at night (or when a connection is available.) The client also provides basic user management so that multiple people can share a public installation (perhaps using a single email address, which they might not own themselves) and still keep track of their own queries.
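
As a sketch of the queueing described above: searches are collected per user during the day and sent only when a connection exists. The names here are hypothetical, and `send` stands in for whatever actually mails the query.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    user: str
    text: str


class QueryQueue:
    def __init__(self):
        self._waiting = deque()  # queries not yet sent, oldest first
        self._sent = []          # queries already handed to `send`

    def enqueue(self, user, text):
        self._waiting.append(Query(user, text))

    def flush(self, send, connected=True):
        """Send queued queries via `send` when connected; return how many went out."""
        count = 0
        while connected and self._waiting:
            query = self._waiting[0]
            send(query)             # if this raises, the query stays queued
            self._waiting.popleft()
            self._sent.append(query)
            count += 1
        return count

    def history(self, user):
        """One user's queries, each tagged 'sent' or 'waiting'."""
        items = [(q.text, "sent") for q in self._sent if q.user == user]
        items += [(q.text, "waiting") for q in self._waiting if q.user == user]
        return items
```

Keeping the user name on each query is what lets several people share one installation, and even one e-mail address, while still seeing only their own searches.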

    In the future, we think there are a lot of features that could be added to the client. For instance, we could seed the client with other open-source resources, such as an atlas or encyclopedia, that could be used in conjunction with web searches. There could also be an "intelligent query builder" that helps construct Internet searches (for example, by checking spelling) before going through the time and expense of connecting and sending them off.

    Many more details about TEK are available from the TEK Homepage [sourceforge.net]. We are currently moving our CVS source tree to SourceForge, so if you're interested in helping to improve the software, it'd be great to hear from you!

  • by BillThies (690098) on Wednesday July 16, 2003 @10:21PM (#6458471)
    Hi, I'm a graduate student working on the TEK project.

    Thank you for your post, it's an important point -- TEK is targeting users that might have no direct Internet connectivity. In some places, it can be cheaper to have an email-only account instead of full-fledged web access; for these users, TEK provides web content using only email.

    In addition, there are cases where no connectivity is available, but emails can be sent in a store-and-forward fashion. For instance, we are working with First Mile Solutions [firstmilesolutions.com], which provides store-and-forward services to rural communities using a mobile access point (such as a bus) that visits each kiosk during the day. Moreover, if the connections are unreliable by any measure, then email is a better medium than HTTP, as no end-to-end connection is needed at any time.

    More information about the TEK project, including some statistics on Internet rates in the regions we are targeting, is available on the TEK Homepage [sourceforge.net]

  • by BillThies (690098) on Wednesday July 16, 2003 @10:42PM (#6458557)
    Hi, I'm a graduate student working on the TEK project.

    We agree that you won't have too much to gain from zipping the content before sending it. The larger gains are from higher-level compression; for instance, the TEK Server keeps track of each page that it sends a given user, and it is careful not to send duplicate pages in replies to future search queries (unless the user specifically requests an updated version of a given page.) This can be especially useful in shared environments (such as a school) where there is a lot of overlap between queries.
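
A small sketch of the duplicate-avoidance bookkeeping described above, assuming the server keys its records by some user identifier. The class and parameter names are hypothetical, not taken from the TEK source.

```python
from collections import defaultdict


class SentPageTracker:
    def __init__(self):
        self._sent = defaultdict(set)  # user id -> URLs already delivered

    def select(self, user, candidate_urls, refresh=()):
        """Return the URLs worth sending to `user` and record them as delivered.

        URLs listed in `refresh` are sent again even if delivered before.
        """
        already = self._sent[user]
        chosen = []
        for url in candidate_urls:
            if url in chosen:
                continue  # ignore repeats within the same result list
            if url not in already or url in refresh:
                chosen.append(url)
        already.update(chosen)
        return chosen
```

With a shared installation such as a school, one user id covers everyone, so a page fetched for one student's query is not mailed again for another's.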

    Also, there are some marginal gains to be made by zipping more content at once. The server sends ~20 pages at a time (or all the URL's requested in a given batch), which will compress better than if they were done separately.
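
The batching effect can be checked with a quick experiment using `zlib`. The sample pages below are synthetic and share a lot of boilerplate markup, which is an assumption; real pages from different sites would share less, so the gain would be smaller.

```python
import zlib


def separate_size(pages):
    """Total bytes when every page is compressed on its own."""
    return sum(len(zlib.compress(p.encode("utf-8"), 9)) for p in pages)


def batch_size(pages):
    """Bytes when all pages are concatenated and compressed as one stream."""
    return len(zlib.compress("\n".join(pages).encode("utf-8"), 9))


def make_sample_pages(count=20):
    # Synthetic pages with identical boilerplate around a small unique body.
    template = ("<html><head><title>Result {n}</title>"
                "<link rel='stylesheet' href='/site.css'></head><body>"
                "<div class='nav'><a href='/'>Home</a> <a href='/about'>About</a></div>"
                "<p>This is the body text of result number {n}.</p>"
                "<div class='footer'>Copyright example.org</div></body></html>")
    return [template.format(n=i) for i in range(count)]


if __name__ == "__main__":
    pages = make_sample_pages()
    print("separate:", separate_size(pages), "bytes")
    print("batched: ", batch_size(pages), "bytes")
```

One caveat: deflate only looks back over a 32 KB window, so content shared between pages helps only when the repeats fall within that distance of each other.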

    Your point about the bloat from the mail program is a great one, thanks. We should look into fixing this.

    By the way, we see the primary benefit of TEK as being the email-based access rather than the compression. You can find many more details about the project on the TEK Homepage [sourceforge.net].

  • by BillThies (690098) on Wednesday July 16, 2003 @11:21PM (#6458689)

    You're right that retrieving web pages over email has already been done. A present-day service that works as you describe is www4mail [www4mail.org], and I know people that use it regularly from low-connectivity regions.

    However, the TEK system (which I'm involved in) offers several benefits over a purely email-based solution. By having a web proxy on the client side, users can use their favorite browser to view downloaded pages, complete with color and formatting, which is often absent in text-only systems. Moreover, the client keeps a local, searchable cache of all downloaded pages, and the server keeps track of which pages have been sent to avoid wasting bandwidth on duplicate content. Finally, with a web-like user interface, many users can share a single e-mail account in a public kiosk or school.

    Many more details about the TEK system are available from the TEK Homepage [sourceforge.net]
