The Internet

Freecache 258

Posted by michael
from the building-a-better-mousetrap dept.
TonkaTown writes "Finally the solution for slashdotting, or just the poor man's Akamai? Freecache from the Internet Archive aims to bring easy-to-use distributed web caching to everyone. If you've a file that you think will be popular, but far too popular for your ISP's bandwidth limits, you can just serve it as http://freecache.org/http://your.site/yourfile instead of the traditional http://your.site/yourfile and Freecache will do all the heavy lifting for you. Plus, your users get the advantage of swiftly pulling the file from a nearby cache rather than it creeping off your overloaded webserver."
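The prefixing scheme described in the summary is plain string concatenation. A minimal sketch, assuming only the URL form quoted above (the helper name `freecache_url` is made up for illustration and is not part of any Freecache tooling):

```python
# The prefix quoted in the story summary.
FREECACHE_PREFIX = "http://freecache.org/"


def freecache_url(original_url: str) -> str:
    """Return the Freecache form of a plain http:// URL."""
    if not original_url.startswith("http://"):
        raise ValueError("expected an http:// URL")
    if original_url.startswith(FREECACHE_PREFIX):
        # Already prefixed, so leave it alone instead of nesting prefixes.
        return original_url
    return FREECACHE_PREFIX + original_url


print(freecache_url("http://your.site/yourfile"))
```

The already-prefixed check is a design choice of this sketch, so that calling the helper twice does not stack prefixes.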
This discussion has been archived. No new comments can be posted.

  • by attaboy (689931) * on Wednesday May 12, 2004 @12:34PM (#9129036)

    Well, it won't be the solution to Slashdotting, as you can't cache a whole site.

    Please note that you cannot submit a whole site to FreeCache as in http://freecache.org/http://www.rocklobsters.com/ This will not work as only index.html will be cached. You have to prefix every item that you want to have cached separately.

    You can cache an HTML page (index.html) but all the images will pull from the local machine. You could cache each image separately, but the change would have to be made in the site's HTML.

    On the other hand, I don't imagine it would be hard to write some kind of proxy script that grabs the page and changes the HTML to point to freecache SRCs for each image/movie... you could then point to a freecache of that page...
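A rough sketch of the kind of rewriting script meant here, assuming the prefix form from the story. It only handles double-quoted `src` attributes on `<img>` and `<embed>` tags, and the function name `rewrite_media_srcs` is invented for illustration:

```python
import re
from urllib.parse import urljoin

FREECACHE_PREFIX = "http://freecache.org/"

# Matches the double-quoted src attribute of <img> and <embed> tags.
SRC_ATTR = re.compile(r'(<(?:img|embed)\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def rewrite_media_srcs(html: str, page_url: str) -> str:
    """Point every matched src at the Freecache-prefixed absolute URL."""
    def repl(match):
        # Resolve relative paths against the page they came from first.
        absolute = urljoin(page_url, match.group(2))
        return match.group(1) + FREECACHE_PREFIX + absolute + match.group(3)

    return SRC_ATTR.sub(repl, html)


page = '<p>Hi</p><IMG SRC="pics/lobster.jpg"><img alt="x" src="/logo.png">'
print(rewrite_media_srcs(page, "http://www.rocklobsters.com/index.html"))
```

A real script would want a proper HTML parser rather than a regex, and per the FAQ quoted further down the thread, small images may fall under the minimum size and not be cached at all.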

    And of course, this all breaks the second somebody has a site that is heavily CGI based.

    Still, it's a start. I'll be sure to use it if I ever submit any site of my own to Slashdot ;-) Many thanks to the guys at the Internet Archive for setting this up. You rock!


    • Some questions (Score:5, Interesting)

      by GillBates0 (664202) on Wednesday May 12, 2004 @12:42PM (#9129172) Homepage Journal
      Definitely not an adequate solution, given its current condition: slashdotted to hell.

      I have a few questions though, which I guess may be answered on the website:

      1. Can users submit/upload files to be hosted on their website.

      2. Who's responsible for ensuring that it doesn't turn into a pr0n/warez stash?

      3. Can users request removal of cached content (something not possible with the Google cache).

      • Re:Some questions (Score:5, Informative)

        by Phoenix-kun (458418) * on Wednesday May 12, 2004 @01:30PM (#9129907) Homepage
        3. Can users request removal of cached content (something not possible with the Google cache).

        Actually, you can request removal of a google cache [google.com], but you must have access to the reference source site to do so. Once you've requested removal, there is even a personalized status page where you can check the progress of the removal.
      • by Anonymous Coward

        Definitely not an adequate solution, given its current condition: slashdotted to hell.

        Idiots! They should've had it cache itself first before posting this to /.

    • by dan_sdot (721837) * on Wednesday May 12, 2004 @12:43PM (#9129191)
      Yes, but the thing that you are not considering is that probably 75% of the slashdot effect is just people looking at the link for about 5 seconds, and then closing the page and moving on to the next story. This means no browsing, meaning that it is not important if the whole page is not up there. And as far as pictures go, I would guess that a lot of people click on the link, even though they are not too interested, see the text, and realize that they are _really_ not interested. So they close the page before they even need pictures.

      In other words, the important stuff, like the rest of the site and the pictures, will be resources only used by those that really care, while those that don't get to see a flash of the text for a second to get a really general idea.

      After all, that's what the slashdot effect is, a whole bunch of people that don't really care that much, but want a quick, 5 second look at it.
    • by Anonymous Coward on Wednesday May 12, 2004 @12:44PM (#9129217)
      I think they're looking more for serving big files, not html and inline images. Smallest file size is 5MB.

    • I should clarify that I mean this will not be the solution to the effect caused by "surprise-slashdotting" where the site owners are not notified ahead of time.

      If a savvy site owner is notified by slashdot editors before being listed, they might be able to take some preventive action.

      I don't think that currently happens very often, though.
    • by lxdbxr (655786) on Wednesday May 12, 2004 @12:47PM (#9129257) Homepage
      Also only works for large files unless this FAQ [archive.org] is out of date:
      What files are being served by FreeCache?

      FreeCache can only serve files that are on a web site. If the link to a file on that web site goes away, so will the file in the FreeCaches. Also, there is a minimum size requirement. We don't bother with files smaller than 5MB, as the saved bandwidth does not outweigh the protocol overhead in those cases.

    • The thing is supposed to be used to cache large files. It has not been built to cache web pages.


      Maybe, when the site is no longer slashdotted, people will be able to get a look at their FAQ and see that (the submitter should have done that instead of submitting stuff that he just discovered, even if it has been available for a long time, without even checking for its real purpose).

    • Alternative solution (Score:4, Interesting)

      by Ryvar (122400) on Wednesday May 12, 2004 @12:50PM (#9129307) Homepage
      Create a file format that is basically just the web page plus dependent files tar'd and gzip'd - then release browser plugins that automatically take any file with the correct extension, and seamlessly ungzip/untar it to the local cache before displaying it like normal - I have yet to understand why nobody has combined this basic idea with BitTorrent. Seems like you could get a lot of mileage with it.
      • by CerebusUS (21051)
        Well, Microsoft created a format that doesn't require tar and zip. .mht files are complete webpages (images and all) bundled up at once so you can deliver them as a single file.

        Combining that with bittorrent should be relatively easy.

        Of course, you'll probably have to view the result in IE, as the mozilla project hasn't quite worked out .mht yet, I don't think.
      • JS and CSS files can be easily included, and for images, I think Mozilla has some sort of support for this. The only info I can find about it is here [php.net], where you basically have an IMG tag like this:

        <IMG SRC="data:image/png;base64,[the image data in base64]">

        So instead of a URL to the image, it has the image data directly in the IMG tag.
        Someone probably only has to write some JS code in Mozilla to join all the features together; the question is if JS can then pop up the "Save As" dialog box, I think n
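For anyone wanting to try the inline-image trick, here is a small sketch of building such a tag. It assumes the `data:` URL form shown in the comment above (standardized in RFC 2397); the helper names are made up, and the sample bytes are only the 8-byte PNG signature, not a displayable image:

```python
import base64


def png_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data: URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


def inline_img_tag(png_bytes: bytes) -> str:
    """Build an IMG tag that carries the image data itself instead of a URL."""
    return '<IMG SRC="' + png_data_uri(png_bytes) + '">'


# Placeholder input: the PNG file signature only.
sample = b"\x89PNG\r\n\x1a\n"
print(inline_img_tag(sample))
```

Base64 inflates the payload by roughly a third, so this trades one extra request for a larger page.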

      • Konqueror allows users to save a page and its dependencies in a Web ARchive. It's pretty much a .tgz file renamed.
      • Mozilla can already browse through tars in that manner.
      • by Coulson (146956) on Wednesday May 12, 2004 @02:18PM (#9130589) Homepage
        The problem is that you don't get any benefit from deduplication. Many pages share the same images; if each file is requested independently, the client can ignore files that are already in the cache. If you have to download a tarball of each page + images, you don't get any savings from images already in cache.

        You'd have to come up with a scheme like:
        1. send a request + list of files you have from that domain + timestamps [large!]
        2. server sends diff tarball
        3. client unzips to cache and displays
        ...or...
        1. send request
        2. server sends single response + list of related files + timestamps
        3. client diffs with cache, sends back batch request for related files
        4. server sends back batch tarball
        5. client unzips to cache and displays
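The client-side diff in step 3 of the second scheme could look something like the sketch below. It assumes the server's list of related files arrives as a mapping of path to modification time; that format and the name `files_to_request` are assumptions for illustration, not part of any existing protocol:

```python
def files_to_request(manifest: dict[str, float], cache: dict[str, float]) -> list[str]:
    """Return the paths the client lacks or holds in an older version."""
    wanted = []
    for path, server_mtime in manifest.items():
        cached_mtime = cache.get(path)
        # Ask for the file if it is not cached or the cached copy is older.
        if cached_mtime is None or cached_mtime < server_mtime:
            wanted.append(path)
    return sorted(wanted)


# Hypothetical manifest from the server and the client's current cache.
manifest = {"index.html": 200.0, "logo.png": 100.0, "style.css": 150.0}
cache = {"logo.png": 100.0, "style.css": 120.0}
print(files_to_request(manifest, cache))
```

Only the missing page and the outdated stylesheet end up in the batch request; the shared logo is served from the local cache, which is the saving the single-tarball approach gives up.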

    • Actually, index.html would only be cached if it is 5MB or greater in size.

      Which is unlikely. So it won't be cached. Nor will the PNG/GIFs.

      Ratboy
    • Just zip up a tarball/archive of your site and submit that to slashdot.
    • Freecache simply caches any data that goes through it, no matter what it is. You don't have to precache anything; the first request, freecache goes and gets it, the second request is served from cache.

      relative addressed image URLs would apply, as the local browser would pull them from a URL based on the URL the current page lives at.
  • How many times have we wanted to see a website, except that it has been slashdotted and cannot see us now?

    Caching is intelligent because we are interested in the content itself rather than the connection to that particular computer.

    • Re:Smart! (Score:3, Funny)

      by ozric99 (162412)
      How many times have we wanted to see a website, except that it has been slashdotted and cannot see us now?

      Let me guess.. you're posting this from soviet russia?

  • by akedia (665196) on Wednesday May 12, 2004 @12:35PM (#9129047)
    In case of Slashdotting, here's a Freecache link. [freecache.org]
  • by j0keralpha (713423) * on Wednesday May 12, 2004 @12:35PM (#9129050)
    http://www.archive.org/iathreads/post-view.php?id=8764

    He was apparently /.'d... and he's apologizing for the load.
  • by RobertB-DC (622190) * on Wednesday May 12, 2004 @12:35PM (#9129058) Homepage Journal
    As I understand the setup, the ideal would be for ISPs to install this system on their networks like AOL's infernal content caching, except that it would only cache what the site owner wants cached. It seems like anyone with a static IP could join in the fun, too.

    But would they? I saw this on the new service's message forum [archive.org]
    I was perusing the content in my cache and checking the detailed status page and I noticed illegal content containing videos in one of the caches I run. What is freecache.org doing to stop people from mirroring illegal content. I currently run 2 fairly heavily used caches and it looks like only one of them had illegal content. I cleared the cache to purge the problem, but the user just abused the service again by uploading the content again. I know freecache.org cannot be responsible for uploaded content, but there has to be some sort of content management system to make sure freecache doesn't turn into just another way to hide illegal content.

    Whether you believe this guy's story [slashdot.org] or not, it seems like this could subject small ISPs to the sort of problems that P2P has brought to regular users. It's not going to matter who's right -- just the idea of having to go to court over content physically residing on your server is a risk I don't see a marginal ISP being willing to take.

    So we're left with the folks with static IP addresses. They're in even more trouble if John Ashcroft decides to send his boyz over to check for "enemy combatants" at your IP address.

    With the current state of affairs in the US, and the personal risk involved, I'd have to pass on this cool concept.
    • by RAMMS+EIN (578166) on Wednesday May 12, 2004 @01:40PM (#9130053) Homepage Journal
      I wonder why this continues to be a problem. It should be obvious to any judge that a hosting provider cannot and should not check everything that is uploaded to their servers.

      It may be reasonable to expect them to pull content that is illegal where they are located, but that should be a simple matter of notifying them, they pull the content, no harm done. They may even be required to disclose the identity of the uploader, after which this person can be prosecuted.

      I don't think anything in this scenario is outrageous or unfeasible. What is outrageous and infeasible is holding the host responsible for what the user uploaded. Then why is this the way it happens all too often?
  • by Rosco P. Coltrane (209368) on Wednesday May 12, 2004 @12:36PM (#9129064)
    http://freecache.org/http://your.site/yourfile

    http://freecache.org/http://freecache.org/http://freecache.org [freecache.org]

    seems to piss it off slightly. I wonder why...
    • http://freecache.org/ http://freecache.org/ http://freecache.org

      I'm sure he would have made a deeper recursion, but the Slashdot lameness filter was able to compress it too efficiently.

  • by Seoulstriker (748895) on Wednesday May 12, 2004 @12:36PM (#9129071)
    1. Buy massive amounts of bandwidth
    2. Host extremely popular web sites
    3. ???
    4. PROFIT!!!

    How are they supposed to be making money on this?
  • Or use Google... (Score:4, Informative)

    by StevenMaurer (115071) on Wednesday May 12, 2004 @12:36PM (#9129073) Homepage
    If the referrer is slashdot, return a link to the google cache of your page element, rather than the actual element.

    I trust google to be faster than these guys.
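A sketch of that referrer check, under two assumptions: the server can see the `Referer` header, and Google exposes cached copies through a `cache:` search query (the form it offered at the time). The function names are invented for illustration:

```python
from urllib.parse import quote, urlparse


def is_from_slashdot(referer):
    """True when the Referer header points at slashdot.org or a subdomain."""
    host = urlparse(referer or "").hostname or ""
    return host == "slashdot.org" or host.endswith(".slashdot.org")


def choose_location(referer, own_url):
    """Return a redirect target for Slashdot visitors, or None to serve locally."""
    if is_from_slashdot(referer):
        # Assumed URL form for a Google cache lookup of own_url.
        return "http://www.google.com/search?q=cache:" + quote(own_url, safe="")
    return None


print(choose_location("http://slashdot.org/", "http://your.site/yourfile"))
print(choose_location("http://example.com/", "http://your.site/yourfile"))
```

Visitors arriving from anywhere else get `None` here, meaning the site serves the element itself as usual.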

    • by hendridm (302246) on Wednesday May 12, 2004 @12:44PM (#9129206) Homepage
      Yeah, that's fine for sites who can expect the possibility of being linked to, but those sites can often handle the load anyway. It's those small sites (Geocities) hosted on some guy's cable modem describing how he modded his mom's vibrator into a CD player that won't make it. Often times, myself included, these people don't really think about or expect to be linked to.
    • by andycal (127447)
      The problem with that is that if it's new content google won't have it yet. Freecache could be a good way of surviving a /.ing, but the problem (as with all caches) is that the server then doesn't get an accurate count of hits. This matters to some people, particularly people who advertise.

      The cool thing here is that you can say, "Cache just these things" and still have your server supply the html but not the images (or movies).

      But you still have to have a decent pipe.
  • Taking bets.... (Score:2, Interesting)

    by JoeLinux (20366)
    How much you wanna bet this is going to become a haven for bit-torrent seeds? Put 'em up, get 'em to people, get it started, then take 'em down.
    • Re:Taking bets.... (Score:3, Insightful)

      by wo1verin3 (473094)
      Taken from here [slashdot.org] but it answers your question. If the person seeding removes the file, it would disappear in the cache as well. Maybe they check the original file link still exists and functions every few hits to the cache?

      Also only works for large files unless this FAQ [archive.org] is out of date:

      What files are being served by FreeCache?
      FreeCache can only serve files that are on a web site. If the link to a file on that web site goes away, so will the file in the FreeCaches. Also, there is a minimum size requirement
  • by Comsn (686413) on Wednesday May 12, 2004 @12:38PM (#9129107)
    its pretty good. lots of the servers are swamped tho, need more of them, anyone can run a freecache 'node'. its almost like freenet, cept not anonymous.

    too bad the status seems to be down, its fun to see what clips/games/demo/patches are going around.
  • If i put the page up does Freecache have to wait until Internet Archive caches it? or does it nab a copy of the cache right away...

    if it does then I propose that all posts of smaller sites henceforth should be freecached.

    anyone wanna second it? not that it will do any good.
    • Yes, it caches immediately: the first time someone requests a URL through freecache, the content of that URL will be cached in real time while it's being streamed to you. In fact, you do not even need to tell anyone about the caching, as it will be done for any resource prepended with the freecache URL.

      However, it will not cache resources that are under 5M. The cache is designed for large pieces of content.
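The behaviour described here (fetch on the first request, serve from cache afterwards, skip anything under the size floor) is a read-through cache. A toy sketch, assuming the 5MB minimum from the FAQ means 5 * 1024 * 1024 bytes; the class and the `fake_origin` stand-in are invented for illustration and say nothing about how Freecache is actually built:

```python
MIN_CACHE_BYTES = 5 * 1024 * 1024  # assumed reading of the 5MB floor


class ReadThroughCache:
    def __init__(self, fetch):
        self._fetch = fetch      # callable url -> bytes, stands in for the origin
        self._store = {}         # url -> cached body
        self.origin_hits = 0     # how often the origin server was contacted

    def get(self, url):
        if url in self._store:
            return self._store[url]
        self.origin_hits += 1
        body = self._fetch(url)
        # Small responses are passed through but never stored.
        if len(body) >= MIN_CACHE_BYTES:
            self._store[url] = body
        return body


def fake_origin(url):
    # Pretend movies are big and everything else is small.
    size = MIN_CACHE_BYTES if url.endswith(".mpg") else 1024
    return b"x" * size


cache = ReadThroughCache(fake_origin)
for _ in range(2):
    cache.get("http://your.site/movie.mpg")
    cache.get("http://your.site/index.html")
print(cache.origin_hits)
```

In this run the movie costs the origin one fetch while the small page costs it two, which is the point several posters make about index.html and inline images not benefiting.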
  • by Albanach (527650) on Wednesday May 12, 2004 @12:39PM (#9129122) Homepage
    on slashdot [google.com] - lots of times. It only caches files bigger than 5MB so if someone is slashdotting your MP3 collection it's a boon. If you're just hosting a dynamic web page with dynamic images your mysql server is still going to feel the strain.
  • /.ed already... (Score:3, Insightful)

    by warpSpeed (67927) <slashdot@fredcom.com> on Wednesday May 12, 2004 @12:39PM (#9129128) Homepage Journal
    This does not bode well for a caching site that will supposedly help with the /. effect...

  • by Mr_Silver (213637) on Wednesday May 12, 2004 @12:40PM (#9129141)
    1. Does that mean that Slashdot will now link to potentially low-bandwidth sites using Freecache?
    2. Will you update their FAQ on the whole subject of caching since Google and Freecache seem to feel that the legalities of site caching is small enough for it to be a non-issue?
    3. Or are we still going to be relying on people posting links and site content in the comments because the original site has been blown away under the load?
    Inquiring minds would like to know.
    • What you are proposing won't work. Only the original linked file (or implied index.?) will be cached. In order for the bulk of the content to be cached, the site owner would have to change all internal links to point to freecache.

      The working solution would be for the slashdot editors to give a site owner a heads-up so that they can prepare for the flood.
    • I think I can answer all three questions, even tho I'm not related in any way to the "Slashdot owners"... Unless someone submits patches to slashcode.org that includes auto-Freecache'ing and it gets accepted as part of the base code used for this site you will not see /. change the way it handles the /. effect for any of your three points. I'm not saying that's what I myself think is "right" or "ideal" but it's the most likely scenario I see.

      Reasoning basically stands as follows: "they" would most likely
  • Beta! (Score:5, Informative)

    by dacarr (562277) on Wednesday May 12, 2004 @12:41PM (#9129151) Homepage Journal
    I should point out that Freecache is in beta mode. By coincidence, this posting on Slashdot here is an interesting way of working out bugs.
  • Slashdotted (Score:4, Funny)

    by skinfitz (564041) on Wednesday May 12, 2004 @12:41PM (#9129153) Journal
    The demo seems to be down. [richstevens.com]

    Oh crap that was the wrong link - try this:

    http://freecache.org/http://movies03.archive.org/0/movies/LuckyStr1948_2/LuckyStr1948_2.mpg [freecache.org]
  • Slashdot cache (Score:2, Redundant)

    by aeiz (627513)
    Slashdot should have their own caching system that automatically creates a cache of whatever website is being posted.
  • ... except (Score:5, Informative)

    by laursen (36210) <laursen@@@netgroup...dk> on Wednesday May 12, 2004 @12:42PM (#9129168) Homepage
    They have been offline [archive.org] for AGES due to abuse ...

    As their status page [archive.org] explains...

  • What happens if you use archive.org's own Wayback Machine as a cache? Instead of linking that hugely popular Slashdot story to someone's relevant actual Geocities site, you link to a 12-day-old copy of that Geocities site in archive.org? Does archive.org get slashdotted easily?
  • by ACNeal (595975) on Wednesday May 12, 2004 @12:44PM (#9129214)
    I see dreaded pictures from goatse.cx in the future. This will break the nice convenient domain name clues that Slashdot gives us, so we don't accidentally do things like that.

  • Did anyone else misread that as "Freeache"?

    I mean, I'm all for free stuff, but an ache...?
  • http://www.archive.org/, which used to have a one or two second response time, now is taking over a minute to return their home page.

    I do not think this is a solution to slashdotting :-)

  • by Doc Ruby (173196) on Wednesday May 12, 2004 @12:51PM (#9129324) Homepage Journal
    This use of Freecache is still subject to the actual problem that enables Slashdotting: inadequate scaling planning. Some sites are limited by the cost of effective scaling failover countermeasures, but most are limited by lack of any planning for even potential Slashdotting - this use of Freecache still falls prey to that primary problem. And who can remember to prepend "http://freecache.org/" to their entire domain URL, including their repetitive "http://"?

    A better use of Freecache is "under the hood". Make your webserver redirect accesses to your "http://whatever.com/something" to "http://freecache.org/http://whatever.com/something". More sites will be able to plan for that single change to their webserver config, than will be able to plan to distribute the freecache.org compound URL. And it won't depend on users correctly using the compound URL. More sites will get the benefit of the freecache.org service. And when freecache.org disappears, or ceases to be free, switching to a competitor will be as easy as changing the config, rather than redistributing a new URL.
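A sketch of what that server-side redirect decision might look like. The cache prefix is a single config value so a competitor could be swapped in, as suggested above. The `from_cache_node` flag is an assumption of this sketch: the origin has to keep serving the file directly to the cache itself or the redirect would loop, and how a server would recognize a cache node is not something the thread explains:

```python
# Single config value; pointing it at another service changes every redirect.
CACHE_PREFIX = "http://freecache.org/"


def redirect_target(host, path, from_cache_node):
    """Return a Location header value, or None to serve the file locally."""
    if from_cache_node:
        # Hypothetical flag: the cache must be able to fetch the original.
        return None
    return CACHE_PREFIX + "http://" + host + path


print(redirect_target("whatever.com", "/something", from_cache_node=False))
print(redirect_target("whatever.com", "/something", from_cache_node=True))
```

Visitors only ever see the plain URL; the compound one appears in the redirect response.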
    • "A better use of Freecache is "under the hood". Make your webserver redirect accesses to your "http://whatever.com/something" to "http://freecache.org/http://whatever.com/something". More sites will be able to plan for that single change to their webserver config, than will be able to plan to distribute the freecache.org compound URL. And it won't depend on users correctly using the compound URL. More sites will get the benefit of the freecache.org service."

      I believe that is the idea - not for users to
    • what apache/mozilla needs is a plugin to automatically bittorrent-style distribute the load across everyone who is currently requesting a file.
  • I could see people using this to start their own pr0n sites.

    Perhaps there should be an alternative to scientific projects and OSS projects.

    freecache.fsf.org perhaps?
  • So now instead of /.ing the website we /. the Internet Archive instead.
  • Finally the solution for slashdotting ...

    Not really.. I can't access their servers now. All will tremble before the might of slashdotting!

  • by Xiadix (159305) on Wednesday May 12, 2004 @12:54PM (#9129374) Homepage Journal
    That is another stumbling block that will prevent it from saving many websites. If I can't use the freecache link, I will be forced to go back to the original link...as will a good percentage of the other /. crowd.

    KevG
  • Ironic (Score:5, Funny)

    by osjedi (9084) on Wednesday May 12, 2004 @12:57PM (#9129418)


    Story is only a few minutes old and the mecca of Internet caching has already been slashdotted. Maybe some kid with an old P5 266MHz under his desk can mirror the site for us.

  • if freecache can't even handle the slashdot effect, what does that say about its advertised service :) ?
  • by Xiadix (159305) on Wednesday May 12, 2004 @01:00PM (#9129462) Homepage Journal
    Is a publicly available squid server. If you put any link through the server such as:

    www.squidserver.com/http://www.doomedsite.com

    The public squid will cache a copy of it. On the first access (like when the approver looks at it), it should look at a request and see if it has a recent cache. If it does, feed that; if not, get the newest copy and prompt the user for a refresh or automatically refresh after a set time (5 sec). It will update its cache as the site does. All without having to upload anything. After a few days when nobody is utilizing the cache, it can purge it. Waiting for the next doomed site.

    DISCLAIMER: This may be how Freecache works, but I can't get to it
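The freshness check and idle purge described above can be sketched as follows. This is a toy model, not Squid's actual logic: the class name, the `max_age` and `idle_limit` parameters (in seconds), and the explicit `now` argument are all assumptions made to keep the example deterministic:

```python
class ExpiringCache:
    def __init__(self, fetch, max_age, idle_limit):
        self._fetch = fetch            # callable url -> body, stands in for the origin
        self._max_age = max_age        # how old a copy may be before refetching
        self._idle_limit = idle_limit  # how long an unused copy is kept
        self._entries = {}             # url -> [body, fetched_at, last_used]

    def get(self, url, now):
        entry = self._entries.get(url)
        # Fetch when there is no copy yet or the copy is no longer recent.
        if entry is None or now - entry[1] > self._max_age:
            entry = [self._fetch(url), now, now]
            self._entries[url] = entry
        entry[2] = now
        return entry[0]

    def purge_idle(self, now):
        """Drop copies nobody has asked for within the idle limit."""
        stale = [u for u, e in self._entries.items() if now - e[2] > self._idle_limit]
        for u in stale:
            del self._entries[u]
        return stale


calls = []


def fake_fetch(url):
    calls.append(url)
    return "copy %d" % len(calls)


url = "http://www.doomedsite.com"
cache = ExpiringCache(fake_fetch, max_age=60, idle_limit=3 * 24 * 3600)
print(cache.get(url, now=0))      # first access fetches from the origin
print(cache.get(url, now=30))     # recent copy is reused
print(cache.get(url, now=100))    # copy is older than max_age, so refetch
print(cache.purge_idle(now=100 + 4 * 24 * 3600))
```

Passing `now` in by hand keeps the sketch testable; a real server would read the clock itself.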
    1) because I am at work.
    2) as the comments suggest it is slashdotted.

    KevG
  • I looked around the site and didn't see an answer to this question:

    How does this system guard against doctored content coming from the cache sites? Since they allow sites to sign up to become a cache server, wouldn't it be possible for a malicious user to sign up and use some locally-modified code to add a virus to all the .exe files that get sent out from their cache? They could even customize the output of their CGI depending on what domain you are in, making it easy to target specific sites and/or hi

  • by curator_thew (778098) on Wednesday May 12, 2004 @01:07PM (#9129566)

    Freecache is really just a half-baked ("precursor") version of P2P; not in any sense a long term solution, but interesting at least.

    Correct use of P2P with network based caches (i.e., your ISP installs content caching throughout the network) and improved higher level protocols (i.e. web browsing actually runs across P2P protocols) would resolve slashdot effect type problems and usher in an age of transparent, ubiquitous, long-lived, replicated content.

    For example,

    Basically, your request (and thousands of other slashdot readers' requests) would fetch "closer" copies of content rather than having to reach directly to the end server (because the content request [i.e. HTTP GET] actually splays itself out from your local node to find local and simultaneous sources, etc.). In theory, the end server would only deliver up one copy into the local ISP's content cache for transparent world-wide replication, and each end point would gradually drag replicated copies closer - meaning that subsequent co-located requests ride upon the back of prior ones. I'm just repeating the economics of P2P here :-).

    In addition to all of this, you'd still have places like the Internet Archive, because they would be "tremendously sized" content caches that do their best to suck up and permanently retain everything, just like it does now.

    Physical locality would still be important: if I were a researcher doing mass data analysis / etc, then I'd be better off walking into the British Library and co-locating myself on high speed wi-fi or local gigabit (or whatever high speed standards we have in a couple of years time) to the archive rather than relying upon relatively slower broadband + WAN connections to my house or work place.

    For example, say I'm doing some research on a type of flying bird and want to extract, process and analyse audiovisual data - this might be a lot of data to analyse.

    Equally, places like the British Library will also have large clusters, so when I walk in there to do this data analysis, I can make use of large scale co-located computing to help me with the task.

    Nothing here is new: if you think about it, these are logical extensions of existing concepts and facilities.

  • An error occurred while loading http://www.archive.org/web/freecache.php: Timeout on server Connection was to www.archive.org at port 80

    Somehow I don't think this solution will work.

  • by Russellkhan (570824) on Wednesday May 12, 2004 @01:48PM (#9130177)
    Yes, the site is down. Yes, it's ironic that this should happen to a site hosting information about a service that's being claimed as a solution to the slashdot effect.

    But I don't think that it really is an indicator. I happen to have read the site yesterday after reading the Petabox [slashdot.org] article, so I think I have some of the basic concepts down. As I understand it, the idea works with cooperation from ISPs (and others) to provide more localized caches of large popular files. The motivation for the ISPs is that by providing the cache, they save on their upstream bandwidth and the associated costs.

    So, while it's funny that we've slashdotted the archive.org server where the Freecache website is, Freecache itself is not dependent upon archive.org's bandwidth.

    It's also worth noting that the concept is still in beta and pretty new - I don't think they've got a lot of ISPs on board yet. From what I can tell, it seems a very good concept - the only thing I can think of that I would want to make sure of if I were an ISP is that my cache is only available to users on my network (the whole saving on bandwidth usage argument falls apart if you suddenly become a cache for users on other ISPs) but I would think that would be pretty easy to do.

    For those who haven't yet been able to read about it, here's Google's cache [google.com] of the front page.
  • Censored (Score:5, Interesting)

    by jdavidb (449077) on Wednesday May 12, 2004 @02:40PM (#9130931) Homepage Journal

    This would be great if my employer didn't restrict access to archive.org as allegedly being in the "sex" category.

  • by evilviper (135110) on Wednesday May 12, 2004 @05:18PM (#9133129) Journal
    The problem with non-compulsory caching systems is that there is little if any incentive for the end user to want to use them.

    What ISPs should really do, is sell you a 256K internet connection (or whatever speed you happen to get), but then make all local content available at maximum line speeds... In other words, if you use the caching system (which saves the ISP money on the price of bandwidth) you get your files 6x as fast, or better in some cases.

    I don't see why ISPs don't do that. It seems like everyone would win then. It wouldn't just need to be huge files either, they could have a Squid cache too, and not force people to use it via transparent proxy (most people would actually want to use it, despite the problems with proxy caches).

    Right now, users have incentive not to use it. Mainly because it's another manual step for them, and to a lesser extent because caching systems usually have a few bugs to work out (stale files, incomplete files, etc).

    I know that it would only require minor modifications to current DSL/Cable ISP's systems to accomplish the two zones with different bandwidth.
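To put rough numbers on the two-zone idea, here is a back-of-the-envelope calculation. It assumes "256K" means 256,000 bits per second, takes a 5,000,000-byte file (about the Freecache minimum discussed elsewhere in the thread), and uses the 6x figure from the comment for the local zone; protocol overhead is ignored:

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Idealized transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / bits_per_second


FILE_BYTES = 5_000_000     # assumed example file size
WAN_BPS = 256_000          # assumed meaning of a "256K" connection
LOCAL_BPS = 6 * WAN_BPS    # the 6x local-cache speed from the comment

wan = transfer_seconds(FILE_BYTES, WAN_BPS)
local = transfer_seconds(FILE_BYTES, LOCAL_BPS)
print("over the WAN: %.1f s, from the local cache: %.1f s" % (wan, local))
```

Under these assumptions the file takes a little over two and a half minutes across the WAN and under half a minute from the local zone, which is the incentive the comment argues users currently lack.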
