
Startup Webaroo to put the 'Web on a Hard Drive'? (340 comments)

Posted by ScuttleMonkey
from the starting-the-data-storage-arms-race dept.
An anonymous reader writes "A new startup called Webaroo is launching Monday with an audacious proposition: You can search the Web without a net connection of any kind. Initial release consists of 'Web packs' on specific topics such as news, city guides or Wikipedia. Later this year they're promising a full-Web version that you can carry on a laptop -- provided you're willing to devote something in the neighborhood of 80 gig."

  • by sentientbeing (688713) on Sunday April 09, 2006 @01:50PM (#15095310)
    I'm sold. Does anyone have the .torrent for it?
  • Dotcom v3.0 (Score:5, Insightful)

    by Saven Marek (739395) on Sunday April 09, 2006 @01:50PM (#15095311)
    A new startup called Webaroo is launching Monday with an audacious proposition: You can search the Web without a net connection of any kind.

    If anyone doubted the next dotcom boom is upon us, this should put that doubt to rest.
    • Re:Dotcom v3.0 (Score:5, Insightful)

      by caffeinemessiah (918089) on Sunday April 09, 2006 @01:58PM (#15095355) Journal
      I was JUST thinking that. This seems like the beginning of a whole slew of semi-ridiculous ideas that get funded because their proponents seem 'ahead of their time'. Did someone at a funding company not think of the following two points:

      1) The web is growing at a phenomenal rate. In a few years, the only thing that you'll be able to fit on even high-density media is very narrow, specific content. Is there really such a huge market for that?

      2) wifi is nearly ubiquitous. Why pay for a static snapshot of the web that will be obsolete in a few days when you can walk into a Starbucks with your laptop and get the fresh stuff almost for free?

      I'm sure the guys who want to put the web on a disk have thought these points through, but me... I just really want to sigh, and buy some short-term stocks.

      • "in a few years, the only thing that you'll be able to fit on even high-density media is very narrow, specific content."

        Welcome to 10 years ago.
      • Re:Dotcom v3.0 (Score:2, Insightful)

        by tepples (727027)

        in a few years, the only thing that you'll be able to fit on even high-density media is very narrow, specific content.

        The thing is that Wikipedia, with all its imperfections and gaps, is still a surprisingly good start.

      • Re:Dotcom v3.0 (Score:5, Insightful)

        by Bogtha (906264) on Sunday April 09, 2006 @02:57PM (#15095593)

        wifi is nearly ubiquitous.

        I think you're way off on this one. On the other hand, I have a suitable substitute:

        2. What the hell are they going to do about the copyright issues?

        • Re:Dotcom v3.0 (Score:5, Insightful)

          by Xeriar (456730) on Sunday April 09, 2006 @03:28PM (#15095692) Homepage
          What the hell are they going to do about the copyright issues?

          Quoted for truth. I know I'm not the only one who thought "Hey, this would be cool... but the target websites are going to be pissed about losing their ad revenue."

          For sites like Wikipedia and others whose goal is the distribution of their content, this isn't as big a deal (unless, in the case of Wikipedia, they snapshot a vandalized page...), but a lot of content providers won't be happy about having their ad revenue stolen.
      • Re:Dotcom v3.0 (Score:3, Insightful)

        Not to mention the copyright issues. I don't think many companies/individuals would want their websites being packaged and sold without their consent.
      • by flyingsquid (813711) on Sunday April 09, 2006 @05:02PM (#15096054)
        I was JUST thinking that. This seems like the beginning of a whole slew of semi-ridiculous ideas that get funded because their proponents seem 'ahead of their time'

        In related news, News Corp bought MySpace for $580 million.

      • Doesn't the wifi at Starbucks run somewhere in the range of $6 an hour? That seems hardly free to me.
    • Wait. There have already been 2 dotcom booms? I know there was one in the mid to late 90s. When was the other one?

      Plus, I hope you're right. I'm starting my graduate IT job in July. I'm gonna start earning loads of money! :)
    • Even if this is doable and legal, it runs entirely counter to the spirit of the Internet. The Internet on a hard disk is no longer a network, it becomes a passive entity with no possibility of interaction.

      At the moment, we are seeing a return to the interactive origins of the Internet, prime examples being blogging, Wikipedia, and even Slashdot! If this project takes off, it will be harmful to interaction and will turn the Net into a glorified television.

      However, I find it unlikely that Webaroo will g

    • Re:Dotcom v3.0 (Score:4, Interesting)

      by Philocke Fox (762396) on Sunday April 09, 2006 @03:26PM (#15095686)
      Robert X. Cringely had an article about this last year. http://www.pbs.org/cringely/pulpit/pulpit20050210.html [pbs.org]

      Basically what he said was that venture capitalists raised a whole bunch of money that they didn't spend during the last boom. This money is raised from investors and is given to the VCs for a limited time. The VCs make money from the management fees they collect for dealing with this money, usually 1 or 2% of the total amount. But, if they don't invest, then the money AND the fees get sent back to the original investors.

      The time limit on investment is usually about 10 years. So if we say that the boom started around '96, then some of these limits have already expired, and the rest of them will expire within the next 4 years.

      Use it or lose it. And the VCs will definitely use it.

      There has been a large rise in startups with hyped ideas (well, at least if /. is your regular newsfeed) that is starting to look a bit like another dot.bomb.

      Dot bombs are not about technically feasible ideas. They are not even about technology. They are all about putting together something that will appeal to venture capitalists. What really drove dot.bomb was that the VCs got into a feeding frenzy and all rational business plan/idea vetting went out of the window. For that to happen again means that a whole lo

  • by liliafan (454080) * on Sunday April 09, 2006 @01:50PM (#15095313) Homepage
    After reading the article, it sounds like they are just selling their web cache. Nice idea, but unless they are selling it really cheap I just can't see it picking up, especially considering the difficulty of getting the data onto your drive. I mean, an 80 GB download!

    Additionally, what if I decide to follow site links that leave the cache?

    Yeah, I can't really see this picking up.
    • Could just work... (Score:3, Interesting)

      by ELProphet (909179)
      But not in the way they think. TFA mentions two points, but doesn't explore them in depth. The first is the algorithms they use; let's face it, Google is starting to fall to the SEOs. If they have a new algorithm that is able to actually follow your web browsing all the way, they'd be able to provide much better results. Google claims to do this, but they can't follow you beyond your first link. Second, they seem to have picked up that most people find all their information on the second or third link the
    • Well, for an additional fee you'll be able to get a "Webaroo Subscription", which will allow you to connect to the internet and download additional content. I'm sure that this, combined with an optional subscription for real-time content-updates will make this product a smashing success.
    • by twitter (104583) on Sunday April 09, 2006 @02:37PM (#15095542) Homepage Journal
      The Wayback Machine's [archive.org] terabytes [archive.org] of data are what this really takes. Keeping it up to date is another story.

      Archives are good and this can be a useful service. Providing 80 select gigs on a hard drive to libraries and schools is useful until US networks get where they should be. Their software can keep those 80 GB up to snuff at night. When you leave the cache, you ... gasp ... get the new content. In the meantime, things are much faster when it matters. Mirrored content will always be a good idea. Look at the Debian distribution system, for example.

      Good luck to the people at Webaroo. So long as they don't apply for stupid patents that give them an exclusive franchise to distribution systems, they are A-OK.

      The road warrior thing will flop, though. People are going to stay where there's a network or pay the $10. It's the one piece of live information that requires the hookup. The speed of the rest is gravy for those people.

    • From TFA...

      The company and service officially emerge from behind their stealth shield tomorrow armed with a flashy bundling agreement from laptop maker Acer.

      Most likely, the reason behind this awesomely silly "feature" is getting people to pay more for laptops with larger hard drives, with marketing promising "search the web without an internet connection!"

      And, of course, selling a subscription service that lets you download updates of your favorite internet content to your laptop... a technology form

  • by minus_273 (174041) <aaaaa@@@SPAM...yahoo...com> on Sunday April 09, 2006 @01:50PM (#15095315) Journal
    when someone asked if the internet will fit on a floppy?
  • How soon till the first lawsuit is filed.
    • by tepples (727027) <tepples@nOSpAM.gmail.com> on Sunday April 09, 2006 @02:26PM (#15095485) Homepage Journal

      How soon till the first lawsuit is filed.

      US copyright law, 17 USC 512 [cornell.edu], excuses operators of automated caches that conform to established cache control protocols (meta elements, /robots.txt, etc.) from copyright infringement liability.

      • US copyright law, 17 USC 512, excuses operators of automated caches that conform to established cache control protocols (meta elements, /robots.txt, etc.) from copyright infringement liability.

        Webaroo has gone far beyond being a cache; they are aggregating others' content into a downloadable product they sell for money. This is no different from Napster 1.0 or a $10-per-download warez site, the key difference being that this is web content.
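A rough sketch of the kind of cache-control check tepples mentions: before storing a page, an automated crawler can consult the site's /robots.txt with Python's standard urllib.robotparser. The robots.txt rules, user-agent names and URLs below are made up for illustration. Honoring robots.txt is only one of several conditions in 17 USC 512, so this says nothing about whether Webaroo's product would actually qualify.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real crawler would fetch http://<host>/robots.txt
# before requesting any other page from that host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleCacheBot
Disallow: /
"""

def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the robots.txt rules let this agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]

urls = [
    "http://example.com/news/today.html",
    "http://example.com/private/notes.html",
]
# GenericBot falls under the "*" rules, so only /private/ is off limits.
generic = allowed_urls(ROBOTS_TXT, "GenericBot", urls)
# ExampleCacheBot has its own entry that disallows the whole site.
cache_bot = allowed_urls(ROBOTS_TXT, "ExampleCacheBot", urls)
print(generic)    # ['http://example.com/news/today.html']
print(cache_bot)  # []
```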
  • Is this really the right time to try this, when Wi-Fi connections are popping up all over the place and the Internet's bigger than it has ever been?
    • It would always be the right time, if only for the possibility that the bomb drops and we have to live a Mad Max-style existence, scavenging and fighting over laptop batteries and petrol in old stores throughout the land. If that happens, I wanna be able to read Uncyclopedia [uncyclopedia.org] at the end of the day.
      Without it, I would be like the guy in that old story who loses his glasses and can't read even though he has eternity: "but there was time now..." or whatever.
    • Not just access (Score:3, Interesting)

      by David Hume (200499)
      FTFA:

      Which isn't to say that ever more ubiquitous 'Net connections won't pose a challenge to the Webaroo business model.

      "Long-term their opportunity may have more to do with [search] performance" than the offline capability itself, Enderle says.

      Husick tells me that performance benefit was reinforced for the company by a rousing reception their service received from Japanese mobile operators who he says were salivating over Webaroo as a means to siphon search traffic away from their increasingly crowded wire

      • Like most lawyers I know, you seem to miss the possibility for more than one option to exist in the world. (That's why you guys make such great politicians.) But the world of engineering is about increasing the number of possibilities, quite unlike the zero sum game from which most lawyers skim off the top. It's quite reasonable to suppose that there are times when having a laptop that has both wireless connectivity AND a static snapshot of the more useful parts of the net would be fantastic. For example, m
  • ownership (Score:3, Interesting)

    by xzvf (924443) on Sunday April 09, 2006 @01:52PM (#15095321)
    Wouldn't there be an issue here of selling another person's content? While everyone can view the content at will, copying that information to media and then reselling it, or even distributing it for free, would be an issue.
    • Well, that gets back to the whole issue of who, exactly, has jurisdiction of what parts of the Internet, and where, and when, and under what circumstances. And of course, where you choose to sell your "product". Definitely a can of worms. I hope they have a good legal department, because I think they're probably going to need it.
    • How is it any different from my ISP charging me to look at other people's content? The only difference is that one is online and the other is offline. Should that really make a difference? I'm sure the lawyers will argue that it does, but in reality there is little difference.
    • When you do it, that's piracy. When a company does it, that's just business.
    • Why not have a web spider working in the background, copying files from the browser's web cache, following links on these documents, etc? This way one is likely to have a great deal of information available for searches, and it would be an automatic cache built by the user, not distributed from a vendor.
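A minimal sketch of the background spider suggested above, assuming a hypothetical fetch(url) function that returns a page's HTML from the browser cache (or None if the page isn't there). It does a breadth-first walk, resolving relative links with urllib.parse.urljoin and dropping #fragments. The example pages are invented.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl from seed_urls; returns {url: html} for pages fetched.

    fetch(url) is a hypothetical stand-in for the browser cache or a network
    request: it returns HTML text, or None when the page is unavailable.
    """
    store = {}
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # not cached; skip instead of going online
        store[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links against the current page, drop #fragments.
            target, _ = urldefrag(urljoin(url, href))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return store

# Invented "browser cache": two local pages link to each other, one links off-site.
CACHE = {
    "http://example.com/": '<a href="a.html">A</a> <a href="b.html#top">B</a>',
    "http://example.com/a.html": '<a href="/">home</a>',
    "http://example.com/b.html": '<a href="http://other.example/">elsewhere</a>',
}
pages = crawl(["http://example.com/"], CACHE.get)
print(sorted(pages))
```

The off-site link is discovered but skipped because it isn't in the cache, which is the "automatic cache built by the user" behavior the comment describes.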
  • Copyright? (Score:2, Insightful)

    by MustardMan (52102)
    Considering that companies are suing Google for putting the first paragraph of their news tidbits on Google News, how long will it be before someone sues Webaroo for copyright infringement? Whether the claim is valid or reasonable is beside the point: someone is gonna see this as infringement and call out their pack of rabid lawyers.
    • Well, since this is a start up they're not going to have very deep pockets, so unless someone is truly disturbed about copyright infringement I doubt you'll see too much legal action right away. No money in it. And I would expect that if anyone did complain Webaroo would immediately remove the offending content from future versions: they'd be fools to do otherwise. However, if (by some amazing happenstance) this becomes popular and profitable, expect multiple packs of hungry, rabid lawyers to move in for th
      • Doesn't matter if they remove it. IANAL, but if they put a large number of people's content (as opposed to small snippets, which can be defensible) on a CD and distribute it without either verifying that the copyright holder has granted a license for that use or contacting the copyright holder to get a license, it is a clear case of copyright infringement, and there's no way they'd be able to get a judge to believe it wasn't willful.

        The combination of willful copyright infringement and a profit

  • look at news without a net connection? Either this is going to be just the same as viewing pages offline after you've been on them (perhaps an automated web crawler that grabs pages while you have some uptime) or you will be viewing very old news... It seems to be the former, though, in which case you're not really doing it "without a connection"... so why bother? This seems like a waste of space and time (and bandwidth); just look at what you want when you're plugged in rather than constantly getting infor
    • I've heard that a technology for seeing news offline already exists. It's very cheap, disposable (so don't worry if you leave it on the train) and can even keep you dry for a short period of time if it starts to rain. What's more - it's made from trees! How clever is that!!!
  • by KenDodd (961972) on Sunday April 09, 2006 @01:55PM (#15095341) Homepage
    For example, where do we get the porn diffs?
  • by RobotRunAmok (595286) on Sunday April 09, 2006 @01:56PM (#15095346)
    Been around since the early 90's. Back then it was called "fan fiction."
  • by Idimmu Xul (204345) on Sunday April 09, 2006 @01:57PM (#15095351) Homepage Journal
    e.g. searching? Having Wikipedia on your HDD is all well and good, but if you can't easily search it, what's the point?
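One common way to make an offline copy searchable is an inverted index that maps each word to the set of articles containing it. The sketch below is a toy (naive lowercase tokenizer, boolean AND queries, no ranking) and the article texts are invented; it is not a description of how Webaroo or any particular Wikipedia reader works.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented article snippets standing in for an offline Wikipedia dump.
docs = {
    "Hard_disk_drive": "A hard disk drive stores data on rotating platters.",
    "Wi-Fi": "Wi-Fi is a wireless networking technology.",
    "Floppy_disk": "A floppy disk is a removable magnetic storage medium.",
}
index = build_index(docs)
print(sorted(search(index, "disk")))          # ['Floppy_disk', 'Hard_disk_drive']
print(sorted(search(index, "disk storage")))  # ['Floppy_disk']
```

Lookups touch only the posting sets for the query words, so search time does not grow with a full scan of every article.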
  • by mtrisk (770081) on Sunday April 09, 2006 @01:59PM (#15095358) Journal
    They should be selling their compression technology!
  • 80 gig web? (Score:3, Insightful)

    by hlh_nospam (178327) <concealedhandgun@gmai l . c om> on Sunday April 09, 2006 @02:04PM (#15095380) Homepage Journal
    That would cover about 0.0000000001% of the web, give or take a few dozen orders of magnitude.
    • So that would be between 10^-34 and 10^14 percent?
      • Actually between 10^-10 and zero percent. Probably much closer to zero. But nobody really knows just how big the WWW is, or even how fast it's growing. Even the mighty Google doesn't index more than a small percentage.
    • It's more like 0.15%, if they use the same compression and content selection criteria as the Internet Archive. If they dispensed with all non-HTML content (graphics files, PDFs, etc.) that would go up quite a bit. If they used better compression (the Archive uses gzip) it would go up some more.

      An average crawl of the public web, minus files which are "too large" (not sure what the threshold is for that), makes about 55TB of gzipped archive. 80GB / 55000GB = 0.0014545, or about 0.15%.

      -- TTK

  • by omeg (907329) on Sunday April 09, 2006 @02:11PM (#15095414)
    "The Internet Archive Wayback Machine contains approximately 1 petabyte of data and is currently growing at a rate of 20 terabytes per month. This eclipses the amount of text contained in the world's largest libraries, including the Library of Congress. If you tried to place the entire contents of the archive onto floppy disks (we don't recommend this!) and laid them end to end, it would stretch from New York, past Los Angeles, and halfway to Hawaii."

    Internet Archive Frequently Asked Questions [archive.org]
  • How big is Google's index of the Web, complete with URLs of results? I could search that, only a day out of date, without a Net connection, if it fit on a HD. Maybe using Usenet to distribute it...
    • Google's indexes probably run to many terabytes. Google indexes roughly a billion pages. If each page has a thousand words, and each word can be reduced to a single 32-bit number in the index, that comes to 4 terabytes. And it's probably much, much higher than that; this is a back-of-the-envelope calculation.

      Especially since there's considerable redundancy; they can't search all that data that quickly without throwing multiple computers at it. Even if you could have a local Google copy, it would run very sl
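The back-of-the-envelope figure above can be written out, along with what "reducing each word to a single 32-bit number" means in practice: give each distinct word an integer ID and store 4 bytes per occurrence. This is a sketch under the comment's own assumptions (a billion pages, a thousand words each); real search indexes compress their postings, so actual sizes differ.

```python
import struct

def encode_words(words, vocab):
    """Map each word to a 32-bit integer ID and pack the IDs as little-endian uint32.

    vocab is a dict shared across pages so the same word always gets the same ID.
    """
    ids = []
    for w in words:
        if w not in vocab:
            vocab[w] = len(vocab)  # next unused ID; must stay below 2**32
        ids.append(vocab[w])
    return struct.pack(f"<{len(ids)}I", *ids)

def estimate_index_bytes(pages, words_per_page, bytes_per_word=4):
    """Uncompressed size if every word occurrence costs bytes_per_word bytes."""
    return pages * words_per_page * bytes_per_word

vocab = {}
blob = encode_words("the web on a hard drive the web".split(), vocab)
print(len(blob), len(vocab))   # 32 bytes for 8 words, 6 distinct IDs
total = estimate_index_bytes(10**9, 1000)
print(total / 10**12, "TB")    # the comment's 4 TB estimate
```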
  • Pr0n? (Score:4, Funny)

    by Dante Shamest (813622) on Sunday April 09, 2006 @02:14PM (#15095432)
    Would the downloadable content include porn?

    Er, I'm asking this in order to, er, protect my girlfriend's sensibilities. Can't have her unwittingly downloading such naughty stuff you know. =)
    • Re:Pr0n? (Score:2, Insightful)

      by RoloDMonkey (605266)
      if(posts_to_slashdot && has_girlfriend)
        if(girlfriend.has_sensibilities)
          chance_of_lying = VERY_HIGH;
        else
          chance_of_lying = HIGH;
  • I see issues of copyright coming up. Just linking to sites these days can get people into trouble; what will be the repercussions of essentially taking all this data and stuffing it on someone's hard drive?
  • I missed that eBay auction deadline again! I'd better start using FedEx for the new versions.
  • by tyroneking (258793) on Sunday April 09, 2006 @02:22PM (#15095466)
    From the website "Webaroo is a stealth-mode technology startup" which obviously means something very clever ... personally I use WinHTTrack on a small number of sites, now if someone offered pre-downloaded WinHTTrack sites ...maybe to order ...
    Anyway, more importantly: Dr Who is due back on UK TV soon, I think (slightly disappointing end to the last series; shame to see Chris E leave), so here's a joke that Webaroo might like to 'cache'... "What do Daleks have for a snack? ...
    Dalek bread..." geddit? (thanks to a kids radio show for that one).
  • could give me Duke Nukem Forever or the next Amiga OS release.
  • by Glowing Fish (155236) on Sunday April 09, 2006 @02:30PM (#15095499) Homepage
    This actually isn't by any means a new idea.

    If you've ever written or read HTML, you know that HTML doesn't care if links start file:// or if they start http://. HTML has always been quite neutral on whether it was linking to a local file system or getting something over the Internet. Of course, most people don't use HTML extensively for local content. So in theory, this isn't a new idea at all.

    In practice, I don't see much point to it. I can imagine that some people might want a map of a new city, with clickable pictures and information about various services there. Most features of a city map are going to stay the same for at least six months, so this is the type of thing that could be done statically. But even with this, Internet access is so widespread that it seems like a solution to a minor problem. Also, if you want a handy city guide, it would make more sense to me to write it from scratch rather than use a kludge of cached web pages.
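The neutrality described above comes from how relative links are resolved: against the base URL the page was loaded from, whatever its scheme. Python's urllib.parse.urljoin implements essentially the same relative-URL resolution, so it can illustrate the point; the paths and host below are hypothetical.

```python
from urllib.parse import urljoin

# The same relative href, as it might appear in a hypothetical city-guide page.
HREF = "maps/downtown.html"

local_page = "file:///home/user/cityguide/index.html"
web_page = "http://example.com/cityguide/index.html"

from_disk = urljoin(local_page, HREF)
from_web = urljoin(web_page, HREF)
parent = urljoin(local_page, "../about.html")

print(from_disk)  # file:///home/user/cityguide/maps/downtown.html
print(from_web)   # http://example.com/cityguide/maps/downtown.html
print(parent)     # file:///home/user/about.html
```

As long as a cached site uses relative links, copying its directory tree to disk keeps the links working unchanged.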
  • It's an offline, indexed database; interesting but hardly newsworthy. So unless they've broken the Shannon limit there's nothing more here than IPO fodder.
  • by sunwolf (853208) on Sunday April 09, 2006 @02:35PM (#15095521)
    How are they to justify selling other peoples' websites? What about the sites' lost ad revenues?
  • Now I can say that I've finished downloading all the intrawebs!
  • by Finni (23475)
    I'm surprised no one has mentioned the word 'aleph' yet.
  • by ecloud (3022)
    Lemme guess, they're going to do that with SQL on Rails [slashdot.org]. (If you didn't see the screencast, that's part of their April 1 demo - they did a SQL query on "the internet", and claimed to have downloaded the whole internet into tables beforehand.)
  • A boss I once had, when I was working on an NSF grant-funded project a handful of years ago, held a meeting his first week on the job. This is his actual quote: "I'm not very good at searching the internet, can one of you put it on a CD for me?" We all promptly walked out of the room shaking our heads.

    This project was a high school biology series of CD-ROMs, which used HTML/JavaScript on a CD (worked in all browsers, all platforms). It was a great project, except that moron gave

  • feh (Score:4, Interesting)

    by andreyw (798182) on Sunday April 09, 2006 @03:27PM (#15095687) Homepage
    Frankly, I could see a market for this *maybe* 10-12 years ago. It just doesn't make any sense now. The internet is not solely about static content. Also, the thimble of data provided in each pack will be underwhelming and perpetually out of date.

    I mean, if I know I won't be online for a week, what stops me from just CURLing or WGETing whatever I plan on reading for the next couple of weeks? And that goes only for static content like books and articles. Everything else cannot simply be cached.
  • Look at it from an alternate perspective...
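A rough Python equivalent of wget-ing a reading list before going offline, using only urllib.request. The mirror function name and its flat file-naming scheme are assumptions for illustration; wget's --mirror and --convert-links options also follow links and rewrite them, which this sketch does not. The usage example reads from a file:// URL so it runs without a network connection.

```python
import os
import pathlib
import tempfile
import urllib.request
from urllib.parse import urlparse

def mirror(urls, dest_dir):
    """Download each URL into dest_dir; returns the list of saved file paths.

    Files are named after the last path component of each URL (a simplifying
    assumption: two pages with the same name would overwrite each other).
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        name = os.path.basename(urlparse(url).path) or "index.html"
        path = os.path.join(dest_dir, name)
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())
        saved.append(path)
    return saved

# Usage: "download" a local file through a file:// URL, so no network is needed.
with tempfile.TemporaryDirectory() as tmp:
    article = pathlib.Path(tmp) / "article.html"
    article.write_text("<p>Read me offline.</p>")
    saved = mirror([article.as_uri()], os.path.join(tmp, "offline"))
    with open(saved[0]) as f:
        content = f.read()
print(content)
```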

    For most of North America, where high speed is fairly common and unmetered, this is not a good idea.

    For some other parts of the world, the Internet is only available via dial-up, and is metered. Spending hours surfing can be very cost-prohibitive.

    So, if large parts of the net are available offline, I can see a market in those regions, provided the cost is not prohibitive...

  • Well, Wikipedia is licensed under the GFDL so there has never been any problem downloading [wikimedia.org] the database for it. There are even many different versions for mobile platforms and XP [infodisiac.com] (including search functionality). And the ipod [sourceforge.net] of course.
  • 80GB, huh. What's that? Two dual-layer Blu-ray discs. Might make a great case for the next DVD technology.
  • Last time I passed through customs in London, they asked about the laptop and "do I have the Internet on there". I told him "no", but now, thanks to these dweebs, I'll have to say "Yes, I have the Internet on my laptop."

    Bastards.

      -Charles
