The Internet

Shirky On Umbrellas, Taxis And Distributed Systems

Posted by Hemos
from the interesting-approach dept.
There's a good article from Clay Shirky talking about the similarities between umbrellas, taxis and distributed computing. And if you really want more P2P than you can shake a fork at, the folks at ORA have also released an excerpt from the upcoming Dornfest and Brickley book.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    Modern world. [24.12.110.66]
  • by Anonymous Coward
    Distributed computing allows finer weather predictions, which in turn tells you when you won't need to bring an umbrella, hence saving the energy it takes to carry an umbrella. Unfortunately, all these distributed computations suck enormous power from the power grid, hence the need for taxis when the subway is out of power.
  • Well, you have a point, although I don't think it would be possible to keep the aggregated information a secret if you really wanted to.

    --

  • by Sanity (1431) on Sunday January 21, 2001 @11:59AM (#492221) Homepage Journal
    My question with the whole distributed computation branch of the P2P bandwagon has always been one of "where are the applications?". The criteria for which applications would be appropriate for this seem to be rather limiting - these criteria are as follows:

    Firstly, the algorithm must be parallelizable. This means that it should be possible to split an algorithm which normally takes N time, across a number of, say P processors, and have it take less than N time, and ideally N/P time.

    Secondly, the algorithm must have minimal communication requirements. Rendering, for example, is parallelizable, however in most modern rendering applications each computer would need an entire description of the scene being rendered. This could be a huge amount of information, running into gigabytes, yet it would need to be distributed to every participant in the rendering process. Recall that in most distributed computation applications connectivity will be limited to a 56k modem which is only connected to the Internet intermittently. Even if you limit users to broadband, communication bandwidth is still a problem.
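
    A rough sketch of why the communication cost dominates here. The 1 GB scene size and the assumption that every worker downloads the full input before computing are illustrative choices, not figures from the post; latency and protocol overhead are ignored.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Time to move size_bytes over a link, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


def distributed_seconds(compute_seconds: float, workers: int,
                        size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal N/P compute time plus the time each worker spends fetching its input.

    Assumes all workers download the full input in parallel before computing.
    """
    return compute_seconds / workers + transfer_seconds(size_bytes, link_bits_per_second)


if __name__ == "__main__":
    scene = 1e9      # hypothetical 1 GB scene description, in bytes
    modem = 56_000   # 56k modem, in bits per second
    print(f"download alone: {transfer_seconds(scene, modem) / 3600:.1f} hours")
    # A 100-hour job split across 1000 machines: the compute share shrinks to
    # minutes, but the download time does not shrink at all.
    total = distributed_seconds(100 * 3600, 1000, scene, modem)
    print(f"total: {total / 3600:.1f} hours")
```

    Under these assumptions the download alone takes roughly 40 hours per participant, which is why adding more processors does not help once the input has to reach every one of them.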

    Thirdly, the algorithm must be robust: if someone decides to screw things up and hacks their client to send back malicious data (as happened with Seti@home), they must not be able to invalidate the work that everyone else has done. Ideally there would be an easy way to validate the work done by each client in the system.
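
    One common way to validate client work is redundancy: hand the same work unit to several clients and accept a result only when enough of them agree. The sketch below is an assumption for illustration, not a description of how any of the projects mentioned here actually check results; `accept_result` and `quorum` are hypothetical names.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def accept_result(results: Sequence[Hashable], quorum: int = 2) -> Optional[Hashable]:
    """Return the agreed result for one work unit, or None if there is no agreement.

    A result is accepted only if at least `quorum` clients returned it and it
    is a strict majority of all results received for the unit.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    if count >= quorum and count > len(results) / 2:
        return value
    return None


if __name__ == "__main__":
    # Two honest clients outvote one hacked client.
    print(accept_result(["0x3fa2", "0x3fa2", "bogus"]))
    # No agreement: the unit would have to be reissued.
    print(accept_result(["0x3fa2", "bogus"]))
```

    The price of this scheme is that every unit is computed more than once, so the useful throughput of the network drops by the redundancy factor.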

    Now, I am not saying that there are no applications which conform to these criteria; for example, cracking crypto algorithms and processing information from space telescopes in search of intelligent life clearly work quite well. However, neither of them can really be used to make vast amounts of money. The only other thing I can think of is genetic algorithms, but again, whether there is a revenue stream there is an important question.

    Perhaps some of these distributed computation people have found a killer application for this technology, some of them certainly claim that they have, but I really wonder whether such applications will stand up to scrutiny on the grounds I outline above.

    --

  • I'm presenting a talk at the O'Reilly p2p conference [oreilly.com] entitled "Attack Resistant Sharing of Metadata".

    It's based on an idea of Raph Levien's [advogato.org], somewhat similar to the Advogato trust metric [advogato.org]. Basically, you only trust meta-data from your friends, or from people whose meta-data has been good in the past, and then to a lesser extent you trust their friends, but you dynamically adapt if someone starts distributing bad meta-data. We can't really prove that it will work, but it has some promising characteristics.
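
    To make the "friends, then friends of friends, then adapt" idea concrete, here is a deliberately simple stand-in. It is not the Advogato trust metric and not the scheme from the talk; the decay factor, hop limit, and penalty factor are all made-up parameters.

```python
from collections import deque
from typing import Dict, List


def trust_scores(friends: Dict[str, List[str]], me: str,
                 decay: float = 0.5, max_hops: int = 3) -> Dict[str, float]:
    """Breadth-first walk from `me`; trust is multiplied by `decay` at each hop."""
    scores = {me: 1.0}
    queue = deque([(me, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not extend trust beyond max_hops
        for friend in friends.get(person, []):
            if friend not in scores:  # the first (shortest) path found wins
                scores[friend] = scores[person] * decay
                queue.append((friend, hops + 1))
    return scores


def penalize(scores: Dict[str, float], source: str, factor: float = 0.1) -> None:
    """Adapt after bad metadata: cut the trust placed in the offending source."""
    if source in scores:
        scores[source] *= factor


if __name__ == "__main__":
    graph = {"me": ["alice"], "alice": ["bob"], "bob": ["carol"]}
    scores = trust_scores(graph, "me")
    penalize(scores, "bob")  # bob distributed bad metadata
    print(scores)
```

    A real attack-resistant metric has to limit how much trust a single compromised friend can pass on to others, which this sketch does not attempt.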

    We are going to implement it on top of Mojo Nation [mojonation.net].

    Regards,

    Zooko

  • Fair cop, and I actually had a couple of paras in there to go a little further down that road, but I stopped because I was really doing 'order of magnitude' calculations.

    For starters, the numbers were chosen to make the math come out nicely -- the average box less monitor is actually less than a grand, and the average machine is in service longer than 20,000 hours, which makes the nickel figure high.

    Furthermore, if PopPower et al wanted to build a cycle farm, they'd use multi-CPU boxen, so the calculations get even more complicated. Finally, as we've seen in Cali, power requirements differ between consumer and business regimes.

    So there are a lot of variables pulling the number this way and that, with an increasing degree of speculation, but since the real point I was trying to make -- your use of your PC has variable value to you between the hours you are and aren't using it, and that if you're using it, no pro-rata fee will induce you to stop -- would have been accurate even if the nickel number is low by a factor of 5, I left the back of the envelope calculation and went on.

    -clay
  • True, but they can also save on multi-cpu boxes, paying less than $1000 for a chassis, and running a box for more than 20K hours, so the nickel an hour is more a back of the envelope calculation designed to illustrate the gap between what they'd pay and what you'd take.

    I made a similar point in another thread [slashdot.org] on this topic.

    -clay
  • by Duncan3 (10537) on Sunday January 21, 2001 @12:11PM (#492225) Homepage

    You forgot a fourth and more critical criterion, about which all the P2P companies keep saying "pay no attention to the man behind the curtain".

    Fourth, the company must not care about the data, algorithms, and results becoming public immediately. Available to any competitor or evil cracker who wants to mess with you.

    Forget the other 3, you will have a nearly impossible time finding anyone willing (stupid enough) to give you money and live with #4.

    Of course, some of us have known this for a very long time, commercial distributed computing was put to sleep in the 70's. But then, in the 70's VCs were smarter.

  • by Kyobu (12511)
    How does wanting a song create supply of it on Napster? Once you get the song, I get it, because in the future you will supply the song. But if there are k songs available on Napster and you add your m songs, there are still k songs available to you, not k + m, because you already have the songs on your computer.
  • I think that's precisely the problem. The things that we've found lend themselves well to distributed computing (SETI, cracking encryption) don't lend themselves as well to making money. What company wants to pay for either of the above two, let alone a lot of money?

    That's not to say that P2P is already doomed though. I don't think that it's a technical problem at this point, I think it's a business problem. Someone has to figure out a problem that has two attributes: It must lend itself to being more quickly solved via distributed computing, and it must be something with such a high demand that someone is willing to pay big money.

    It's very possible that P2P could take off...but I'm not holding my breath. Even if they solve the issue of "what problem is worth the money", there's still the problem of "who will let us use the cycles" and "how do we keep from getting cheated".


    -Jer
  • by pheonix (14223) <(gro.etaivolbi) (ta) (todhsals)> on Sunday January 21, 2001 @11:53AM (#492228) Homepage

    I don't understand how the author came to the nickel per hour number.

    Sure, the cost of the machine boils down to (by his math) a nickel an hour, but that's not the same cost as the company would have to take on.

    A company would have to buy the systems, hire the IT personnel, cover their benefits, house the machines, pay for the electricity, pay for the heating/cooling, pay for maintenance, parts if they break, warranties, etc. These (and more) are little things that a home user might not even consider when determining if it's "worth it", and they make the "break even" point much higher than a nickel per hour.

    I'd like to see the same breakdown done with some more accurate math.


    -Jer
  • Firstly, the algorithm must be parallelizable.

    Not necessarily. Depending on the cost of cycles, it may be sufficient to use a less efficient approach that is not completely scalable.

    Secondly, the algorithm must have minimal communication requirements. Rendering, for example, is parallelizable, however in most modern rendering applications each computer would need an entire description of the scene being rendered. This could be a huge amount of information, running into gigabytes, yet it would need to be distributed to every participant in the rendering process.

    I do actuarial projections for a life insurance company. I have a set of assets (investments with future cash flows to the company) and liabilities (insurance policies with future cash flows). The liability cash flows influence what funds are available for investing (or dis-investing). Industry regulations require that I investigate the adequacy of the type and amount of the company's assets under different interest rate environments. The regulators want to make sure that even if interest rates and/or equity values spike up or drop down dramatically, the company will not become insolvent. The tricky part is that the liability cash flows are often dependent on the interest income that the assets can generate, and the interest income that assets can generate is dependent on the interest rate environment when each of the cash flows occurs.

    Because of the interrelatedness of the two portfolios, there are two ways I can go about dividing up this project. I can slice by time, calculating all of the cash flows that I need at a given time to determine whether there is cash to invest or assets to sell. This is the most efficient method, but it has high communications requirements.

    Or, I can project all of the liabilities over future times and get a series of liability cash flows which then imply a series of asset portfolios and interest rates and then iterate back and forth between liabilities and assets until the answers converge. (Typically on the order of 10 or so iterations and not hundreds or thousands). This is less efficient, but has lower communications requirements. If cycles are sufficiently cheap, it may pay to use a less efficient algorithm.
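
    The second approach is a fixed-point iteration: project one side, feed the result to the other, and repeat until the numbers stop moving. A toy sketch follows, with two made-up one-number models standing in for the real liability and asset projections (the coefficients are purely illustrative).

```python
from typing import Callable, Tuple


def iterate_to_convergence(
    liabilities_given_income: Callable[[float], float],
    income_given_liabilities: Callable[[float], float],
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Tuple[float, float, int]:
    """Alternate between the two models until asset income stops changing.

    Returns (liabilities, income, iterations used).
    """
    income = 0.0
    for step in range(1, max_iter + 1):
        liabilities = liabilities_given_income(income)
        new_income = income_given_liabilities(liabilities)
        if abs(new_income - income) < tol:
            return liabilities, new_income, step
        income = new_income
    raise RuntimeError("did not converge within max_iter iterations")


def toy_liabilities(income: float) -> float:
    # Hypothetical: payouts rise slightly with the interest credited.
    return 100.0 + 0.2 * income


def toy_income(liabilities: float) -> float:
    # Hypothetical: 5% earned on whatever is left after paying liabilities.
    return 0.05 * (1000.0 - liabilities)


if __name__ == "__main__":
    liab, inc, steps = iterate_to_convergence(toy_liabilities, toy_income)
    print(f"liabilities={liab:.4f} income={inc:.4f} after {steps} iterations")
```

    Each pass needs only the previous pass's cash flow series as input, so the liability and asset projections can run on separate machines and exchange a small amount of data per iteration, at the cost of repeating the whole projection several times.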

    Thirdly, the algorithm must be robust, if someone decides to screw things up, and hack their client to send back malicious data (as happened with Seti@home) they must not be able to invalidate the work that everyone else has done.

    That depends entirely on the incentives. SETI was vulnerable because there was a competition to rack up completed cells. If the incentives to participate are designed properly, there may be no incentive to hack the client.
  • I don't know about the rationale of umbrellas being small...I've walked through the financial district and seen a few people carrying beach umbrellas, I swear. They could cover a taxicab with these things, honestly. :)
  • I agree totally. The author gave the example of needing to complete a quarterly report in a timely manner. Large companies buy mainframes to do quarterly reports, not because of processing power, but because of their throughput: huge data pipelines that make RAID-5 SCSI look like a straw. The actual processing is usually very small (add up the numbers we give you).

    The cost to buy and maintain the bandwidth needed to push the data out to distributed resources would be more than the cost of a mainframe.

  • The ocean is full of gold, but no one has made a fortune off of it because the cost of collection exceeds the worth. The gold is dissolved in the water, and to get it you have to find a way to precipitate it out without getting all the other salts.

    There are a lot of unused cycles out there, but they are cheap and so finely dissolved that the extraction process isn't viable.

  • what does the Robotech Defense Force have to do w/ music? hmmmm
    O you mean Resource Description Framework....

    i always mix the two up..


    nmarshall

    The law is that which is boldly asserted and plausibly maintained..
  • Dornfest and Brickley are the authors of *that chapter*, see http://www.oreilly.com/catalog/peertopeer/author.html [oreilly.com] for a full list.
  • I think the article makes lots and lots of interesting points, but I don't really see how a company can expect to make enough money off of these spare cycles to, say, double my DSL capacity and pay my electric bill. If they get paid a penny an hour, and they get 21*24 hours, that's still only around 5 dollars, which is nowhere near enough to pay for what they want to give as an incentive.

  • One of the models of payment described in the article (where you can bid for computer resources) was described in a novel by Greg Egan called "Permutation City". In it there is a central exchange where you can bid for computer time for intensive tasks. At one point a woman who is using a simulation requiring extra computer cycles gets "bumped off", 'cause a consortium has outbid everyone at ludicrously high rates.

    A good book (for other reasons as well). Unfortunately I managed to leave it in a Sydney hotel room.

    Cheers,
    SuperG
  • Although it doesn't rain as much in SF as in NY, I find that when it does, I don't have an umbrella or a cab. Blows...

    -Moondog
  • I agree with your point about the analogy, but for a different reason: Shirky says...
    Why does an increase in demand produce opposite effects on supply -- more available umbrellas and fewer available taxis?
    *bonk* This is extremely sloppy terminology and extremely sloppy thinking. Rain and the corresponding increase in demand does absolutely nothing to the supply of taxis. There are just as many taxis as there were half an hour ago when it wasn't raining. Yes, there are fewer empty taxis - which is probably the point he's trying to make apropos to inflexibility of supply - but it's not a clear way to make the point.
  • it strikes me as paradoxical that the line for a company to pay for time vs. buy themselves a new machine is so low (roughly a nickel/hr). No one would sell time for a nickel/hr, but companies still pursue this as an alternative to buying more boxes. perhaps i'm missing something, but this seems like an economic oddity.
  • MusicBrainz [musicbrainz.org] aims to be an RDF schema for all kinds of music metadata, complete with unique identifiers for tracks, artists, albums, etc. It solves the "various artists" problem that tends to plague other systems. The main thing I don't like about it is the fact that it uses a custom protocol and query language, which shouldn't really be necessary since RDF is RDF is RDF.
  • by Wesley Felter (138342) <wesley@felter.org> on Sunday January 21, 2001 @01:38PM (#492241) Homepage
    I agree that "cycle-borrowing" apps are unlikely to be profitable, but there's a lot more out there. In particular, I think P2P content-distribution networks like Freenet and Mojo Nation have the potential to save lots of money.
  • Excerpt from the excerpt from the upcoming Dornfest and Brickley book
    Metadata applied at a fundamental level, early in the game, will provide rich semantics upon which innovators can build peer-to-peer applications that will amaze us with their flexibility.

    Isn't this what the cue cat people did when they embedded serial numbers into their scanners?
    It allowed them to start creating a metadatabase on you!
  • No search engine can work on a system of shared trust. A trust web can only be applied to smallish (in the hundreds) groups of publishers. Spiders hoping to index the entire web must take each page at face value, and hence are still spoofable.
  • by Ars-Fartsica (166957) on Sunday January 21, 2001 @04:27PM (#492244)
    Computers may be faster than we need them to be, but for the foreseeable future, there isn't enough bandwidth to support the casual sharing of media among home users. Most Americans will be lucky if they can get DSL/cable - some estimates put broadband in the home at 10% penetration at most. Even for the users who can get broadband at home, 1.5 Mbps (the max offered by most vendors) isn't enough to support seamless file sharing without a noticeable drain on bandwidth.

    Added to which, once we actually start paying for music downloads (it's inevitable), there will be demand for reliable downloads. Hell, if I'm paying real money per song, timeouts and crappy connections are unacceptable. Once money enters into the equation, I want the media in a timely and efficient manner.

    None of this matters in a future where everyone has fiber to the home, but we're at least fifteen years away from that being a reality for most citizens.

  • by Ars-Fartsica (166957) on Sunday January 21, 2001 @04:43PM (#492245)
    The only sites out there that make explicit use of the meta tag are, well, explicit! Any metadata in a web page that is authored by a human is going to be subject to rampant spoofing. Presuming search engines actually indexed metadata in a strict way, you could simply continually redefine your keywords and subject matter to reflect whatever you thought was the hot topic of the day. Presuming sites were indexed rapidly, webmasters could simply watch the news and use popular keyphrases ("presidential inauguration") to get their sites indexed as always being relevant.

    This is why search engines that work off of metadata typically give you porn links for almost anything, and why Yahoo can't be spoofed (their surfers actually visit the site to see what it's about).

  • hmmm.. somehow my numbers got off there. 1. should have been that my numbers on RC5 give me bragging rights along the lines of "mine's bigger than yours."
  • by sethgecko (167305) on Sunday January 21, 2001 @11:48AM (#492247) Homepage
    Now imagine you owned such a machine and were using it to play Quake, but Popular Power wanted to use it to design flu vaccines. To compensate you for an hour of your computing time, Popular Power should be willing to offer you a nickel, which is to say 1/20,000th of $1,000, the cost of your device for that hour. Would you be willing to give up an hour of playing Quake (or working on a spreadsheet, or chatting with your friends) for a nickel? No. And yet, once the cost crosses the nickel threshold, Popular Power is spending enough, pro-rata, to buy their own box.

    This hits the nail on the head. I'm willing to install the RC5 client on my machines for several reasons:
    2. It's a project whose goals I more or less believe in. (SETI would be an even better match, but I ended up installing the dnet client first.)
    3. I already installed it. Once it's been configured and set to run on my FreeBSD and linux boxen I can forget about it. It's more trouble to disable it or find a new distributed project, install that, configure it, and get it running on all my computers.

    I think this article gets it right. The return for contributing my spare cycles, as well as the effort to install and set up the clients, is not worth whatever change they are paying. Like the article says, if they pay a nickel per processing hour, it takes roughly 2.28 years to earn a thousand dollars if my system is running the client 100% of the time at full processor speed. (I have no idea how much these systems actually pay, I'm just quoting the article's example.) The actual amount earned would be much less, as I do various things with my system: burn CDs, play quake, write papers, etc. The long term return of pennies, or less than pennies, on the hour makes me say that it's not worth it. And I suspect that without some higher incentive, like the way distributed.net has turned crunching keys into a competition, most people just aren't going to take the trouble to sign up for these paid distributed services. To have enough computers to make some serious money, you had to have enough money in the first place to make whatever they pay you small change.
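
    The arithmetic behind the nickel and the 2.28 years, as a small sketch. The $1,000 price, 20,000 service hours, and nickel rate come from the article as quoted above; the `utilization` parameter is an added assumption to show what happens when the client only gets part of the machine's time.

```python
HOURS_PER_YEAR = 24 * 365


def hourly_cost(machine_price: float, service_hours: float) -> float:
    """Purchase price spread evenly over the machine's service life."""
    return machine_price / service_hours


def years_to_earn(target: float, rate_per_hour: float, utilization: float = 1.0) -> float:
    """Years of wall-clock time needed to earn `target` at `rate_per_hour`.

    `utilization` is the fraction of each hour the client actually runs.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hours = target / (rate_per_hour * utilization)
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    rate = hourly_cost(1000, 20_000)
    print(f"cost per hour: ${rate:.2f}")
    print(f"years to earn $1000 at full utilization: {years_to_earn(1000, rate):.2f}")
    print(f"years to earn $1000 at half utilization: {years_to_earn(1000, rate, 0.5):.2f}")
```

    At full utilization this reproduces the figure in the comment (20,000 hours is about 2.28 years); at half utilization the wait doubles.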

  • Their capital expenditures for the box may be a nickel an hour, but that's not the cost to them to own, store, maintain, upgrade, supply A/C and AC, and do various other things necessary to gain benefit from the processor/storage.

    -Nev
  • by tchuladdiass (174342) on Sunday January 21, 2001 @11:41AM (#492249) Homepage
    The article makes mention that projects like distributed.net and seti@home are successful because participants donate their unused cycles for a project they feel good about. However, it would be rather difficult to get a large enough number of volunteers to sign up for tasks they are indifferent to, even if they get paid a small amount, because we are dealing with insignificant sums (to the participants).

    Therefore, I propose that projects such as Popular Power, etc., abandon the idea of paying individuals a few nickels for some amount of cpu processing, and instead pay the charitable organization of the individual's choice.

    For example, you could sign up your machine on, say, Team FSF, and for every X operations your machine computes for these distributed projects, a dollar would be donated to the FSF.

  • by Alien54 (180860) on Sunday January 21, 2001 @11:57AM (#492250) Journal
    I found these parts fascinating [forgive the mild editing for the sake of clarity in this post]:
    Cycles, disk space, and bandwidth are resources that get provisioned [paid for] up front and are used variably from then on. The way to get more such resources within a P2P system is to change the up-front costs -- not the ongoing costs -- since it is the up-front costs that determine the ceiling on available resources.

    There are several sorts of up-front incentives that could raise this ceiling:

    • PeoplePC could take $100 off the price of your computer in return for your spare cycles.
    • AudioGalaxy could pay for the delta between 384- and 768-Kbps DSL in return for running the AudioGalaxy client in the background.
    • United Devices could pay your electricity bill in return for leaving your machine on 24/7.
    The money off the cost of a new machine is meaningless to me since I build my own. The delta between the two bandwidth rates is more interesting, but that difference only costs me maybe 10 bucks a month (if that).

    but the idea of someone paying my electric bill....

    I gotta admit that I can see the potential for abuse on this one.

    On the other hand, this comment tossed in at the end gives me the shivers:

    (Of particular note here is Microsoft, who has access to more desktops than everyone else put together. A real P2P framework, run by Microsoft, could become the market leader in selling aggregated computing power.)
    As a moment of paranoia sets in, I can see MS adding this element to their .NET "solution": as a part of participating in .NET, they own your spare cpu cycles, which they can then sell to someone else.

    I do not know what it is, but I always seem to have this moment of distrust whenever I read something involving MS.

    Then again, maybe the MS marketroids read Slashdot, checking it out for this kind of thinking, in order to get new marketing ideas that they can use.

    ;-)

  • Are these guys ready to pay for foreign processing power too ?

    If so, which companies ?

  • It's worse than that from where I'm sitting. A cab is a form of transportation. People don't necessarily take cabs simply because it's raining. If people don't need to be somewhere, they won't take a cab, period. An umbrella is more analogous to a portable form of shelter than to a form of transportation. I'm sure there's no shortage of shelter in New York when it rains. I think the cab thing is a matter of perception. If you're looking for a cab in the rain and you're getting wet, it pisses you off. But a cab isn't really like an umbrella.
  • I wrote to one of these companies. I forgot which one at the moment. But I was suggesting a plan that I think makes great sense and is a kind of processing already commonly done in a distributed manner --3D animation rendering.
    I'm sure I'm not the only one who loves setting up scenes with a gazillion meshes and complex camera shots and ray traced textures mirroring each other into infinity. The wireframe of a wild animation is within reach of many typical desktops, it's the rendering that you'll never get --especially at high res-- without your own CPU farm.
    So, my proposal to whatever company it was, was to allow artists to send in descriptions of their animation along with say a single screen shot and then CPU cycle donators could go to the site and decide which project they wanted to patronize.
    The reward? --not money but a free copy of the final project.
    In my mind, this is where the net can transcend conventional notions of economy. Heady stuff.
    But what about the money? Well, the organizing site would have to get by on ad revenues. But since it would be an entertainment site, that might not be too bad.
  • I don't think that it's a technical problem at this point, I think it's a business problem. Someone has to figure out a problem that has two attributes: It must lend itself to being more quickly solved via distributed computing, and it must be something with such a high demand that someone is willing to pay big money.

    They don't even have to do that - they just need to set up the business case for a CPU cycles bidding market, and the applications will create themselves. (So, it is still a technical problem - creating the infrastructure so that arbitrary processing packages can be distributed according to the results of the bidding.)

    Wired pondered this [wired.com] recently.. it could be really cool if someone can pull it off.

  • Sorry this is a bit off topic, but I don't think the analogy works. Everyone instantly fills up the cabs yes, but nobody actually buys an umbrella because they already have one at home, and are too stingy to go and buy a second one just because they forgot it. So maybe in fact it does work but not in the way they intended it to...
  • Can't wait till they distribute chocolate this way...
  • I disagree with the basic premises of the analogy presented. The corporate buyer isn't buying the computer as such; rather, they are buying the processing cycles. Contrary to the article, a cash discount for the computer would be inappropriate given the nature of the good. Economists assume that producers (i.e. sellers of computer cycles) make decisions based on future costs and benefits; in this case the cost of the computer is a sunk cost (and a factor of production), and it has no bearing on the amount the "seller" of cycles should expect to receive. The "nickel per hour" model is invalid based on simple economics. A far more suitable model would be one where the price of computing cycles is not fixed but floating, like any other marketable commodity you'd care to mention, and traded through an exchange. Any first year microeconomics student knows that in a free market, supply and demand interact to determine a fair price. So if I had to use my computer to write a report (or play Quake :) the price that would tempt me to stop and sell my computing cycles to a P2P company would have to be very high; however, if I wasn't using my computer at all, I wouldn't mind selling cycles for a very low price.
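
    A minimal sketch of such an exchange, assuming each seller posts an asking price per CPU-hour (high while using the machine, low while it sits idle) and each buyer posts a bid. The matching rule and all prices are hypothetical, chosen only to illustrate the floating-price idea.

```python
from typing import List, Tuple


def match_orders(bids: List[float], asks: List[float]) -> List[Tuple[float, float]]:
    """Pair the highest bids with the lowest asks while the bid covers the ask.

    Returns a list of (bid, ask) pairs that would trade.
    """
    trades = []
    for bid, ask in zip(sorted(bids, reverse=True), sorted(asks)):
        if bid < ask:
            break  # every remaining bid is lower and every remaining ask higher
        trades.append((bid, ask))
    return trades


if __name__ == "__main__":
    # Two idle machines ask for pennies; the owner playing Quake wants $5/hour.
    asks = [5.00, 0.01, 0.02]
    bids = [0.10, 0.03, 0.015]
    for bid, ask in match_orders(bids, asks):
        print(f"trade: buyer bid ${bid:.3f}, seller asked ${ask:.3f}")
```

    In this example the two idle machines trade and the busy one does not, which is the point of the comment: the purchase price of the box never enters the calculation, only what the owner's time on it is worth at that moment.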
