The Internet

Web: 19 Clicks Wide

InitZero writes "The journal Nature reports that the web is only 19 clicks wide. What it fails to mention is that at least one of those must be through Kevin Bacon." The graphic at the beginning of the article is gorgeous in a Mandelbrot style; now if I could just have it in a 24 x 30 print.
This discussion has been archived. No new comments can be posted.

  • by dr_strangelove ( 16081 ) on Wednesday September 08, 1999 @12:23PM (#1694602)
    Let us not forget the story of the statistician who drowned while fording a very wide river with an average depth of 6 inches...
  • A power-law distribution means that the Web doesn't follow the usual mathematical models of random networks, but instead exhibits the type of physical order found in, say, magnetic fields, galaxies and plant growth.

    While looking at the mess of tangled wires in our company's engineering patch room, we have often commented on the near-organic appearance of the patch cables interconnecting our company. Now, this is just a very small subsection of the entire Internet. Doesn't it seem possible that, with hundreds of thousands of patch rooms, protocols, and processors out there, something could evolve?

    It would start with a few anomalous packets zipping back and forth, reconfiguring routers to interconnect into a giant super-being. Its first triumph as supreme net-being would be to spam us all in every known language: "Could you please stop pinging, it gives me indigestion... *burp*!"

    -AP
  • A few weeks ago, I helped set up a mirror for the LAM pages (http://www.mpi.nd.edu/lam/ [nd.edu]) and while one of the mirrors was spidering our site, downloading everything, we noticed another machine that was on our campus doing the same thing. We found this slightly odd and sent a Big Brother sorta mail (we're watching you spider our site... why?) to the people doing this... got a response about them doing some study about how the web is laid out. They thought they could predict it using some physical or mathematical model.

    I also administer the NDLUG (http://www.ndlug.nd.edu/ [nd.edu]) web server, and noticed massive spidering from the same machine on campus.

    Now I read this article and see this quote:

    "The Web doesn't look anything like we expected it to be," said Notre Dame physicist Albert-Laszlo Barabasi, who along with two colleagues studied the Web's topology.

    so, i guess i don't have much of a point, but it's kinda cool to see that something actually came of some people in the college of science abusing our poor 486 webserver [nd.edu]...

  • Two or three of the top search engines cover only 15% of the Web (there is one that reportedly hits near 30%, but its name escapes me),

    Northern Light [northernlight.com], as I recall. I don't know that that claim is verified in any credible way, however.

  • Others have responded giving the distance from www.slashdot.org to www.microsoft.com, but I decided to try the other direction:

    Start: www.microsoft.com

    1. Click on MSN Home
    2. Click on Computing
    3. Click on Operating Systems
    4. Click on Linux
    5. Click on Linux vs NT Server 4.0
    6. Click on FreeBSD and Linux Resources
    7. Click on Slashdot
    So Microsoft is at most seven clicks from Slashdot. Can anyone do better?
  • I can imagine some cases in which a link-path between two pages would be useful. For example, if you are researching differential geometry and combinatorial topology, you might suspect that there is some connection between them. Unfortunately, a page containing a proof of the Gauss-Bonnet theorem -- the connection you're looking for -- probably doesn't contain both of the original search terms on it. A link path between a diff-geo and a comb-top page might work out better.

    "Trailblazing" through link-space was a prime motivator of Bush's Memex vision: finding new paths between separate "pages" of information was the same as discovering new relationships between discrete pieces of knowledge. In fact, knowledge can be thought of as connections between previously unlinked sets of facts.

    In all seriousness, finding a link-path between two separate pages is a thorny issue. First, you are dealing with a directed graph, and as the posts above point out, a link-path from A to B probably won't contain the same set of pages as a link-path from B to A. Then there is the issue of _which_ link paths are useful (and I believe there are some which could potentially be useful) and which aren't; this is largely a decision made based on the weighting you've placed on the Web Pages in question. Finally, there is the issue that you have to have link-structure information sitting around for a good chunk of the Web before something like this could actually work.

    But it would be neat!
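    Finding such a link path is a plain breadth-first search over the directed link graph. A minimal sketch, with a made-up three-page graph standing in for real pages (all names here are illustrative, not from the article):

```python
from collections import deque

def link_path(links, start, goal):
    """Shortest click path from start to goal in a directed link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns the list of pages along the path, or None if unreachable.
    """
    queue = deque([start])
    parent = {start: None}
    while queue:
        page = queue.popleft()
        if page == goal:
            # Walk the parent pointers back to reconstruct the path.
            path = []
            while page is not None:
                path.append(page)
                page = parent[page]
            return path[::-1]
        for nxt in links.get(page, []):
            if nxt not in parent:
                parent[nxt] = page
                queue.append(nxt)
    return None

# Toy link graph; note links are directional, so the A->B path
# can differ from (or lack) a B->A counterpart.
links = {
    "diff-geo": ["gauss-bonnet"],
    "gauss-bonnet": ["comb-top"],
    "comb-top": [],
}
print(link_path(links, "diff-geo", "comb-top"))
# -> ['diff-geo', 'gauss-bonnet', 'comb-top']
print(link_path(links, "comb-top", "diff-geo"))
# -> None
```

    The asymmetry in the second call is exactly the directed-graph issue raised above: a path from A to B says nothing about B to A.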
  • It also depends on how they did their sampling... Search engines such as Yahoo are easy to traverse, but crawlers like altavista, while they indeed link to a huge number of sites, are difficult to count because you need to enter a search phrase in order to get outbound links.

    I personally don't think this was a really meaningful survey, because of the large range it found. Also, does one really care how far away you are from an arbitrary page? I certainly don't. Generally I care how far my information is from a search engine, which is generally 1-2 clicks, and how easy it is to find said information based on results.

    Rarely do I start from a random page, and try to get to information by clicking through links.
  • By far and away the most awesome thing I have seen in a long time. I love how www.microsoft.com has ICMP blocked!
  • Sounds like the episode on TNG where the enterprise became an intelligent lifeform through all the experiences, etc. that were dumped into the computer, subsequently it went on a journey to procreate. Most definitely it is a life form, or at least on its way to becoming one... I don't know if we could classify it as some sort of "simple" organism as of yet or not, but eventually yes. Defining life as a bag of organic chemicals and processes is very limited, life extends throughout.
  • ...which is how to get "Open Source" in Microsoft Word.
    1. Open up a Word document and type some random text. Copy to clipboard.
    2. Open up another Word document
    3. Select "Paste Special" from the Edit Menu, and choose "Paste Link"
    4. Go to the Edit menu and select "Links". A dialog box will appear.


    The dialog box has a number of buttons, of which the fourth one down is "Open Source". However, the one on my version doesn't work -- ie it does not open the source of Microsoft Windows.

    Sorry to have wasted your time, really.

    jsm
  • Nit-pick. The author says the web is not random but seems to follow a power-law function. Sounds good, but power-law functions are also used to characterize random functions. I haven't read the scientific article, but I would have to think that the term random refers to a white-noise function: equal noise level at all frequencies.

    Stochastic (random) functions can be characterized by a range of power-law functions and other spectral shapes. The randomness of the power-law function is given by the randomness (e.g., mean and variance) of the individual components of the spectral function. For instance, draw an x-y plot consisting of a straight line with a negative slope. The y-axis is the amplitude while the x-axis is the frequency (or the inverse wavelength). Now suppose that this straight line represents the "average" value, with random fluctuations about the line. This is a power-law random function.

    Sorry if this is oversimplified. BTW, fractals are characterized by a power-law function. OTOH, true fractal functions have constraints on what the power-law slope can be (Hausdorff dimension).

    Now for something silly. What is the degree of freedom from Gore (Father of the Internet) to Slashdot (the bastard child of the Internet)?
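    The straight-line-on-log-log picture described here is easy to sketch numerically: build a spectrum that follows S(f) = c * f**(-alpha) with random fluctuations about the average line, then recover the exponent by least squares on the log-log values. (A toy sketch; the exponent 1.5 and the noise level are arbitrary choices, not figures from the article.)

```python
import math
import random

random.seed(1)

# Power-law "average" spectrum S(f) = f**(-alpha), with random
# multiplicative fluctuations about that average line.
alpha_true = 1.5
freqs = [float(k) for k in range(1, 201)]
spectrum = [f ** (-alpha_true) * math.exp(random.gauss(0, 0.2)) for f in freqs]

# On log-log axes the average is a straight line with slope -alpha,
# so an ordinary least-squares fit of log S against log f recovers
# the exponent despite the fluctuations.
xs = [math.log(f) for f in freqs]
ys = [math.log(s) for s in spectrum]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
print(round(-slope, 2))  # estimated alpha, close to 1.5
```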

  • As I remember it, it was the First Foundation that was based on the mathematical calculation of where society was heading. The Second Foundation was basically the opposite - they were created to police the world of the problems that Psychohistory (mathematics) could not predict...

    The mathematical aspect was indeed the most interesting part of the Foundation books, and it always surprised me that Asimov played it down as the series continued. Sort of like how the problems with trying to implement an absolute ethical system into a being was the most interesting part of his Robot stories, yet he wormed his way out of that (zeroeth law etc).


    -
    /. is like a steer's horns, a point here, a point there and a lot of bull in between.
  • I agree that by itself, the average link distance between two pages (19) isn't a very useful number. However, there is definitely useful information in the article itself:

    1. We are given a real-world (probabilistic) distribution of link distances between pages (i.e. given two randomly chosen pages, what is the probability that the shortest/longest link distance between them is X?)

    2. From the visualizations, we can see that the web is a graph containing a number of densely connected components which are themselves only fairly loosely connected to one another, and that this behavior is fairly scale-independent.

    These two tidbits could lead to impressively improved Web crawlers. You could decide to stop following links once you've gone 25 deep, for example; you could try and determine on-the-fly if more than one of your crawler processes is working on the same densely connected component of the Web and combine their efforts (or move one of the processes over to a new uncharted component), thus effectively searching more of the web. Using similar statistics for distribution of in-link and out-link counts, you could improve crawler heuristics so that pages with a number of out-links significantly deviant from the mean are given more weight for future crawling.

    Oh well, just some random thoughts.
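    The depth-cutoff idea can be sketched as a breadth-first crawler that refuses to expand pages past a depth budget. (A toy sketch; the link dictionary stands in for real page fetching, and the 25-deep cutoff is just the figure suggested above.)

```python
from collections import deque

def crawl(links, seeds, max_depth=25):
    """Breadth-first crawl that stops following links past max_depth,
    motivated by the observed ~19-click average distance: going much
    deeper than that should yield little new territory.

    links: dict page -> outbound link list (a stand-in for fetching).
    Returns {page: depth at which it was first reached}.
    """
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # prune: don't expand past the depth budget
        for nxt in links.get(page, []):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

# A five-page chain: with max_depth=2 the crawl never reaches "p3".
chain = {"p0": ["p1"], "p1": ["p2"], "p2": ["p3"], "p3": ["p4"], "p4": []}
print(crawl(chain, ["p0"], max_depth=2))
# -> {'p0': 0, 'p1': 1, 'p2': 2}
```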
  • The mathematical (topological) definition of the "diameter" of something is the least upper bound of the set of distances between all pairs of its elements.
    Since we're talking about a discrete set (a directed graph), you can forget about "least upper bound" and just say "maximum of".

    So what they are measuring ("the average distance between two random pages") does *not* match the mathematical definition of a diameter, in spite of their claims.

    I must admit I dunno what is the right term for what they are measuring, though.

  • Hehehe, only two clicks? *Warning Sexist Offensive Comment* I bet women would prefer that it took more clicks.

    But on a more serious note (yeah, right), I once thought about the following. It is too bad that this site is not more pro Microsoft (boy, does this company's name have Freudian meaning, right Billy boy?). Then Rod can put up a link to assembler info. It would be:

    http://slashdot.org/asm

  • Great tool!

    I don't really know much about this stuff, but apparently somebody/thing at SURAnet (server: mae-east.ibm.net, located Vienna, VA) is "causing packets to be lost" on their way to Frisco...
  • "I think that we might end up in an era where, just as people today have their own e-mail addresses, people will have their own Web sites," he said.

    Every llama has their own website...what are you talking about??
  • the distance in the other direction now is only 2 clicks: from slashdot open this article. then click here:
    www.microsoft.org [microsoft.org]

    :)
  • I can get to any site on the Internet with just one click! I just click on the "Location" bar and type in the URL...
  • One of the interesting applications they have at caida is the graphical traceroute that plots the physical location of the hop on a map. www.caida.org/Tools/GTrace [caida.org]
  • I read that article, and I remember that it sounded a lot like what Google already does.

  • People who want to express themselves, and know how to do it, will always get websites.

    Those who don't care to do so, well, won't.

    The real limit is on the number of people willing to be creative.

    I don't think it has anything to do with "cool" technologies - whether I have JavaScript rollovers on my site or not doesn't affect the quality of the content itself.

    D

    ----
  • by howardjp ( 5458 ) on Wednesday September 08, 1999 @12:31PM (#1694626) Homepage
    that you are never more than two clicks from a porn site.
  • The WWW *will* become self limiting. Yes, geeks like us will be building more and more web pages. But more and more normal people (you didn't think you were normal, did you?) will not. Also, as resources start getting tight, some of those wonderful "calling cards" will get wiped. You may no longer need them. Somebody may be willing to pay you for the domain name (okay, that would cause more sites). The admin may decide one day that since the site got no hits in six months, it's gone (think GeoCities).

    Using the tree analogy:
    - Yes, the tree will get a *lot* bigger.
    - Yes, the tree can only get so big.
    - Yes, leaves (pages) and branches (sites) will fall off and hit the WWG (world-wide ground).
    - Yes, there is a gardener, but he's only interested in a branch or two.
    - No, I haven't gotten much sleep lately :).

    The Lord DebtAngel
    Lord and Sacred Prince of all you owe
  • http://www.caida.org/Tools/Plankton/Images/
  • I tried counting how many clicks it takes to get from one web site to another some time back. It's on my web site here [xoom.com].
    My effort never quite got off the ground though :(

  • "To figure a shape to the web I would think you would first have to decide how many dimensions it has."

    If you want to know the geometrical shape, that would be true. But topological shape is a little different. Topologically, a coffee cup and a donut both have the same shape. The shape in common is that they each have one hole.

    Similarly, their power law reference means that the web is fractal in dimension, which is not the usual 1-2-3-4 dimensionality commonly meant. I would imagine this dimension is somewhere between 1 and 2. It's a set of lines (one dimensional) that are almost dense enough to fill an area (two dimensional.)

    "But even if you only assume two or three dimensions, why 'clicks wide'? "

    It sounds like they are looking at the web as a graph, a series of points (web pages) connected by edges (links). The width of a graph might be found like this:

    For each pair of points in the graph, find the shortest path along edges between the points (in terms of number of edges.) The maximum length among all these shortest paths is the width of the graph.
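    That width computation -- BFS from every page, then take the largest of all the shortest-path distances -- fits in a few lines. (A toy sketch over a made-up four-page graph; unreachable pairs are simply ignored, which is one of the modelling choices the article glosses over.)

```python
from collections import deque

def graph_width(links):
    """'Clicks wide' for a directed link graph: the longest of all the
    shortest click paths between pairs of pages (ignoring unreachable
    pairs, which would otherwise make the width infinite)."""
    width = 0
    for start in links:
        # BFS from start gives the shortest distance to every
        # page reachable from it.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            page = queue.popleft()
            for nxt in links.get(page, []):
                if nxt not in dist:
                    dist[nxt] = dist[page] + 1
                    queue.append(nxt)
        width = max(width, max(dist.values()))
    return width

# A four-page graph with a shortcut: a->c skips b, so the width is 2.
links = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}
print(graph_width(links))  # -> 2
```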
  • What sort of analysis can be done with points which there is no connecting path? How many of these are there? Why is there no path? How did that affect that average?
  • Just like the '6 degrees of separation' game, I fail to see how this could be useful. Finding the shortest path between two websites is nice when you're stuck on a machine that can't go to random URLs (secured lynx, some web kiosks, browse slashdot from anywhere, yeah!), but the 19-click average sounds like a curiosity.

    And, ooh, the web exhibits properties of exponential growth, with some sites that have many more links to other sites. Like I couldn't figure *that* one out. Some people post their bookmarks, and lists of links, and other people only link within their interests. A graph of this might look interesting if done correctly, but I still don't see how this would be that useful.

    The graph at the top was pretty, though, it looked like an IFS fractal. They look like stuff found in nature too, so I guess that gives this article a context to exist...
  • by Billings ( 87611 ) on Wednesday September 08, 1999 @02:50PM (#1694634)
    There was an additional study in Scientific American a while back that shows evidence of that. The premise of the study was to create a search engine that refines the quality of web sites by giving them a "hub" and a "target" rating (I believe that was the terminology used). The "hub" rating was determined by the number and quality of the target sites the page linked to (quality being determined by the "target" score), and the "target" rating was determined by the number and quality of the "hubs" that linked to it (again, quality determined by "hub" score). So they'd run the list of sites through several times using these, and each time the hub and target scores were refined by each other. Eventually stabilized scores are obtained by running this evaluation scheme enough. To relate it to the topic at hand in this thread, though, when they studied the web using these, individual communities of targets and hubs could be discerned by an above-average rate of linkage within the group.
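    The hub/target scheme described here closely resembles Kleinberg's HITS algorithm. A minimal sketch of the mutual refinement (the link graph and iteration count below are illustrative, not from the study):

```python
import math

def hits(links, iterations=50):
    """Iteratively refined 'hub' and 'target' (authority) scores in the
    spirit described above, following Kleinberg's HITS scheme.

    links: dict page -> list of pages it links to.
    Returns (hub, auth) score dictionaries.
    """
    pages = set(links) | {p for outs in links.values() for p in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority comes from the hub scores pointing at it...
        auth = {p: 0.0 for p in pages}
        for src, outs in links.items():
            for dst in outs:
                auth[dst] += hub[src]
        # ...and a page's hub score from the authorities it points to.
        for src in pages:
            hub[src] = sum(auth[dst] for dst in links.get(src, []))
        # Normalize each round so the scores stabilize instead of growing.
        for scores in (hub, auth):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

    On a toy graph where two hubs point at the same target, that target ends up with the top authority score, which is the "community" effect the study observed.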
  • Yes, we have indeed, but I like the hub-system of this one the best. But stop with the Kevin Bacon references already... ;-) Anyways, I think that a YOU ARE HERE, like CmdrTaco said in the last set of these to crop up on Slashdot, would have been pretty nifty.
  • The tree may increase in size, but the nodes at the root will have more paths to the leaves. As databases like yahoo become larger and larger it stands to reason that as the size of the web increases they will have more links to leaf sites. What would be interesting research is if the 19 remains constant through web growth, or if it decreases (or increases) as more "portals" condense the web down to only a few entry points for the masses.


    -Rich
  • You are absolutely right, but then I am guessing that within 3 standard deviations this is true.
  • It's not true that topology is about geometry on surfaces, though many examples from it are. Topology is about which subsets of a space are open. Given a definition of openness of subsets that satisfies some requisite properties, you have a topology. Sometimes, as with function space topologies, there isn't a very good geometric analog.

    The fractal dimension is an invariant of the topological space, so its embedding in a superspace like 2D or 3D Euclidean space is not important in terms of its fractal dimension.

    I wasn't trying to imply that the lines have area, but a space-filling curve has fractal dimension close to 2 because it "nearly" fills an area, and does so in the limit. It was a rough, and possibly poor, attempt at an analogy to this situation. You may be right that fractal dimension isn't important here, but my guess is that the increase in diameter from a given increase in nodes mentioned in the article could be calculated from that dimension, as it relates to how self-similar structures scale.
  • For you Van Vogt fans...

    I agree with many here that the analysis in the article described is not much to talk about, and yeah not much use except for generating some cool fractalized images, but the basic precept behind their development may be the only real way of performing a true mapping of the Net's shape and growth, something that will be important in the years to come.

    You can't build a useful map of the Internet's structure in the way you map the streets that wind through your town. Future search tools will require a fair amount of intelligence not in the way they go about a search, but in they way they 'think' about a search (there is a difference). Topographical mapping -- not index cataloging -- will help developers figure out these ways of thinking.
  • Clicks Wide seems accurate to me because you can get back to your starting point through a different route than the way you came.
    I think the Web analogy is the most accurate, with multiple routes to each site.
    Maybe I'm just thinking two-dimensionally. Lousy brain.
  • "The Web doesn't look anything like we expected it to be," said Notre Dame physicist Albert-Laszlo Barabasi, who along with two colleagues studied the Web's topology. A power-law distribution means that the Web doesn't follow the usual mathematical models of random networks, but instead exhibits the type of physical order found in, say, magnetic fields, galaxies and plant growth.

    "It's alive! It's alive!!!"

    Seriously, though, that's very interesting, but it's actually obvious when you think about it. The reason why galaxies are not distributed randomly is that there are centres of attraction that begin as random fluctuations in an evenly-distributed environment; but as matter condenses, these types of patterns emerge.

    Now, replace "gravity" with "number of hits". A site with a lot of hits, of course, represents a centre of interest, where people congregate. And naturally, they will either link to the site, or try to get linked from it.

    And so, the same patterns emerge.

    Hey, that means Slashdot is kinda like a black hole generator! Once it aims its beam at a site, it submerges it with hits until the site reaches critical mass and implodes, dropping out of the known Universe!

    "There is no surer way to ruin a good discussion than to contaminate it with the facts."

  • My girlfriend told me a while back that ALL people are only something like 5 or 6 people away from each other. (I guess as in a who-knows-who kind of way) Anybody else knows more about this?

    I believe this is false. Counterexample:

    There are people in remote regions who do not have contact with many people outside their groups. Say there exists a tribe (let's call them a) in a forest in Indonesia. Suppose the only contact that this group has with the outside world is through some anthropologists. Now suppose there was another tribe (b) in the same region that had contact only with a. Assume there are also tribes c and d that are in the same situation as a and b, but in a totally different region, maybe Africa. Now consider the degrees of separation between a child in tribe b and a child in tribe d. Clearly it would be something like child(b)->parent->tribe a->anthropologist->?->anthropologist->tribe c->parent->child(d). In order for six degrees of separation to hold, the two anthropologists have to know each other directly. This isn't necessarily true, given the number of anthropologists around.

    I believe the six degrees of separation came about when someone figured that everyone knows at least 30 other people. Therefore a given person is separated by one person from 30*30=900 people. Analogously, a person is separated by 6 people from 30^7 ~ 22 billion. However, this doesn't take into account the redundancy in the relationships.

    For example, many of the people that your friends know are from small cliques, so the real relationships look like many tightly interconnected clusters with a few connections between clusters.

    BTW, I think I've been doing too many math proofs.
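    The 30-acquaintances arithmetic above is quick to check (and, as a nit, 30^7 is about 22 billion rather than 18):

```python
# Naive reach estimate behind "six degrees": if everyone knows about 30
# people and no two acquaintance circles overlapped, a chain of n links
# would reach 30**n people.
reach = {n: 30 ** n for n in range(1, 8)}
print(reach[2])  # one intermediary: 900 people
print(reach[7])  # six intermediaries: 21,870,000,000 -- about 22 billion,
                 # more than the world population. Real networks are
                 # clustered, so this is only a loose upper bound.
```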

  • And now we can work this into an evil money-making scheme^W^W^W^W cool game: I give you two random Web pages, you have to find a click path between the two. The shortest wins.

    First example: from the /. home page to... (let's see, something really obscure...) this one [thegoodnam...etaken.com]. Ready, set, go!

    Of course, me and my investors hope that this path is also packed full of computer-generated Web ads, since this is how we get rich^W^W make it more fun for everyone!

  • *grin*

    Ahh! But! We are touching upon a _very_ important issue here: latency. Ours is a tad high today, since she's in Europe and I'm in California. It'll probably be around noon EDT before I get a response out of her.

    An ICMP-like roundtrip could maybe be shorter than that (I have her hotel number), but the annoyance that would generate would probably void the possibility of a full-blown TCP-like connection setup any day soon.

    Breace.
  • by SEWilco ( 27983 ) on Thursday September 09, 1999 @03:44AM (#1694646) Journal
    Hypersearching The Web [sciam.com] is that SA article. Google [google.com] is different as it primarily tries to find authorities based on links, while the Clever method in the article is more like finding authorities within communities. The Clever algorithm looks at text around a link to estimate importance and relevance of a link.
  • child(b)->parent->tribe a-> anthropologist ->?->anthropologist->tribe c->parent->child(d)

    You had to create a rather special example, although a real-world worst case would probably involve anthropologist->?->anthropologist replaced by merchant->city resident->city resident->merchant, which is just a little longer.

    I think an example of the real-life short circuits is my wife's friend of a friend. My wife is from another country. We quickly found a friend of a friend of hers five miles from our home in this country. It seems unlikely, but the real math is something like this:

    • In this metropolitan area there seem to be at least 2,000 people from her country.
    • Most people in her country are in two dozen cities, but let's say 50 cities.
    • That's about 100 people here from each city in her country.
    • Scatter those 100 people across that city in whatever distribution you want and there still are several of them within any region of the city.
    • There is a fairly good chance that the friends of those 100 people know the friends of most people in the city, particularly the friendlier ones. (Hey, isn't a friendly person more likely to marry someone from another country? :-)
    • Those 2,000 people in my city are an even denser concentration than the 100 in the remote cities, and may be more likely to know each other through national society and shopping groups.
    • The reverse is also true...there is some unknown number of people from my country in each of the cities in her country, forming more links.
  • I wouldn't be surprised if your math explanation is exactly where it came from. I'm just trying to figure it out so I can justify why I told her she was talking out of her ass... ;o)

    On the other hand, I'm sure that on average people know a lot more than 30 people. If you count unidirectionally knowing someone, it will be even higher.

    Breace.
  • To track people-links, try Six Degrees [sixdegrees.com].
  • David Brin, in Heaven's Reach, listed at least six "orders of life" recognized by the galactic civilization. One of them was called "machine," although "robotic" may have been a better name.

    So in that vein, we may have already created our own encounter with a different life-form. But not the first -- the memetic order is, in the book, a separate order of life that survives in the normal universe as a mental parasite. That is, it's an idea (meme) that can spread and is hard to get rid of.

    Great book, but you really need to read the first two -- Brightness Reef and Infinity's Shore to understand the third.

    -- Dirt Road

  • I really hate those stupid "how can I get from here to there through Kevin Bacon" things. Ultimate silliness if you ask me. The graphic is excellent however :)
  • It's cool and all but where do I fit in?
  • by Peyna ( 14792 )
    They should have done hops rather than clicks, that would be more interesting.

  • That is only an average... And the minimum and maximum are wide apart. The web is not really a web, I think, but a set of (nearly) independent webs. An example of this is the set of x86 protected-mode programming sites. Nearly all of them are linked with the others, but not with anything else.
  • I am not sure that the web is even that wide. Maybe the scientists did not factor in enough search sites; but I am quite sure that search engines cover at least 150 million web pages total. I don't think that the distance will be that long, except to some hard-to-reach or foreign sites that almost nobody links to. What would be more revealing is the frequency distribution of distances.
  • by Hrunting ( 2191 )
    The map of the Internet reminded me of a map of an airline's routing table, which, unless you fly Southwest Airlines, usually runs through a series of hubs. I'm sure that the 'Net has its own hubs (hell, you have Yahoo and Slashdot already), but I wonder if it has many Southwests. Webrings are the only real analogy I can think of, so I wonder if anyone else could throw any more of those out.

    FYI: Southwest Airlines doesn't use a hub-based system of flights, but does direct flights between cities. Many flights are thus 'direct' but not 'non-stop'. They're also pretty cheap. Don't factor the cheapness into your analogies.
  • The problem with any of these analogies is that there's no need for a Southwest on the Web, really; it doesn't cost any more to go directly from Boston [cityofboston.com] to Los Angeles [la.ca.us] on the Web than it does to go through a bunch of other cities. You have to define what the cost is (some type of relevancy index, I suppose) before any analogy would make sense.

    That said, other than webrings, Search engines could be your Southwests. They get you to places directly, but they traverse the web themselves through a series of links.

    Another possible interpretation: any real surfer is her own Southwest, who, when trying to find a piece of information, hits various hubs, and follows links to the source in a somewhat roundabout but usually successful manner. The analogy's weak.

    There's a much closer analogy to Southwest airlines when you look at the Internet than the Web, clearly; then you do have well-defined hubs (the big backbone routers) and carriers, who, admittedly, use each others' networks. It would be as if Southwest could get you from Dallas to Porvoo, Finland by flying you on its planes through its hubs to New York, then getting you on a British Airways plane to London, then a FinnAir plane to Helsinki, then a FinnAir prop to Porvoo. Feel free to extend that analogy...
  • My girlfriend told me a while back that ALL people are only something like 5 or 6 people away from each other. (I guess as in a who-knows-who kind of way)

    Anybody else knows more about this?
    (I can't verify it right now cause she's not here... :( )

    Breace.
  • But to me, the most important part was the last paragraph:


    "I think that we might end up in an era where, just as
    people today have their own e-mail addresses, people will
    have their own Web sites," he said. "But eventually it will
    taper off. Eventually it has to be self-limiting."


    That last sentence makes me think he isn't too sure of the Web's self-limiting qualities. I personally don't think it will ever taper off. Just about the time it starts to get stale, the Netizens will get a new toy (a la JS rollovers, Java applets, Flash, Shockwave, whatever). There will always be too much excitement and new technology.


    And, just as it seems we've run out of things to do, we might actually have a moon base with a couple hundred-thousand miles of 100BaseT. Voila, brand new web to play with. :-)

  • With the advent of bigger bandwidth for single users or small parties, people will only start making more web sites. When DSL became cheap enough, I got it.. and I've already gotten a couple of domain names. Web sites are kinda a fun way to blow an hour or two, and as long as you don't just flesh them out with dumb stuff, they're not bad as a calling card.

    So, yeah, I agree with Zantispam. :)

  • by jelwell ( 2152 ) on Wednesday September 08, 1999 @11:13AM (#1694664)
    The Source code for that mandelbrot set is available at Caida [caida.org]. My friend has been working on the project for quite some time, ever since graduating at UCSD. Most of the work is done by him in the San Diego Super Computer Center. Take a look at the software, it's java and Brad put a lot of cross platform testing into it. So it should run fine everywhere. (Java claim). It has a lot of really nice features to it.
    Joseph Elwell.
  • Well, if they're factoring in clicks, does typing something count? Yahoo!'s database is huge, some categories require 19 clicks themselves to get to. I don't think that's accurate at all...
  • by Anonymous Coward
    Are search engines included? If so, wouldn't that greatly underestimate the diameter of the web?
  • www.sixdegrees.com .. sign up and watch the spam fly.
  • by Jack William Bell ( 84469 ) on Wednesday September 08, 1999 @11:30AM (#1694668) Homepage Journal

    Hmmm...

    The power-law connection means it's possible to figure out the shape of the World Wide Web, even if you can't precisely map out every site and page on the network.

    To figure a shape to the web I would think you would first have to decide how many dimensions it has. Perhaps by assigning a dimension to each method of getting to a page, or perhaps by counting each hyperlink into a page as a separate dimension. Either way it could get pretty hairy pretty quick.

    For example, is a hyperlink on a search engine different in some way from a hyperlink on a personal page? How about a web directory? Bookmarks?

    But even if you only assume two or three dimensions, why 'clicks wide'? Seems more like 'clicks deep' to me. I always think of clicking on a hyperlink as 'drilling down'. Showing my age again I guess...

    Jack

  • by jelwell ( 2152 ) on Wednesday September 08, 1999 @11:32AM (#1694670)
    I think the idea behind this article of a "click" would define "click" to be a mouse click. Which would rule out all typing. It's a fair assumption that the web would be considerably smaller if search engines were lumped together, because you could type in the addresses of the two places you wanted to find the distance of into altavista as +url:www.math.com +url:www.slashdot.org . Placing all sites that are logged by altavista much closer.

    But then do you calculate width by starting at point A and continuing to point B? Because if that were true then the search engine argument would still be relatively benign. As you would still have to reach a search engine from page A in less "clicks" than it would take you to go straight.

    If the "true" diameter is required one could measure in any fashion as long as we agree on a definition of "click" (which I define - for myself - as only mouse presses). So such sites as yahoo might bring unrelated pages closer. But without typing would many be relatively fewer than 19 clicks away? Yahoo is still categorically sorted so sites that are unrelated would need to traverse up the Yahoo category tree after first leaving the first site.
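
    For what it's worth, once you have a crawled link graph, "clicks between two sites" is just a breadth-first search. A rough sketch (the graph below is made up, not real crawl data):

    ```python
    from collections import deque

    def click_distance(links, start, goal):
        """Minimum number of clicks from start to goal, by breadth-first search.
        `links` maps each page to the pages it links out to (one click each)."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            page, clicks = queue.popleft()
            if page == goal:
                return clicks
            for nxt in links.get(page, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, clicks + 1))
        return None  # unreachable -- remember, links are one-way

    # Toy crawl; note the links are directed, so A->B doesn't imply B->A.
    web = {
        "a.com": ["b.com", "c.com"],
        "b.com": ["d.com"],
        "c.com": ["d.com"],
        "d.com": ["a.com"],
    }
    print(click_distance(web, "a.com", "d.com"))  # 2
    print(click_distance(web, "d.com", "b.com"))  # 2 (d -> a -> b)
    ```

    Dropping a search engine node into `web` with links to everything would shrink every distance, which is exactly the lumping-together effect described above.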

    Joseph Elwell.
  • What would be superb in a graphical representation is a combination of this data (the supposed "distance" from point to point) combined with last week's graphic on /geographic/ location or top-level domain (.com, .org, etc.).

    It would be interesting to see if, for example, internet traffic patterns show any kind of focus or foci about certain domains or sites or even specific boxen, and how those machines are distributed in real space. . . where, essentially, are our eyeballs and electrons going?

    As for "dimensions," a 3D rendering would be the easiest to comprehend. Perhaps a sphere representing the globe, with an atmosphere of satellite link channels, and a substrata of bandwidth pipes and routers. Or a flat geometric field with peaks to represent the Big Iron, fractal spires twisting off as homepages and smaller sites. And isolated islands or floating moons of self-contained networks, or pages that go nowhere.

    Don't mind me, I just finished reading Snow Crash, Diamond Age, and Idoru, and would enjoy a virtual walk through the data we're all accumulating.

    Rafe
    V^^^^V
  • by Anonymous Coward
    They said the web is like a tree and will grow until it eventually runs out of resources. It's not shaped at all like a tree. It's more of a tumbleweed.
  • Since the average distance is 19 clicks, would it then be possible to take a trip round the web? In 80 days perhaps?

    Of course, it's no problem surfing the web continuously for 80 days (with a T3 and a tank of coffee) but how far would you get by then?

    Where does it all start? End?
    -
    The internet is full. Go away!
  • If you remove search engines, you're removing a major part of the natural growth of the Net itself.
  • OK, I'll buy what you are saying. In fact I can imagine this 'shape' better by thinking of it as a 'cyclic directed graph' made up of nodes and edges than I can from the picture in the article.

    But I am thinking of this in programming terms because cyclic directed graphs are a data structure I understand (somewhat). It isn't something I could easily describe to a non-programmer, even using a white board. The thing is, this view of the web is nothing new! As a data structure the web has always been, was even designed to be, a 'network' (read 'cyclic directed graph').

    So now we are back where we started. We haven't learned anything new from the article other than the fact that the average number of links between any two 'nodes' is 19. A number that is meaningless because it is perturbed by the large number of sites that link many pages to a single home page (all pages one or two clicks away). For purposes of developing new search and crawling algorithms it would have been more useful to show the number of links out versus links within a site.
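
    Getting that links-out vs. links-within-a-site number from a crawl is easy enough. A rough sketch (the page snippet and URLs below are made up for illustration):

    ```python
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    class LinkCounter(HTMLParser):
        """Tally the hyperlinks on a page as internal (same host) or external."""
        def __init__(self, base_url):
            super().__init__()
            self.base = base_url
            self.host = urlparse(base_url).netloc
            self.internal = 0
            self.external = 0

        def handle_starttag(self, tag, attrs):
            if tag != "a":
                return
            href = dict(attrs).get("href")
            if not href:
                return
            # Resolve relative links against the page's own URL first.
            target = urlparse(urljoin(self.base, href)).netloc
            if target == self.host:
                self.internal += 1
            else:
                self.external += 1

    html = '<a href="/about.html">About</a> <a href="http://www.nature.com/">Nature</a>'
    counter = LinkCounter("http://www.slashdot.org/index.html")
    counter.feed(html)
    print(counter.internal, counter.external)  # 1 1
    ```

    Run over a whole crawl, the internal/external ratio per site would show just how much those one-or-two-click home-page clusters are skewing the 19-click average.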

    Jack

  • by LMariachi ( 86077 ) on Wednesday September 08, 1999 @01:13PM (#1694677) Journal
    This was touched on by this Scientific American article [sciam.com] a few months back. It covers another project looking for useful ways to index the Web. They came up with a similar hub/cluster topology based on authorities, which are sites that a lot of other sites link to, and hubs, which have large collections of links to authorities. Unfortunately, the cool illustration that was in the print version didn't make it online. (You can pretty much skip the first two sections of the article unless you want to read the authors' grossly incorrect definition of spamming; it doesn't get interesting until the "Searching With Hyperlinks" subhed.)
  • The article mentions the Internet was created nine years ago... Is this the same Internet that Al Gore invented?

    I just think the people that made the foundations (like the creators of ARPAnet, etc.) should be given a little credit, that's all.
  • by Anonymous Coward
    1 - Sure, everybody can get their own web page, and probably will, but how many "here's my dog, check out my car" web pages does the world need? I think we (humanity) are finding out the hard way that we aren't as individual as we'd like to think. This is why I haven't made my own web page about my own boring life.

    2 - The other thing that's said, the really important thing in this article, has some pretty chilling implications - which have already proven true. The major portals are the main targets of the search engines, the "fringe" sites are not. Only a small fraction of sites are even scanned by search engines, and this will become a limiting logic for them. Thus, only mainstream sites will be found by search engines (in general), and the rest of the sites - the covert sites, the underground, the fringe - will not be reachable by the vast majority of people, and this has all the Orwellian implications you think it does. All this time, the netizens have been thinking: this internet empowers the individual, it frees information. BZZZT! Wrong, try again please. It empowers the rich, the mainstream media, the corporate establishment. Now how much do you think it matters that we wire the ghettos and set up internet access for Cambodian rice farmers? It's not going to be a channel that connects everybody with everybody else; it's going to be just another orifice for greedy corporations to rape us with. Sorry about the class-warfare tone, and the oblique anti-Microsoft slam, but at least I didn't say anything about Beowulf clusters, though a Beowulf cluster of internets would be cool.
  • Two or three of the top search engines cover only 15% of the Web (there is one that reportedly hits near 30%, but its name escapes me), and most cover fairly much the same areas. Factoring in search sites would raise some interesting questions about redundancy. But in any case 150 million pages is not really a lot anymore.

    Also, one must factor in whatever percentage exists for pages/sites that may link to the outside, but are not linked back to. These one-way linkages would skew the click average.
  • How many clicks does it take to get to the center of a tootsie-pop? http://fatdays.com/jokes/misc/licks.html

    "The number of suckers born each minute doubles every 18 months."
  • And with this link, you're but a click away 8^)

    http://visualroute.datametrics.com/ [datametrics.com]
  • How would they find these unconnected sites to do an analysis of them?

    I for instance have a two websites I use to distribute private files and such to friends. But, to the best of my knowledge neither of these has links to it.

    They need to recruit a few ISPs, examine all the pages on them (after stripping usernames) to see how many webpages don't have links to or from them, then use that figure to adjust their estimate of the total number of pages.

    (They need to get an estimate of the number of pages they can't find by spidering, and the only way to do that is go to the source...)
  • by AME ( 49105 )
    We just need more pages that say This page brought to you by /. to really change the world.

    Dear Lord no! There's too many of us here now.

  • Eh. I figure we're trying to map all of the pages anyhow, so brute-force is going to be as good as anything else, but it might be a good idea to mark the dense parts for later indexing.

    However, it might be good to identify growth and stagnation if we're going to be that complex about it. Hmm.

    Well, I guess it's food for thought, anyhow. But I don't think this is a radically new vision of the web...
  • I also liked the episode with those neat-o little inorganic cube lifeforms that lived on that planet those people were terraforming. You know, the ones that pulsed light and called all humanoid life, "ugly bags of water." But I digress...

    -AP
  • Yeah, sounds exactly like google. Maybe that's where they started.
  • Just like the end of Mona Lisa Overdrive,
    "The Matrix has a shape" etc..
    Maybe when we have discovered that shape we will discover life on Alpha Centauri as well...
    (No not the game :))
  • How do you suppose they account for dynamically-generated (i.e. database- or object-driven) web pages that typically don't register with search engines?

    What about sites with logins, where hundreds of pages are hidden from public view?

    It seems to me that most of what's interesting about the emerging behaviour of the web is buried within one of those two types of sites... discuss?!
  • Have a look at these chaps [peacockmaps.com]. Similar map sort of thing, but looks better.
    (No, I don't work for them ;)
  • The really cool thing would be if someone were to write a program (probably a cgi script) that would use a search engine that lists pages that link to a site to let you type in two web sites and see the hyperlink path. I wonder how far slashdot.org is from www.microsoft.com?
  • Hops is another issue. For example, it's easy for me to do a traceroute and tell you that I'm 14 hops away from Slashdot, but how many web sites would someone have to click through to get from my web site to Slashdot?
  • For some reason the comment to which I am replying was moderated down as offtopic. Probably because it looks like the person was getting confused with the image on the news page and the real Mandelbrot set. In fact, this poster was correct, and it's a cool link, so please someone, moderate this comment up.
  • by The Cunctator ( 15267 ) on Wednesday September 08, 1999 @11:45AM (#1694695) Homepage
    Mean hop distance is relatively easy to measure, as IP addresses are nicely arranged. Measuring clicks takes a bit more work, I feel. A somewhat cryptic document [caida.org] by another guy at caida.org puts the average hop at 14-15. A great link with more info is the
    Internet Distance Maps Project [umich.edu].

    For more pretty pictures, check out the Internet Mapping Project [umich.edu].
  • No doubt. My SO signed up through a friend. To get fully registered, you have to provide the names and addresses of two other people.

    I killed the browser window at that point.

    Once my SO twigged to the fact that you *have to* spam people in order to join, she felt as bad as I did. Needless to say, she hasn't been back.

  • The article says the WWW is 9 years old, not the Internet. The average consumer doesn't know, but the two are not the same.
  • by Anonymous Coward
    That's pretty interesting, but maybe some sites aren't good enough for people to put links to them on their webpages. Maybe the only way you could find them is by search engine. Something that might be cool too is that if you got a large database of people with different IDs (ie Hotmail/ICQ/AIM) and had them send in their contact lists or address books, then check how far apart people are through those contacts. Of course, I doubt you could persuade many people to just send you their address book or contact list.
  • Actually, the article says the WWW is 9 years old. I had the same reaction at first, then I quickly double checked myself, and discovered they did say WWW, and not the Internet.

  • http://www.caida.org/Tools/Plankton

    It was on that badass little graphic. There are links to info about LOC records in Bind8 that hold longitude and latitude info. You can then be plotted on the map. Or something like that. I just went for more of the purtty pictures. :)
  • I think everyone will eventually get their own webpage, not just geeks. For example, I created my own web-site (RobertGraham.com) simply as a way to avoid spam [robertgraham.com] (I made the mistake of signing up with the Netcom ISP where they require you to receive spam as part of their agreement). It's more useful than one would suppose, and not simply as a means of putting useless junk up there. For example, I've stopped showing people physical pictures, but instead simply put them up on my website and give people a link. (For example, I almost got myself killed in a car accident recently, and some friends/family wanted to see pictures of the totalled car [robertgraham.com].)

    The whole process gives a whole new dimension to e-mail and general communication, with this posting as an example. I've started to think and communicate in hypertext.

  • The other comment that mentioned topology was, I think, rather off-course on the dimensions of the web. Topology is the study of properties that survive continuous deformation; donuts and teacups are similar because one can be smoothly deformed into the other.
    A second factor is that you must consider what you are calling dimensions. A representative graph may be drawn in any number of dimensions - flattened to two, or made in 3D. But the dimension in the fractal sense is a different animal. No matter what dimension you draw it in, the fractal dimension stays the same. Yes, the dimension would be between 1 and 2 because as the number of links -> infinity, the 'perimeter' does too, but the lines certainly don't have an area!
    The shape of the web, however, is not about fractal dimensions. It's about summarizing and arranging the points and connections in such a way that clustering and localization phenomena begin to emerge. With 800 million+ nodes, this task is nearly impossible - however, an analogous structure of fewer nodes and clusters can be made that will have visible patterns.
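
    One way to build such an analogous structure is the rich-get-richer growth rule that produces the power law the article mentions: new pages tend to link to already-popular pages. A minimal toy simulation (my own sketch, not the researchers' code):

    ```python
    import random
    from collections import Counter

    def grow_network(n, seed=1):
        """Preferential attachment: each new node links to an existing node
        chosen with probability proportional to its current degree."""
        random.seed(seed)
        # `stubs` holds one entry per link endpoint, so a uniform pick from
        # it is a pick proportional to degree.
        stubs = [0, 1]              # start with two nodes joined by one link
        for new in range(2, n):
            target = random.choice(stubs)
            stubs += [new, target]
        return Counter(stubs)       # node -> degree

    degrees = grow_network(10000)
    # A handful of hubs grab most of the links while most nodes keep
    # degree 1 -- the heavy power-law tail, in miniature.
    print(max(degrees.values()), min(degrees.values()))
    ```

    Even at ten thousand nodes the clustering around a few hubs is visible, which is why a scaled-down analogue can show the same patterns as the 800-million-page real thing.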
  • But Tim Berners-Lee created the HTTP protocol set in 1989, so ten years would be more accurate.
  • ... check out http://www.sixdegrees.com [sixdegrees.com], they're actually trying to link people through their relationships to see if everyone really is related to everyone else by 6 degrees or less. I've actually run into people I know personally just by looking through my different 'degrees'... that's quite a weird experience.
  • Did anyone try that Plankton java app they have? It's quite fun, and pretty.
  • It's also important to note that these are one-directional links. The 6 degrees of Kevin Bacon are all bi-directional links, where A working with B is the same as B working with A.

    Yahoo does a good job of sending feelers out to other pages, but does a shitty job of getting linked to. A page like Netscape.com [netscape.com] probably gets a lot of references (This page brought to you by blah blah.) if not as many connections out.

    We just need more pages that say This page brought to you by /. to really change the world.
  • One thing, perhaps, we can agree on...

    Google, which ranks its results by link "importance" ... DirectHit, which bases its analysis on "what people are clicking on" ... and Inktomi, which is "looking at what people are actually viewing."

    Hmmm, follow the herd or go for something "important" (perhaps the perfect word for a search, a clue if you will) seems like a pretty simple decision. I suggested it to the folks in my company (along with www.m-w.com and babelfish) and they love it. I'm feeling lucky.....

    (Get Andover to buy it, or maybe the other way around...)
  • Anybody else knows more about this?

    (I can't verify it right now cause she's not here...:( )

    Yeah, but at least she's only one hop away from you. She's at least two hops away from most of us, so I would have to say that the best person around here to ask is .. um .. you.


    ---
    Have a Sloppy day!
  • I think they are tracking clicks between sites, not between pages. So Yahoo and Playboy are 1 click apart, even though you have to dig through Yahoo quite a ways to find it.

    Overall this reminds me of a game we used to play in the college computer clusters. You get a bunch of people to start at a really innocent site, say, whitehouse.gov, and then race to see who can get to playboy.com the fastest just by clicking. Pretty interesting stuff really.. it's amazing how close some really conservative sites are to hardcore pr0n, when measured in number of clicks.
  • At first I was amazed at the claim that

    the web is only 19 clicks wide.

    To me, this means the maximum distance between any two sites is 19 clicks. Sort of the way that people claim that only six degrees separate us from any other person in the world. This would be an impressive display of the "web" aspect of the world wide web.

    But this isn't the claim at all. If you read the article, it says that

    there's an average of 19 clicks separating random Internet sites.

    Different story altogether.
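
    The difference is easy to see on even a tiny graph: the average distance and the maximum distance (the diameter) can be far apart. A toy illustration (made-up five-site chain, links in both directions):

    ```python
    from itertools import product

    # A chain of sites: a <-> b <-> c <-> d <-> e
    links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}

    def dist(start, goal):
        """Shortest click path by breadth-first search."""
        frontier, seen, clicks = {start}, {start}, 0
        while goal not in frontier:
            frontier = {n for f in frontier for n in links[f]} - seen
            seen |= frontier
            clicks += 1
        return clicks

    pairs = [(a, b) for a, b in product(links, links) if a != b]
    dists = [dist(a, b) for a, b in pairs]
    print(sum(dists) / len(dists))  # average distance: 2.0
    print(max(dists))               # diameter: 4 (a to e)
    ```

    So "19 clicks on average" says nothing about the worst case; the longest path between two obscure corners of the Web could be far more than 19, or infinite if the links only run one way.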
  • heck, maybe this is out of line...

    Every time I see an article about a statistical study of something created by man, I get a flashback to Asimov's Second Foundation: how mathematics can generally describe man, events, history... Sure, the Second Foundation was a lot more than that, but the mathematical aspect was what I loved most about the concept.

    This study has some distant similarities to it. Statisticians studying the average distance between two randomly chosen internet sites. The catch is that the entire structure is created by man; there really isn't much randomness in it. Compared to the Bacon thing, where you may have met someone, who knew someone, at one point or another walking down the street, having a link from your web page to another web page is a completely conscious action.

    Which brings me to the counter-argument that news sites such as slashdot or c|net have their content (and thus their links) influenced by random events of the outside world such as tornadoes or floods.

    Which now brings me to a conclusion before I head back home: how long will it be before someone attempts to measure the amount of randomness in the web, that is, the influence of events that cannot be predicted (to some extent or other, to be determined later) by man? Is the web something that could over time become completely predictable?

    I really should reread these Asimov books.
