The Internet

Map the Internet... In One Day? 263

rjbrown99 writes "There have been numerous stories over the past few years on Bill Cheswick's Internet Mapping Project. The Lumeta folks even created a company out of it. Well, now there is a competitor. A single guy with a single computer is working to accomplish the same feat - within ONE DAY and using open-source tools to do it. The new project is called Opte and can be found at www.opte.org." He's made some progress and is looking for volunteers.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday November 14, 2003 @04:10PM (#7476252)
    Who
    This project was started by me (Barrett Lyon) as a response to a conversation with my colleagues at Network Presence. Over a lunch we were discussing William Cheswick and Hal Burch's Internet Mapping Project. I was not very impressed with the results of their project: they produce beautiful maps, but the maps don't seem to be very useful, nor do they release their code freely. Their mapping also takes nearly six months to generate a single map. My comment was that, "I can write a program that can map the entire net in a single day." The comment was met with some hostility. Thus, this project was born.

    What
    The goal of this project is to use a single computer and a single Internet connection to map the location of every single class C network on the Internet. It is obvious that the Internet is not routed as a bunch of class C networks, but it is easy to see that by treating the Internet IP space as a bunch of class C networks, it will be possible to make a detailed map of the entire Internet. The global Internet address space currently offers 32 bits worth of unique host addresses, or a theoretical maximum of 2^32=4,294,967,296 hosts. In reality, the address space has been allocated in fairly large contiguous blocks, which renders strictly optimal utilization difficult. The smallest block that is logically routed via BGP or allocated by ARIN is a class C network (CIDR /24).
    At the rate of 194 traceroutes per second it is possible to scan the entire theoretical 2^24 space within a single day. Thus about 16,777,216 class C networks could be processed by a single computer in a single day. Yet there are huge portions of network blocks that are no longer used, many network blocks fall under RFC 1918, and other blocks are reserved by ARIN.

    According to ARIN there are about 47 class A networks in the reserved status (search ARIN for OrgName "Internet Assigned Numbers Authority".) Doing the math results in a reduction of 3,080,192 class C blocks to be removed from the scan list, leaving us with a theoretical list of 13,697,024 blocks.
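
The arithmetic in the paragraphs above can be checked with a few lines of Python. This is only a sanity check of the quoted figures (194 traceroutes per second, 47 reserved class A networks); the variable names are made up for illustration and are not taken from the Opte code.

```python
# Sanity check of the scan-budget figures quoted in the post.
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400
TOTAL_SLASH24 = 2 ** 24              # every possible /24 in a 32-bit address space
SLASH24_PER_SLASH8 = 2 ** 16         # number of /24 blocks inside one class A (/8)

RATE = 194                           # traceroutes per second, as quoted
RESERVED_SLASH8 = 47                 # reserved class A networks, as quoted

probes_per_day = RATE * SECONDS_PER_DAY
reserved_blocks = RESERVED_SLASH8 * SLASH24_PER_SLASH8
remaining_blocks = TOTAL_SLASH24 - reserved_blocks
exact_rate = TOTAL_SLASH24 / SECONDS_PER_DAY   # rate needed to cover all 2^24 blocks

print(f"probes per day at {RATE}/s: {probes_per_day:,}")   # 16,761,600
print(f"reserved /24 blocks:       {reserved_blocks:,}")   # 3,080,192
print(f"remaining /24 blocks:      {remaining_blocks:,}")  # 13,697,024
print(f"rate to cover 2^24 in a day: {exact_rate:.2f}/s")
```

At exactly 194 probes per second the daily total falls a little short of 2^24 (by 15,616 blocks), so the quoted rate is a rounded figure; the reserved and remaining block counts match the post exactly.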

    Applying some additional thought, large portions of the 13.7 million blocks may route to the same place. By testing about 20 routes at random within a class B and comparing the results, it is possible to see if there are multiple routes worth investigating or if the entire thing goes to the same place. Applying that logic increases the speed of the scanning.
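
A minimal sketch of that sampling idea, assuming a `trace` callable that returns the list of hop addresses for a target. The post does not show how the real scanner does this, so `sample_targets`, `distinct_routes`, `worth_full_scan` and both demo trace functions are hypothetical names invented here.

```python
import ipaddress
import random

def sample_targets(slash16, k=20, rng=None):
    """Pick k distinct /24 blocks at random inside a /16; return one address in each."""
    rng = rng or random.Random()
    subnets = list(ipaddress.ip_network(slash16).subnets(new_prefix=24))  # 256 blocks
    return [str(s.network_address + 1) for s in rng.sample(subnets, k)]

def distinct_routes(paths):
    """Count distinct paths, ignoring the final hop (the target itself always differs)."""
    return len({tuple(p[:-1]) for p in paths})

def worth_full_scan(slash16, trace, k=20, rng=None):
    """True if the sampled /24s do not all share one route, so the /16 needs a closer look."""
    paths = [trace(t) for t in sample_targets(slash16, k, rng)]
    return distinct_routes(paths) > 1

# Two stand-in trace functions (assumptions, not real traceroute output).
def same_route_trace(target):
    return ["10.0.0.1", "192.0.2.1", target]

def split_route_trace(target):
    upstream = "192.0.2.1" if int(target.split(".")[2]) < 128 else "198.51.100.1"
    return ["10.0.0.1", upstream, target]

print(worth_full_scan("172.16.0.0/16", same_route_trace))
print(worth_full_scan("172.16.0.0/16", split_route_trace, k=256))
```

With a random sample of 20 there is always a small chance of missing a second route that serves only a few /24s, which is the price of the speed-up.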

    After some testing and beta code I proved that with enough bandwidth it is possible to scan the entire Internet with a single computer. The map of 1/5th of the Internet only took about 2 hours to create, yet it generated nearly 200k/sec of traffic and put my machine at a load of 60+ while scanning. If you apply the math, the entire Internet would take about 10 hours to scan and another hour or two for the visual map output.

    I found a lot of value in the project, so after the proof of concept was completed I continued to program. I turned the entire system into a distributed client/server model. The clients request a chunk of random IP space from the server and when it is completed the IP space is registered with the server. This is done until all of the IP space has been scanned. I'm also working on a stats system so I can monitor the productivity of the different scanning nodes and users involved in the project.
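
The client/server model described above might look roughly like the following in-process sketch. It is an assumption-laden illustration: `ChunkServer`, its method names, the /16 chunk size and the node names are all invented here, and a real deployment would add networking, lease timeouts and authentication.

```python
import random

class ChunkServer:
    """Hands out chunks of IP space in random order and records who completed them."""

    def __init__(self, chunks, rng=None):
        self._pending = list(chunks)
        (rng or random.Random()).shuffle(self._pending)   # "random IP space"
        self._leased = {}    # chunk -> client currently scanning it
        self._done = {}      # chunk -> client that registered it as complete

    def request(self, client_id):
        """Give the client a chunk to scan, or None if nothing is left to hand out."""
        if not self._pending:
            return None
        chunk = self._pending.pop()
        self._leased[chunk] = client_id
        return chunk

    def register(self, client_id, chunk):
        """Record a finished chunk; reject results for chunks this client never leased."""
        if self._leased.get(chunk) != client_id:
            raise ValueError(f"{chunk} is not leased to {client_id}")
        del self._leased[chunk]
        self._done[chunk] = client_id

    def finished(self):
        return not self._pending and not self._leased

    def stats(self):
        """Completed chunks per client, for the kind of stats system mentioned above."""
        counts = {}
        for client in self._done.values():
            counts[client] = counts.get(client, 0) + 1
        return counts

# Demo: two hypothetical scanning nodes take turns until all chunks are done.
chunks = [f"{a}.{b}.0.0/16" for a in (1, 2) for b in range(4)]
server = ChunkServer(chunks, random.Random(0))
clients = ["node-a", "node-b"]
turn = 0
while not server.finished():
    client = clients[turn % 2]
    chunk = server.request(client)
    if chunk is not None:
        server.register(client, chunk)   # a real client would scan before registering
    turn += 1
print(server.stats())
```

Tracking leased chunks separately from completed ones is a design choice that lets a server later re-issue work from a node that disappears.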

    By taking a more distributed approach the data will look more like the real Internet. It will show more of the backup routes, more of the smaller links in different countries, etc. When the first version of the code is done I should have about 5 to 10 different scanning nodes running on the Internet. If you would like to donate a computer and some bandwidth to this project, please contact me. I can give credit where credit is due!

    When
    The first scanning tests began in late October 2003 and I wish to have the project generate a new map every week.

    Where
    Currently the project is hosted in San Francisco on a multi-homed fiber ba
    • The maps look a lot better than those from the Internet Mapping Project, and look like they're generated using one of the standard Open Source network mapper programs.


      The download link was broken, when I tried, so I don't know what their modified traceroute is like. That seems to be the most interesting part of this project.

      • Since we all like pretty graph pictures go over to: http://networkviz.sourceforge.net/ and look at the packages out there. Many of these need help, so don't hesitate to offer your services if you like graphing. Most of these would be able to view these internet graphs interactively, which would be far more exciting than just pictures.
    • by bigmouth_strikes ( 224629 ) on Friday November 14, 2003 @05:40PM (#7476999) Journal
      If he's mapping the whole Internet in a day he should be able to stand up to a little Slashdotting, shouldn't he ?
    • What's disturbing about the current map thus far is that it clearly shows how CENTRALIZED the internet really is. The old idea of traffic routing around damage turns out to rest on a rather fragile network of a handful of backbone nodes. I would have expected more lower hierarchical nodes crisscrossing the network, forming more of a spiderweb system, rather than everything going across 3 or 4 nodes.

      • Re:Disturbing (Score:2, Insightful)

        by ultranova ( 717540 )
        Of course, this could simply be a matter of traffic using the fastest route available. If there's an information superhighway and an information dirt path, then as long as the superhighway stays up, it's going to be used.

        In other words, the low-level interconnects probably wouldn't show up in a scan like this, because the backbone nodes are faster. That doesn't mean they aren't there, just that data prefers the faster routes as long as they are available. There could be a million paths that don't include
  • by swordboy ( 472941 ) on Friday November 14, 2003 @04:10PM (#7476255) Journal
    Several maps of the internet right here [google.com]
    • I think that last [google.com] one is either wrong or way way in the future.
    • Re:Here ya' go... (Score:3, Interesting)

      by BadCable ( 721457 )
      Why don't they make maps like that of say the telephone network?

      That'd be very interesting to see with very similar benefits.

  • by defMan ( 175410 ) on Friday November 14, 2003 @04:12PM (#7476268)
    I am in serious need of more bandwidth and hardware power. If anyone has a Co-Located system on a nice network to donate to this project for a few months, I would be very happy!

    Slashdotting was never easier!
  • by bcolflesh ( 710514 ) on Friday November 14, 2003 @04:12PM (#7476272) Homepage
    Go past the burnt-out Cray and then right at the Commodore64 Contiki server - you'll see my drive lights.
  • ...his web server is already unavailable within minutes of it being posted on Slashdot...

    Mapping...Slashdot.org......
  • by GoofyBoy ( 44399 ) on Friday November 14, 2003 @04:15PM (#7476307) Journal

    IP Address: 127.0.0.1
    Computer: The one from Microsoft with the Start button in the bottom left hand corner.
    Location: my bedroom.
  • by DoctorMabuse ( 456736 ) * on Friday November 14, 2003 @04:15PM (#7476308) Homepage
    SCO IPs are in the Mordor address space.
  • by Fux the Penguin ( 724045 ) on Friday November 14, 2003 @04:17PM (#7476322) Journal
    Okay, yes, I fully admit that it's cool to map the internet in one day. Regardless...I think I hear about some internet every other day.

    There's John Quarterman [mids.org] who's been doing it for years, and then the CAIDA [caida.org] visualization tools, and Cybergeography [cybergeography.org] and the Internet weather report [internetweather.com] and damn maps [mapblast.com] and more maps [mapquest.com].

    Note to everyone: please stop mapping the internet.
    • Well, internetweather.com just seems to talk about some corporate merger, so I'm not sure why you linked that.

      Mapblast and Mapquest, to the best of my knowledge, are physical world mapping tools.

      Am I missing some hidden link on these sites that takes me to "map the internet?"

      Regardless, why should people stop doing it? Why do you care what other people do that doesn't affect you? And why are moderators giving +5, Informative to trolls? Many links in a post and it must be informative?
    • by jd ( 1658 )
      I believe this new project uses the CAIDA tools. The maps look like the output from their Java-based network mapping package.

      However, it looks like it's one map a week, not one a day, and that's only with more power. Based on the charts on the site, it's going to take between 3-4 months to map a decent portion of the Internet, and he's only going to Class C resolution.

      Further, he's mapping as a spanning-tree. This means that tunnels, load-balancing and multipath connections cannot be shown at all.

      Also

      • I'm not really into this stuff, so I apologize if I miss anything obvious. However, the technology he claims to have used is PHP, a (self-?) modified traceroute and GraphViz [att.com]. No Java seems to be involved, which would explain why it only takes a day to map it out. ;)
  • by Goody ( 23843 ) on Friday November 14, 2003 @04:17PM (#7476327) Journal
    Well, there's one less server to map...

  • by burgburgburg ( 574866 ) <splisken06NO@SPAMemail.com> on Friday November 14, 2003 @04:18PM (#7476340)
    half a day with a broken computer, dial-up access and a guy with no hands.

    Top that!

  • by Lugor ( 628175 )
    the Internet came to him! And he was no more.
  • Creepy (Score:5, Interesting)

    by Seanasy ( 21730 ) on Friday November 14, 2003 @04:25PM (#7476403)

    When I first saw the image on the right [opte.org] it looked like a human brain. It would be creepy if the Internet had a sort of fractal self-similarity to our physiology.

    • It would be creepy if the Internet had a sort of fractal self-similarity to our physiology.

      Agreed.

      Good material for an X-Files episode ....

      -kgj
    • Re:Creepy (Score:2, Insightful)

      by Anonymous Coward
      How it appears graphically is decided by the person who translates the database to an image. They could make it S shaped if they really wanted. Not creepy.
    • Re:Creepy (Score:3, Funny)

      by zangdesign ( 462534 )
      Following your metaphor - the internet's genitalia must be really huge then. Or at least the portion of the brain responsible for sex.
    • "It looks like a brain, but it seems to be damaged..."
    • Although in a slightly different sense, actually.

      In a sense, the results of the project do seem to match earlier research [computerworld.com] on the topology of the web; at a glance, the graph arrived at [opte.org], does seem to be scale-free [computerworld.com] in nature.

      Which, actually raises an interesting question. Scale free networks, by their nature, are supposed to have certain highly connected nodes [arxiv.org], the connectivity of which, is extremely critical to the network as a whole.

      In particular, look at the resultant graph for one-third [opte.org] of the net. Note

  • is more about geolocation than mapping, but I guess I deserve at least a passing mention :-)

    Simon.
    • Why? Just so you can help evildoers like MLB.TV block viewers from seeing broadcasts in their area? No thanks, I'd rather IPs not be associated with geographic locations.
    • is more about geolocation than mapping, but I guess I deserve at least a passing mention :-)

      Not really, it can't even guess the country from my host name (which has a two letter TLD).

      And when I provide the information all I get is

      Warning: mysql_fetch_row(): supplied argument is not a valid MySQL result resource in /opt/www/net/www.hostip.info/add.html on line 83
      • Would you by any chance drop me a line ? I'd be interested to know what it is that's causing the problem, and I could give you a url to click on that'll give me all the debugging info I should need.

        I suspect you're behind a proxy or firewall, and the script can't parse the IP address information :(

        Simon
  • You're Welcome (Score:2, Redundant)

    by Stalemate ( 105992 )
    We just made his job easier. There is one less web server to map now!
  • Now map the people mapping the internet in one day.
  • by Fux the Penguin ( 724045 ) on Friday November 14, 2003 @04:30PM (#7476444) Journal
    A single guy with a single computer...

    He's mapping the Internet. Why am I not surprised he's single?
  • rsync (Score:3, Interesting)

    by bigjnsa500 ( 575392 ) <bigjnsa500@nOSpAM.yahoo.com> on Friday November 14, 2003 @04:31PM (#7476456) Homepage Journal
    Why can't somebody just rsync the Google search cluster? Wouldn't it have the same results this guy is looking for?
  • A single guy with a single computer is working to accomplish the same feat

    Uh... is that 21st Century Math? Crap. My kids are going to come home from school and I won't be able to help them with their homework.

  • Speaking of maps, sorry, this is my childhood coming back to haunt me in the /. crowd:

    "I knew I should have taken that left at Albuquerque." -- Bugs Bunny

  • it sounds like a William Gibson novel, the one with the guy obsessed over the "form" or "shape" of the cyberspace being a "snapshot" of the universe. i can't seem to remember the name anymore
  • by hburch ( 98908 ) on Friday November 14, 2003 @04:47PM (#7476590)
    As a side comment, now I understand why my connection got so slow.

    [Internet Mapping Project's] mapping also takes nearly six months to generate a single map. My comment was that, "I can write a program that can map the entire net in a single day."

    The Internet Mapping Project maps the Internet in under two hours (105 minutes for this morning's run). I'm not certain where the six months came from. The rate limitation is the packet rate limit we set (500 packets per second).

    Map layout time is not included in that time, but that is not done on a daily basis. A map layout takes about six hours, as I recall. It only took a couple weeks to produce all the layouts necessary for a movie of the Internet [lumeta.com] from Aug 1998 to Jan 2001 based on the daily runs.

    CAIDA [caida.org] also creates daily maps of the Internet as part of their Skitter project. Their schedule varies between measurement points. In addition, other projects, such as the Mercator project and the RocketFuel projects, also map or did map the Internet.

    Each project has slightly different goals. Skitter focuses on paths to major web and DNS servers. Mercator attempted to discover networks with limited pre-knowledge. RocketFuel wants a very accurate map of a particular ISP. The Internet Mapping Project is focused on the router connectivity within and between public backbones.
  • ...when featured on Slashdot, now, is he? =^^=
  • by fv ( 95460 ) * <fyodor@insecure.org> on Friday November 14, 2003 @04:53PM (#7476640) Homepage
    As the author of the free Nmap [insecure.org] ("Network Mapper") tool, I have also considered creating a map of the entire Internet. I would have focused on end hosts (where they are, what operating systems and services they run, trending, etc.) instead of routing. Rather than try this from a single high-bandwidth machine (as with Opte), I was going to take a distributed approach. I would release a P2P-like application that users could run and each scan small sections of network space to be contributed to the global database. The app would be called Nmapster :). I also liked to think about it as a "caching service", so that you don't have to spend the time rescanning the Microsoft network if someone else has done so in the last N hours.

    Then I came to my senses and decided to work on more practical and less controversial projects such as Nmap Version Detection [insecure.org]. But the subversive in me still hasn't given up entirely on Nmapster :).

    -Fyodor

  • Brilliant! (Score:3, Funny)

    by Quixadhal ( 45024 ) on Friday November 14, 2003 @04:56PM (#7476655) Homepage Journal
    You want to map the internet?

    1 Setup a site saying you want to map the internet.
    2 Get posted on slashdot.
    3 Parse the referer logs.
    4 ???
    5 Profit!
  • bad for business (Score:3, Interesting)

    by glassesmonkey ( 684291 ) * on Friday November 14, 2003 @05:03PM (#7476712) Homepage Journal
    Maybe people already take this into consideration, but won't this impact webhosting? Won't people try to get their webpage/company closer to the main trunk / center of map? When you look for a hosting service (basically an IP address) right now most people don't consider where in the map the host is.

    I mean with this tool, I would look up where my new IP would land me and try to find a host closer to the main backbones. Is this already done now by most people?

    (on another subject the maps remind me of the species origin stuff)
    • To some extent, this is already a problem. There is a common benchmark for connectivity (I've forgotten its name, for which I apologize). The benchmark is performed by looking at performance to a set of known locations (for the most part, peering points, by my recollection). Companies wanting good numbers connect to those peering points directly.

      The problem you allude to is believed to be responsible for the power-law behavior of the Internet. If you look at the distribution of degrees, there are more

  • If the code for this is distributed, then anyone on the internet can scan the entire internet for some nuance on this purpose.

    (shiver)

    Perhaps a centralized open database would be a good idea.
  • Hierarchy (Score:3, Insightful)

    by sploxx ( 622853 ) on Friday November 14, 2003 @05:23PM (#7476880)
    Has anyone noticed that nearly all of the maps have a more or less tree-shaped structure?
    This means concentration of power. So, the real, failure-tolerant internet is gone, at least it seems to be.
    • Re:Hierarchy (Score:4, Interesting)

      by daves ( 23318 ) on Friday November 14, 2003 @05:32PM (#7476940) Journal
      Has anyone noticed that nearly all of the maps have a more or less tree-shaped structure?

      No matter where you are on the net, your view is going to look like a tree with you at the center. Traceroute-type mapping will not capture the redundancies.
      • Admitted, but I think that's not 100% true. I did some mapping myself (filtering the output of various traceroutes) and there were cycles in my graph. But this is some time ago.

        I don't see *any* cycles in his map.
  • I'm mapping teenkelly.com right now!
  • So if we assume the electrical signals of his packets travel at the speed of light (186 000 miles per second) across the internet (which they don't really, but we'll ignore that for this argument), then logic tells us that the internet must have less than 16,070,400,000 miles of cable in order for this to work. Because his data cannot travel any faster along the pipes.

    And that's only one way... Assuming query and response, his packets have to effectively travel double the existing cable lengths.

    So do all
  • Internet Topology (Score:2, Interesting)

    by Tacoguy ( 676855 )
    I have followed various projects related to mapping cyberspace through the years and have always found An Atlas of Cyberspaces [cybergeography.org] to be fascinating.

    Mapping by Lumeta is one such methodology and I even have a poster of theirs printed by Peacock Maps [peacockmaps.com] (server down just now) in my office.

    I have noticed that these mappings take a long time to complete and being able to map in a short time frame could be beneficial in much the same way that Internet Traffic Report [internettr...report.com] can be to visualize traffic patterns or di
  • I'd be interested in seeing a real global world map with the locations of servers pinpointed on the map to show the density of computer equipment around the globe. Actually, it wouldn't even need the real map to exist: if all the points of light representing computer servers were placed in their proper geographic locations, I bet you'd get a very good mapping of the world. In fact, it would probably look similar to the famous map of the world [spaceflightnow.com] at night where the lights from industrialized countries creat
  • Why not map Autonomous Systems instead? Routes to AS are being advertised by BGP, and a set of well placed looking glasses would be all it takes to get a big picture. I never saw anything like an AS mapping, with the ASes as nodes and the (BGP announced) routes between them as links.

    Of course, some AS span multiple geographical areas, but this is also true of class C networks.

    The big advantage of mapping ASes is, that there are not so many of them, compared to class C nets, thus resulting in much simple
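
A rough sketch of the AS-level idea: treat each AS as a node and add a link between every pair of neighbouring ASes in an advertised AS path. The paths below are made up, using AS numbers from the range reserved for documentation; a real run would collect paths from looking glasses or BGP table dumps, which is not shown here, and `as_graph` is a hypothetical helper.

```python
from collections import defaultdict

def as_graph(as_paths):
    """Build an undirected adjacency map from a list of AS paths."""
    adjacency = defaultdict(set)
    for path in as_paths:
        # Collapse AS-path prepending (the same AS repeated back to back).
        collapsed = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for left, right in zip(collapsed, collapsed[1:]):
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency

# Hypothetical AS paths as seen from two different vantage points.
paths = [
    [64496, 64497, 64499],
    [64496, 64498, 64498, 64499],   # 64498 is prepended once
    [64500, 64497, 64499],
]

graph = as_graph(paths)
link_count = sum(len(neighbours) for neighbours in graph.values()) // 2
for asn in sorted(graph):
    print(asn, sorted(graph[asn]))
print("links:", link_count)
```

Because several vantage points contribute paths, the resulting graph can contain cycles (here 64496, 64497, 64499, 64498), which a single traceroute tree would not show.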

  • by Porag_Spliffing ( 66509 ) on Friday November 14, 2003 @07:21PM (#7477808) Homepage
    I see his trick already. Post on /. that you plan to map the entire net and then wait till the entire net maps its way to you.

    P.S.

    Is there such a thing as trecart ?
  • "Got to see the whole net
    From Yahoo on down to eBay--
    In just one day!"
  • by Alomex ( 148003 ) on Friday November 14, 2003 @07:42PM (#7477921) Homepage
    Notice that he maps the paths from his computer to the rest of the world. That is not the same as a map of the entire Internet.

    To illustrate, if I map routes from, say Chicago, I'm likely to miss the direct connection between Seattle and San Francisco, as there is no traffic I could generate that would take that path.
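
The Chicago example can be made concrete with a toy graph. The sketch below uses breadth-first shortest paths as a stand-in for traceroute; real routing follows policy rather than hop count, and the four-city topology and the helper name `edges_seen_from` are invented for illustration.

```python
from collections import deque

def edges_seen_from(graph, source):
    """Links that lie on the breadth-first shortest-path tree rooted at source."""
    parent = {source: None}
    queue = deque([source])
    seen = set()
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                seen.add(frozenset((node, neighbour)))
                queue.append(neighbour)
    return seen

# Toy topology (an assumption): Seattle and San Francisco are directly linked.
links = [
    ("Chicago", "Seattle"),
    ("Chicago", "San Francisco"),
    ("Chicago", "New York"),
    ("Seattle", "San Francisco"),
]
graph = {}
for a, b in links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

all_links = {frozenset(link) for link in links}
from_chicago = edges_seen_from(graph, "Chicago")
from_seattle = edges_seen_from(graph, "Seattle")

print("missed from Chicago:", [sorted(link) for link in all_links - from_chicago])
print("missed after adding Seattle:", [sorted(link) for link in all_links - (from_chicago | from_seattle)])
```

A single vantage point always yields a tree, so the Seattle to San Francisco link only appears once a second vantage point is added.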

  • his day is up!
