Map the Internet... In One Day?
rjbrown99 writes "There have been numerous stories over the past few years on Bill Cheswick's Internet Mapping Project. The Lumeta folks even created a company out of it. Well, now there is a competitor. A single guy with a single computer is working to accomplish the same feat - within ONE DAY and using open-source tools to do it. The new project is called Opte and can be found at www.opte.org." He's made some progress and is looking for volunteers.
This server will die ! (Score:5, Informative)
This project was started by me (Barrett Lyon) as a response to a conversation with my colleagues at Network Presence. Over lunch we were discussing William Cheswick and Hal Burch's Internet Mapping Project. I was not very impressed with the results of their project: they produce beautiful maps, but the maps don't seem to be very useful, nor do they release their code freely. Their mapping also takes nearly six months to generate a single map. My comment was, "I can write a program that can map the entire net in a single day." The comment was met with some hostility. Thus, this project was born.
What
The goal of this project is to use a single computer and a single Internet connection to map the location of every single class C network on the Internet. It is obvious that the Internet is not routed as a bunch of class C networks, but it is easy to see that by treating the Internet IP space as a bunch of class C networks, it will be possible to make a detailed map of the entire Internet. The global Internet address space currently offers 32 bits worth of unique host addresses, or a theoretical maximum of 2^32 = 4,294,967,296 hosts. In reality, the address space has been allocated in fairly large contiguous blocks, which renders strictly optimal utilization difficult. The smallest block that is logically routed via BGP or allocated by ARIN is a class C network (CIDR
At a rate of 194 traceroutes per second it is possible to scan the entire theoretical 2^24 space within a single day. Thus about 16,777,216 class C networks could be processed by a single computer in a single day. Yet huge portions of the address space are no longer used: many network blocks fall under RFC 1918, and other blocks are reserved by ARIN.
According to ARIN there are about 47 class A networks in the reserved status (search ARIN for OrgName "Internet Assigned Numbers Authority".) Doing the math, 3,080,192 class C blocks can be removed from the scan list, leaving us with a theoretical list of 13,697,024 blocks.
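The arithmetic in the two paragraphs above can be checked with a few lines of Python. This is only a sanity check of the figures quoted in the post; the constant and function names are made up for the illustration, and the count of 47 reserved class A networks is taken from the post as-is.

```python
# Sanity check of the figures quoted in the post (names are illustrative).
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds
TOTAL_CLASS_C = 2 ** 24             # 16,777,216 class C (/24) blocks
CLASS_C_PER_CLASS_A = 2 ** 16       # 65,536 class C blocks in one class A
RESERVED_CLASS_A = 47               # figure quoted in the post


def required_rate(blocks, seconds=SECONDS_PER_DAY):
    """Traceroutes per second needed to cover `blocks` in `seconds`."""
    return blocks / seconds


reserved_blocks = RESERVED_CLASS_A * CLASS_C_PER_CLASS_A
remaining_blocks = TOTAL_CLASS_C - reserved_blocks

print(f"{reserved_blocks:,}")                      # 3,080,192
print(f"{remaining_blocks:,}")                     # 13,697,024
print(f"{required_rate(TOTAL_CLASS_C):.1f}")       # 194.2
print(f"{required_rate(remaining_blocks):.1f}")    # 158.5
```

So dropping the reserved blocks lowers the rate needed for a one-day scan from about 194 to about 159 traceroutes per second.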
Applying some additional thought, large portions of the 13.7 million blocks may route to the same place. By testing about 20 routes at random within a class B and comparing the results, it is possible to see if there are multiple routes worth investigating or if the entire thing goes to the same place. Applying that logic increases the speed of the scanning.
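A minimal sketch of that sampling idea follows. It is not the project's code: the `trace` callable is a hypothetical stand-in for a real traceroute, assumed here to return the router hops on the way to a target without the target itself, and "comparing the results" is interpreted as checking whether all sampled paths are identical.

```python
import ipaddress
import random
from typing import Callable, List, Optional, Tuple

Path = Tuple[str, ...]


def sample_class_c(class_b: str, k: int = 20,
                   rng: Optional[random.Random] = None) -> List[ipaddress.IPv4Network]:
    """Pick k distinct /24 blocks at random from a /16 (class B sized) block."""
    rng = rng or random.Random()
    subnets = list(ipaddress.ip_network(class_b).subnets(new_prefix=24))
    return rng.sample(subnets, min(k, len(subnets)))


def needs_full_scan(class_b: str, trace: Callable[[str], Path], k: int = 20,
                    rng: Optional[random.Random] = None) -> bool:
    """Return True if the sampled routes differ, so the block deserves a full scan.

    `trace` is hypothetical: it maps a target IP to the tuple of router
    hops leading to it, excluding the target itself.
    """
    paths = set()
    for subnet in sample_class_c(class_b, k, rng):
        target = str(subnet.network_address + 1)   # first host address in the /24
        paths.add(trace(target))
    return len(paths) > 1


# Fake tracer for demonstration: every target is reached through the same routers.
same_route = lambda ip: ("192.0.2.1", "192.0.2.254")
print(needs_full_scan("172.20.0.0/16", same_route))   # False
```

When every sample agrees, the whole class B can be recorded as one destination and the remaining blocks in it skipped, which is where the speedup would come from.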
After some testing and beta code I proved that with enough bandwidth it is possible to scan the entire Internet with a single computer. The 1/5th of the Internet map only took about 2 hours to create, yet it generated nearly 200k/sec of traffic and put my machine at a load of 60+ while scanning. If you apply the math, the entire Internet would take about 10 hours to scan and another hour or two for the visual map output.
I found a lot of value in the project, so after the proof of concept was completed I continued to program. I turned the entire system into a distributed client/server model. The clients request a chunk of random IP space from the server and when it is completed the IP space is registered with the server. This is done until all of the IP space has been scanned. I'm also working on a stats system so I can monitor the productivity of the different scanning nodes and users involved in the project.
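The request-and-register cycle described above might look roughly like this in memory. This is a sketch under assumptions, not the project's server: the `ChunkServer` class, its method names, and the example chunk labels are all invented for illustration, and a real server would also need networking, persistence, and timeouts for clients that never report back.

```python
import random
from typing import Dict, List, Optional, Set


class ChunkServer:
    """Hands out chunks of IP space in random order and tracks completion."""

    def __init__(self, chunks: List[str], seed: Optional[int] = None):
        self._pending = list(chunks)
        random.Random(seed).shuffle(self._pending)   # clients get random IP space
        self._assigned: Dict[str, str] = {}          # chunk -> client id
        self._done: Set[str] = set()

    def request_chunk(self, client_id: str) -> Optional[str]:
        """Give the client an unscanned chunk, or None when nothing is pending."""
        if not self._pending:
            return None
        chunk = self._pending.pop()
        self._assigned[chunk] = client_id
        return chunk

    def register_done(self, client_id: str, chunk: str) -> None:
        """Record that the client finished scanning the chunk it was given."""
        if self._assigned.get(chunk) != client_id:
            raise ValueError(f"{chunk} is not assigned to {client_id}")
        del self._assigned[chunk]
        self._done.add(chunk)

    def finished(self) -> bool:
        """True once every chunk has been handed out and reported back."""
        return not self._pending and not self._assigned


# One client working through three example chunks.
demo = ChunkServer(["10.0.0.0/16", "10.1.0.0/16", "10.2.0.0/16"], seed=42)
while (work := demo.request_chunk("node-1")) is not None:
    demo.register_done("node-1", work)
print(demo.finished())   # True
```

Keeping the chunk-to-client assignment is also what would make a stats system possible, since completed work can be counted per node.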
By taking a more distributed approach the data will look more like the real Internet. It will show more of the backup routes, more of the smaller links in different countries, etc. When the first version of the code is done I should have about 5 to 10 different scanning nodes running on the Internet. If you would like to donate a computer and some bandwidth to this project, please contact me. I can give credit where credit is due!
When
The first scanning tests began in late October 2003 and I wish to have the project generate a new map every week.
Where
Currently the project is hosted in San Francisco on a multi-homed fiber ba
Re:This server will die ! (Score:2)
The download link was broken, when I tried, so I don't know what their modified traceroute is like. That seems to be the most interesting part of this project.
Open Source Graph Libraries (Score:2)
Re:This server will die ! (Score:5, Insightful)
Disturbing (Score:2)
Re:Disturbing (Score:2, Insightful)
In other words, the low-level interconnects probably wouldn't show up in a scan like this, because the backbone nodes are faster. That doesn't mean they aren't there, just that data prefers the faster routes as long as they are available. There could be a million paths that don't include
Here ya' go... (Score:3, Funny)
Whoah! (Score:2)
Re:Here ya' go... (Score:3, Interesting)
That'd be very interesting to see with very similar benefits.
Re:Here ya' go... (Score:2)
He needs more bandwidth (Score:5, Funny)
Slashdotting was never easier!
Re:He needs more bandwidth (Score:2)
Re:He needs more bandwidth (Score:4, Funny)
Turn left at the third router... (Score:5, Funny)
what good is a map of the internet anyway? (Score:2)
Things look good so far... (Score:2, Funny)
Mapping...Slashdot.org......
Ok heres my part... (Score:5, Funny)
IP Address: 127.0.0.1
Computer: The one from Microsoft with the Start button in the bottom left hand corner.
Location: my bedroom.
Re:Ok heres my part... (Score:2)
That's Netscape, isn't it?
His Map Is Wrong (Score:5, Funny)
Been there, done that (Score:4, Informative)
There's John Quarterman [mids.org] who's been doing it for years, and then the CAIDA [caida.org] visualization tools, and Cybergeography [cybergeography.org] and the Internet weather report [internetweather.com] and damn maps [mapblast.com] and more maps [mapquest.com].
Note to everyone: please stop mapping the internet.
Re:Been there, done that (Score:2)
Mapblast and Mapquest, to the best of my knowledge, are physical world mapping tools.
Am I missing some hidden link on these sites that takes me to "map the internet?"
Regardless, why should people stop doing it? Why do you care what other people do that doesn't affect you? And why are moderators giving +5, Informative to trolls? Many links in a post and it must be informative?
Re:Been there, done that (Score:3, Interesting)
However, it looks like it's one map a week, not one a day, and that's only with more power. Based on the charts on the site, it's going to take between 3-4 months to map a decent portion of the Internet, and he's only going to Class C resolution.
Further, he's mapping as a spanning-tree. This means that tunnels, load-balancing and multipath connections cannot be shown at all.
Also
Re:Been there, done that (Score:2)
Slashdotted ! (Score:5, Funny)
I can map that Internet in ... (Score:4, Funny)
Top that!
Re: (Score:3, Funny)
Re:I can map that Internet in ... (Score:2)
Signed: MacGyver
Re:I can map that Internet in ... (Score:2)
Pfft! Kids today (Score:3, Funny)
Huh? 2 nodes? Why the hell should that matter?
Re:Pfft! Kids today (Score:3, Interesting)
Since I linked to his site, I should mention that Martin Dodge has gathered a nice collection of maps of the Internet on his CyberGeography [cybergeography.org] site, including many historical maps [cybergeography.org]. CyberGeography also includes many other interesting types of maps.
And one day... (Score:2, Funny)
Creepy (Score:5, Interesting)
When I first saw the image on the right [opte.org] it looked like a human brain. It would be creepy if the Internet had a sort of fractal self-similarity to our physiology.
Mod Parent Up (Score:2)
Agreed.
Good material for an X-Files episode
-kgj
Re:Creepy (Score:2, Insightful)
Re:Creepy (Score:3, Funny)
Re:Creepy (Score:2)
My thoughts exactly. (Score:2)
In a sense, the results of the project do seem to match earlier research [computerworld.com] on the topology of the web; at a glance, the graph arrived at [opte.org] does seem to be scale-free [computerworld.com] in nature.
Which actually raises an interesting question. Scale-free networks, by their nature, are supposed to have certain highly connected nodes [arxiv.org], the connectivity of which is extremely critical to the network as a whole.
In particular, look at the resultant graph for one-third [opte.org] of the net. Note
hostip.info (Score:2)
Simon.
Re:hostip.info (Score:2)
Re:hostip.info (Score:2)
Not really, it can't even guess the country from my host name (which has a two letter TLD).
And when I provide the information all I get is
Warning: mysql_fetch_row(): supplied argument is not a valid MySQL result resource in
Re:hostip.info (Score:2)
I suspect you're behind a proxy or firewall, and the script can't parse the IP address information
Simon
Re:hostip.info (Score:2)
I'm getting loads more hits than usual just because of the abuse in the URL
[Huge grin]
Simon.
You're Welcome (Score:2, Redundant)
Fantastic. (Score:2, Funny)
It's too easy... (Score:5, Funny)
He's mapping the Internet. Why am I not surprised he's single?
rsync (Score:3, Interesting)
1 + Volunteers = 1 (Score:2)
Uh... is that 21st Century Math? Crap. My kids are going to come home from school and I won't be able to help them with their homework.
Bugs Bunny (Score:2)
"I knew I should have taken that left at Albuquerque." -- Bugs Bunny
the form (Score:2)
Re:the form (Score:2)
Internet Mapping Project does daily maps (Score:5, Informative)
[Internet Mapping Project's] mapping also takes nearly six months to generate a single map. My comment was that, "I can write a program that can map the entire net in a single day."
The Internet Mapping Project maps the Internet in under two hours (105 minutes for this morning's run). I'm not certain where the six months came from. The rate limitation is the packet rate limit we set (500 packets per second).
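As a rough upper bound, those two figures imply a packet budget for a run. The calculation below assumes the 500 packets-per-second limit is saturated for the whole 105 minutes, which the comment does not claim; the variable names are illustrative only.

```python
# Upper bound on packets in one run, assuming the rate limit is saturated throughout.
PACKETS_PER_SECOND = 500
RUN_MINUTES = 105

max_packets = PACKETS_PER_SECOND * RUN_MINUTES * 60
print(f"{max_packets:,}")   # 3,150,000
```

That is far fewer probes than one per class C block (about 16.8 million), which suggests the two projects are not probing the same target list.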
Map layout time is not included in that time, but that is not done on a daily basis. A map layout takes about six hours, as I recall. It only took a couple of weeks to produce all the layouts necessary for a movie of the Internet [lumeta.com] from Aug 1998 to Jan 2001 based on the daily runs.
CAIDA [caida.org] also creates daily maps of the Internet as part of their Skitter project. Their schedule varies between measurement points. In addition, other projects, such as the Mercator project and the RocketFuel projects, also map or did map the Internet.
Each project has slightly different goals. Skitter focuses on paths to major web and DNS servers. Mercator attempted to discover networks with limited pre-knowledge. RocketFuel wants a very accurate map of a particular ISP. The Internet Mapping Project is focused on the router connectivity within and between public backbones.
Re:Internet Mapping Project does daily maps (Score:2)
He's not gonna get very far... (Score:2)
I have considered something similar (Score:5, Interesting)
Then I came to my senses and decided to work on more practical and less controversial projects such as Nmap Version Detection [insecure.org]. But the subversive in me still hasn't given up entirely on Nmapster :).
-Fyodor
Brilliant! (Score:3, Funny)
1 Setup a site saying you want to map the internet.
2 Get posted on slashdot.
3 Parse the referer logs.
4 ???
5 Profit!
bad for business (Score:3, Interesting)
I mean with this tool, I would look up where my new IP would land me and try to find a host closer to the main backbones. Is this already done now by most people?
(on another subject the maps remind me of the species origin stuff)
Already happening (Score:2)
The problem you allude to is believed to be responsible for the power-law behavior of the Internet. If you look at the distribution of degrees, there are more
Be afraid (Score:2)
(shiver)
Perhaps a centralized open database would be a good idea.
Hierarchy (Score:3, Insightful)
This means concentration of power. So, the real, failure-tolerant internet is gone, at least it seems to be.
Re:Hierarchy (Score:4, Interesting)
No matter where you are on the net, your view is going to look like a tree with you at the center. Traceroute-type mapping will not capture the redundancies.
Re:Hierarchy (Score:2)
I don't see *any* cycles in his map.
I'll pitch in! (Score:2)
limited by the speed of light? (Score:2, Interesting)
And that's only one way... Assuming query and response, his packets have to effectively travel double the existing cable lengths.
So do all
limited by the speed of IP (Score:2, Funny)
Re:limited by the speed of light? (Score:2)
Internet Topology (Score:2, Interesting)
Mapping by Lumeta is one such methodology and I even have a poster of theirs printed by Peacock Maps [peacockmaps.com] (server down just now) in my office.
I have noticed that these mappings take a long time to complete and being able to map in a short time frame could be beneficial in much the same way that Internet Traffic Report [internettr...report.com] can be to visualize traffic patterns or di
Is there a map of server locations on a real map? (Score:2)
AS mapping would be more useful (Score:2, Interesting)
Why not map Autonomous Systems instead? Routes to AS are being advertised by BGP, and a set of well placed looking glasses would be all it takes to get a big picture. I never saw anything like an AS mapping, with the ASes as nodes and the (BGP announced) routes between them as links.
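For what it's worth, turning AS paths into that kind of graph is simple once the paths are in hand. The sketch below assumes the AS paths have already been collected (for example from looking glasses) as lists of AS numbers; the `as_graph` function is invented for this illustration, and the example paths use AS numbers from the range reserved for documentation, not real measurements.

```python
from typing import Dict, Iterable, List, Set


def as_graph(as_paths: Iterable[List[int]]) -> Dict[int, Set[int]]:
    """Build an undirected AS adjacency map from BGP AS paths.

    ASes are nodes; two ASes are linked if they appear next to each
    other in at least one path.
    """
    graph: Dict[int, Set[int]] = {}
    for path in as_paths:
        # Collapse AS-path prepending (the same AS repeated back to back).
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for a, b in zip(hops, hops[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


# Made-up example paths using documentation-only AS numbers.
example_paths = [
    [64496, 64497, 64499],
    [64496, 64498, 64498, 64499],   # 64498 prepends itself once
]
example_graph = as_graph(example_paths)
print(sorted(example_graph[64499]))   # [64497, 64498]
```

Unlike a traceroute tree from one host, paths gathered from several vantage points can produce cycles in this graph, so redundant links between ASes become visible.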
Of course, some AS span multiple geographical areas, but this is also true of class C networks.
The big advantage of mapping ASes is, that there are not so many of them, compared to class C nets, thus resulting in much simple
Let the mountain come to Mohamed (Score:3, Informative)
P.S.
Is there such a thing as trecart ?
It could make a good musical... (Score:2)
From Yahoo on down to eBay--
In just one day!"
The Internet according to Garp (Score:3, Insightful)
To illustrate, if I map routes from, say Chicago, I'm likely to miss the direct connection between Seattle and San Francisco, as there is no traffic I could generate that would take that path.
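The same point can be shown on a toy graph. The three-city topology below is made up, and the model assumes packets follow fewest-hop paths, which real routing policy does not guarantee; the function names are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, Set


def observed_links(graph: Dict[str, Set[str]], source: str) -> Set[FrozenSet[str]]:
    """Links seen when tracing from `source` to every node along fewest-hop paths."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return {frozenset((child, par)) for child, par in parent.items() if par is not None}


def all_links(graph: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Every link that actually exists in the graph."""
    return {frozenset((a, b)) for a, neighbors in graph.items() for b in neighbors}


# Toy topology: a triangle of three cities.
toy = {
    "Chicago": {"Seattle", "San Francisco"},
    "Seattle": {"Chicago", "San Francisco"},
    "San Francisco": {"Chicago", "Seattle"},
}
missed = all_links(toy) - observed_links(toy, "Chicago")
print([sorted(link) for link in missed])   # [['San Francisco', 'Seattle']]
```

From Chicago the result is always a tree, so the Seattle to San Francisco link never shows up no matter how many destinations are traced.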
Too late (Score:2)
Re:Lets face it (Score:3, Informative)
Don't feel too bad, the government here (USA) is on your side mainly. I would disagree with you as there is always good money to be made here but you have to be creative. The idea is to push each other further to create new ideas and technologies where you can make money.
Re:Lets face it (Score:2, Insightful)
This is nothing new, you can find free software to solve just about any problem. People buy commercial software because in some cases free versions aren't advanced enough or easy enough to use or they want to buy support.
Re:It has to be asked.... (Score:5, Informative)
Mapping the Internet weekly will allow us to see major disasters in different parts of the world. The Internet is a huge disaster sensor. If I had maps of pre-war Iraq and then compared them to today, one could see how badly Iraq was destroyed. The idea of a metaphysical representation of the real world is very interesting to me.
The project can show the Internet growth.
The project is art.
Comment removed (Score:5, Funny)
Re:It has to be asked.... (Score:2)
Re:Actually, no. (Score:2, Funny)
Re:It has to be asked.... (Score:2)
Until everyone stops allowing connections from him or everyone continues to firewall off whatever they don't want.
Re:It has to be asked.... (Score:2)
Think:
1. Mapping the entire internet in around 12 hours.
2. Using the whois databases to map each subnet to a specific internet provider.
3. Use that information to map links to physical locations on a world physical map.
Then we have a logical-to-physical map of the internet. The problems with this are the fact that some large IP blocks d
Re:It has to be asked.... (Score:2)
You have no evidence that this guy has any motive in using Iraq other than it is on everyone's mind. And when he thought disaster area that's what popped up first.
Maybe he doesn't know that the internet would be up and running better now than it was then. Maybe he thinks Iraq was destroyed by Iraqis after the war ended.
So how bout instead of running inside yelling "LIBERALS" locking the doors and peeking out the curtains. You co
DoubleClick (Score:2)
They should sell their data to DoubleClick. They could serve geography-sensitive banner ads! If they know you live in San Francisco and you are visiting a food web site, they could serve up banner ads for local San Francisco restaurants.
I think there's a company called MaxMind GeoIP that already does this.
Re:It has to be asked.... (Score:3, Insightful)
Re:It has to be asked.... (Score:2)
Because... (Score:2)
Because it is there.
It has to be answered (Score:2)
You sure you're on the right website?
Re:It has to be asked.... (Score:2)
Re:Are we overlooking something? (Score:3, Insightful)
Re:Are we overlooking something? (Score:2)
Yup. Mapping Internet in a day so it could be done weekly / daily. Quite a shitload of traffic though.
Re:Are we overlooking something? (Score:2)
Re:Great! (Score:3, Insightful)
Dude. (Score:5, Funny)
Web pages are NOT internet hosts.
Web servers are relatively few compared with other types of hosts on the internet.
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
Re:Dude. (Score:2)
From the Jargon File, even:
Just because "pinging" usually refers to using the "ping" command doesn't mean that it always does.
Re:Dude. (Score:3, Funny)
C:\>ping www.slashdot.org
Pinging www.slashdot.org [66.35.250.151] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
C:\>
He's right! You can't ping web pages!
Re:Dude. (Score:2)
Re:Dude. (Score:2)
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
Eh... people who repeat themselves aren't necessarily any smarter than those who make their point once.
Re:Dude. (Score:2)
no, PING uses the ICMP protocol, not HTTP. it has nothing to do with web pages or parsing at all.
But you don't use HTTP to test the presence of a connection/route to a host- you use PING. As such, he is accurate in describing it as a type of communication with a web server.
Re:Dude. (Score:2)
Re:Distributed effort? (Score:2)
Yay! I final
Re:He should try this: (Score:2)