Interview with Bruce Maggs 58
Mihai Budiu sent in this interview with Bruce Maggs, a computer scientist who used to work at Akamai, the company which caches content for a great many popular websites. An interesting look at the combination of solving research problems and starting up a new company.
Re:questions? (Score:1)
And to answer your objection: the first time a customer's object is requested from an Akamai server, it is retrieved from the origin site (say, CNN.com). Each subsequent request is then served from cache and does not need to be retrieved from the origin each time. As an example, someone requests CNN.com's index page, which has 20 images (just a guess). If this is the second request for the page through that server, it already has all 20 cached. I'm guessing that there is a TTL system implemented to avoid serving stale data.
If a person doesn't have to go through 14 hops to get to the machine serving the content, this clearly results in a decreased download time.
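For what it's worth, here's a minimal sketch of that kind of TTL-based caching, just to illustrate the idea; the cache structure and the 5-minute TTL are my own assumptions, not Akamai's actual implementation:

```python
import time
import urllib.request

# Hypothetical sketch of a TTL-based edge cache, as described above.
# The TTL value is made up for illustration.
CACHE = {}          # url -> (fetched_at, body)
TTL_SECONDS = 300   # assume a 5-minute freshness window

def get(url):
    """Serve from cache if fresh; otherwise fetch from the origin (e.g. CNN.com)."""
    now = time.time()
    entry = CACHE.get(url)
    if entry and now - entry[0] < TTL_SECONDS:
        return entry[1]                        # cache hit: no trip to the origin
    body = urllib.request.urlopen(url).read()  # cache miss or stale: fetch once from origin
    CACHE[url] = (now, body)
    return body
```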
Re:Wow. (Score:1)
Re:dont shoot the messenger (Score:1)
yeah, that's a great idea. india is almost exclusively on a satellite uplink, and the ping times are hideous (>2,000ms, easy). i'd rather deal with the occasional lag from my computer cycling through the 6 replies akamai gives to my isp's dns than deal with the latency from a satellite. just observe how many gamers are on sats...
Re:Why Akamai does and does not use Linux (Score:1)
The problem is that support for WMT inherently means running Windows 2000. I know of other companies trying to use Linux that now have to run Windows 2000 in a VMWare session to get support for Windows Media Format. It's either that, or abandon Linux and go for a pure Windows solution. It seems like it's just a matter of time before they "might as well focus only on Windows since all anybody really wants is WMT".
It's exactly like what happened with the Word document format, except that instead of pushing people to put Windows on their desktop, it pushes them to put it on their server.
This is definitely the major wedge for Windows on the server and on the Internet. The media format could possibly become the most important format on the Internet, as a way of delivering all sorts of content overlaid on a media stream. You can deliver any type of content, which gives it the potential to replace HTTP as the common carrier protocol.
Someone somewhere needs to take a leadership role with the media format. I'm not too optimistic, however, seeing as it's been how many years, and there still isn't an open reference document format that could replace Word documents. RTF obviously doesn't cut it, HTML certainly doesn't either, and XML might be promising, but it's been around for years and I still haven't heard of an open XML format for documents.
It's not just about whether it's Microsoft controlling the standard for the digital equivalent of paper. It's about ANY company controlling the standard for paper. What if there were a kind of paper that only one company could produce, and you needed a special kind of light bulb to read material printed on it? This issue is a lot more serious than anyone other than the "Slashdot crowd" seems to realize. I think it's important enough for a boycott, but since few people outside of the "Slashdot crowd" would understand the reasoning at all, it probably doesn't make sense, since you would end up hurting yourself a lot more than Microsoft.
Here goes... (Score:2)
What the hell happened to the CS server..? I can't get anywhere! Bleh, might as well read Slashdot...
Oh, *that's* what happened to the CS server...
Aren't you thinking of Freenet? (Score:2)
--
Thats very sick and twisted. (Score:1)
Gotta be a cool guy if he thought it was funny.
Chris Cothrun
Curator of Chaos
screwed up the link (Score:2)
preview is your friend.
Chris Cothrun
Curator of Chaos
uh, I hate to point out... (Score:2)
The benefits of a compression system in HTTP 1.1 (look elsewhere for my post with links about this) are as much in the reduction of TCP connection creation and in transferring the images on a page in one big chunk instead of lots of little requests.
Think real hard for a minute. The few hundred K or less of HTML and images on an average web page being sucked through a 56K modem are going to be much slower than even virtual memory from a swap file! Memory and processor speed are the last of your considerations.
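For the curious, here's a minimal sketch of negotiating gzip compression over HTTP with Python's standard library; the host name is a placeholder:

```python
import gzip
import http.client

# Ask the server for a gzip-compressed response; the host is a placeholder.
conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/", headers={"Accept-Encoding": "gzip"})
resp = conn.getresponse()
body = resp.read()
if resp.getheader("Content-Encoding") == "gzip":
    body = gzip.decompress(body)   # far fewer bytes over a 56K modem
conn.close()
```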
Chris Cothrun
Curator of Chaos
Already an optional part of HTTP 1.1 (Score:3)
Apparently the improvements span more than just compressing stuff. HTTP 1.1 has a provision for maintaining a TCP connection for the duration of the transfer of a page and its page elements, instead of creating a new TCP connection for each page element.
Scroll down about halfway for the tables. A quick glance shows that compression works best for low bandwidth connections (naturally) and that the other improvements also made a difference.
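Here's a minimal sketch of that connection reuse, again with Python's standard library; the host and paths are placeholders:

```python
import http.client

# Reuse one TCP connection for the page and its elements (HTTP/1.1 keep-alive).
conn = http.client.HTTPConnection("www.example.com")  # speaks HTTP/1.1 by default
for path in ["/index.html", "/logo.gif", "/banner.gif"]:
    conn.request("GET", path)
    resp = conn.getresponse()
    data = resp.read()      # must drain the body before reusing the connection
    print(path, resp.status, len(data))
conn.close()
```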
Chris Cothrun
Curator of Chaos
Re:Akamai vs. MIT (Score:3)
Gentle readers of Slashdot, do not let yourselves be deceived by the ravings of these pathological liars in LCS, the rotting remains of a once-great department, the dregs left behind when the real talent left to form Akamai. Read the full story [astral.net] and decide for yourself.
Clearly you are correct. (Score:2)
Patent Lawsuit (Score:2)
Anyone with good thoughts? Is there any justification for Akamai's patent rattling? Has their fight with Digital Island been resolved? We were going to go with them for some caching but pulled out because of their patent position. I would love to find out that this has become a moot point.
Why Akamai does and does not use Linux (Score:3)
I know it's been said before, but it's worth saying again -- The way to increase the market share of alternate OSes is not to persuade users to install and use Linux. The way is to persuade users to use open File Formats and Protocols/APIs. Diversification of the OS market place will follow as a natural consequence.
In the example above, when Akamai needed to deliver the open file formats and protocols of the Internet, they had several choices. They decided that Linux best suited their needs. But when they needed to stream Windows Media, Win2000 was their only realistic choice.
I may be a pessimist... but I fear that WMF is a problem that Open Source cannot overcome. Even if we achieved the tremendous feat of catching up with a patent-free codec and streaming protocol comparable to ASF/WMF, we still would not succeed. Big Media thinks OSS is evil -- and MS will pander to Big Media's obsession with total IP control.
I hate to be gloomy, but I think that ASF/WMF is the first viable long-term Internet wedge for MS. I think .NET will be the second, and more are sure to follow.
The future just doesn't look bright for alternate OSes from my POV... But then that's just my opinion... I could be wrong!
Jonathan Weesner
Level D Flight Simulators using Linux from NLX Corporation [nlxcorp.com]. That's my idea of FUN!
Maggs-neto (Score:3)
ACM account required (Score:2)
Wow. (Score:2)
Take that, MIT!
Re:Methods of Caching the Internet (Score:1)
A much simpler and more effective approach is proxy servers run by ISPs. They significantly reduce the load on the origin servers.
Now if only people would use the proxy servers their ISPs provide...
Re:This is obviously a hoax (Score:1)
Re:Wow. (Score:1)
In a 211 class some number of years ago, Bruce was not his now lean-and-mean Akamai self, but had quite a stomach... Somebody drew a cute little Yoda-like round character on the board one day, giving sage advice in a comic-book balloon: "Don't forget to memoize."
He is quite a bit shorter than Guy Blelloch...
Education (Score:1)
already done. (Score:2)
There're tons of companies/groups working on variations of the same idea. To name a few:
swarmcast [swarmcast.com], allcast [allcast.com], etc. So far none of them have taken off. I'll leave it as an exercise to the reader to figure out why.
Re:Why Akamai does and does not use Linux (Score:2)
I have high hopes for Ogg Vorbis [xiph.org].
We /. types like it because it's free. Big Business will like it because they will never have to pay anything to use it. The only people who won't like it will be the ones who want to lock up the music, but in the long run they are doomed to fail.
(Given a choice between paying for music in WMF format and paying for music in a CD format, I will buy the CD every time. I predict that enough other people will do the same to ensure that WMF never takes over the world.)
steveha
Akamai Monitoring System? (Score:1)
Akamai vs. MIT (Score:5)
Akamai shares a block with the MIT Laboratory for Computer Science. Recently, there was a despicable, unprovoked snowball attack on innocent MIT graduate students by Akamai customer care thugs. (Well, okay, there's a little more to the story... :-) But anyway, differences will be settled in a mathematical/theoretical computer science shootout [mit.edu] on the evening of April 3. Should be fun.
Re:Methods of Caching the Internet (Score:2)
Don't include them, or give them a lower priority.
If they don't have the client (I imagined it as a browser plugin, but it could be an OS feature; actually, if it's Windows the plugin is an OS feature ;) ), then they wouldn't be on the 'list', so to speak.
Thirdly, you would be using the other person's (the hosts') upload bandwidth, and bandwidth is something no one wants to sacrifice.
Yes, but it's upstream bandwidth. How much upstream bandwidth does the average 'net user utilize each day?
Re:Methods of Caching the Internet (Score:2)
Uplink bandwidth is limited, but it's still faster than some sites I've seen slashdotted...
Methods of Caching the Internet (Score:5)
This is completely bass-ackwards. The content that becomes more popular becomes harder to get, even though many, many more copies are made available. If said server sends out these 1000 copies of a file, why can't some of the clients share those 1000 copies?
Potential solutions to this problem can be derived from systems that have already found a way around it, such as Gnutella [wego.com] and any MCAST implementation.
Gnutella, although its network model has other problems, alleviates the previously mentioned problem by forcing (or suggesting) that all clients cache and share for redistribution any content they download, thus increasing the number of available copies. MCAST, and other streaming technologies, handle the problem by allowing the server to send one copy of the content that can be shared by many clients... this is why we don't have to wait for TV/radio shows to download.
The problem with universally applying an MCAST-type solution to the internet is that the internet is not like TV and radio: the internet is supposed to be content-on-demand. If you turn on your TV five minutes before a show, you can't start watching it early; similarly, if you tune in five minutes late you can't start back at the beginning (TiVo users aside). I think many /. readers would go into shock if they could only read Slashdot on the hour, every hour. (Sidenote: one potential workaround for really busy sites is to broadcast the data every x number of seconds continuously, so that the data restarts often enough. The problem with this is that users with slower connections won't be able to keep up, and users with faster connections will be limited to whatever the server's streaming at. Also, the server will keep broadcasting regardless of what sort of traffic it gets, clogging up its bandwidth.)
Gnutella is a much better solution. I'm not going to try to work out the details, but stick with me for the big picture. When a user hits a webpage, even with the current model, all of the content is cached on the local hard drive, or sometimes somewhere in between the user and the server. What if everyone's browser were capable of serving requests for that cached data? This would not be efficient for sites with only a little traffic, but for /.ted sites or CNN and the like, it would work very well. The problem is finding another client that has the data you want cached; this might be resolvable using either peering groups (like routers and Gnutella) or a central server to track it all (like Napster). This does, however, give bad users a chance to replace CNN's banner with their own ads, etc., but that could perhaps be worked around with some sort of trust-metric system?
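For the central-server variant, here's a rough sketch of what the tracking piece might look like; the class, method names, and addresses are all hypothetical, not a real protocol:

```python
# Hypothetical Napster-style tracker mapping a URL to peers that hold a cached copy.
class CacheTracker:
    def __init__(self):
        self.holders = {}   # url -> set of peer addresses

    def announce(self, url, peer):
        """A browser reports that it now has `url` in its local cache."""
        self.holders.setdefault(url, set()).add(peer)

    def lookup(self, url):
        """Return peers believed to hold `url`; fall back to the origin if empty."""
        return list(self.holders.get(url, ()))

tracker = CacheTracker()
tracker.announce("http://cnn.example/index.html", "10.0.0.7:8080")
print(tracker.lookup("http://cnn.example/index.html"))   # -> ['10.0.0.7:8080']
```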
Well, there's my two cents, sorry if it's incoherent.
Re:questions? (Score:2)
Even so, you could be right. The overhead shifts from the image download to the DNS. Thus it wouldn't make sense for Joe Homeuser to "akamaize", but it does for Yahoo and CNN, simply because there are so many people over such a diverse area attempting to retrieve their pages.
By the way, the astute will notice that the diagram in that article is wrong. The client contacts the client's name server, which then contacts Akamai's name servers. This means that the DNS optimization applies to the client's name server and not to the client itself.
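If you want to see the round-robin answers your name server hands back, here's a quick sketch using Python's standard library; the hostname is a placeholder for an akamaized name:

```python
import socket

# Show every A record returned for a (placeholder) akamaized hostname.
# Repeated runs will typically rotate through the server set.
host = "images.example.com"
addrs = {info[4][0] for info in socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)}
print(sorted(addrs))
```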
Good Akamai interview (Score:2)
Re:Patent Lawsuit (Score:1)
I wouldn't think that this case has any effect on academia, because even though the infrastructure was developed in an academic environment, its use by Akamai is anything but; so the lawsuit was not over its academic roots but rather the commercial use.
--
Re:Methods of Caching the Internet (Score:1)
Firstly, most internet users are still on those slow dialups. Although cable and DSL are very popular and affordable, they are not yet accessible everywhere, and to the casual internet user they can seem like overkill. The reason why connection speed is important is this: ever download a file from a Gnutella client on a slow 56k modem? Now imagine that connection with several others on the same line... scary, eh?
Secondly, suppose the request for a page was redirected to someone else near you who just viewed it. You would send a request to them, but unless they have some kind of similar client, you're just going to be sitting there aimlessly. In order for something like this to work, it must be made into a standard that would be embedded and distributed into all OSes. If just a few OSes supported the system, it would not reach the point where it could actually take a significant load off a central server.
Thirdly, you would be using the other person's (the hosts') upload bandwidth, and bandwidth is something no one wants to sacrifice.
I could continue, but it just comes down to the fact that this kind of system is suitable for Gnutella for swapping mp3s, but not a global-scale webserving solution.
--
Images are not the only bandwidth hog... (Score:1)
--
Re:Images are not the only bandwidth hog... (Score:1)
It's certainly a good idea, it just needs to be integrated into MS IE and Netscape first.
--
Paper on web hashing used by Akamai (Score:1)
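The hashing scheme Akamai is usually associated with is consistent hashing; assuming that's the paper in question, here's a minimal illustrative sketch of the idea (not the paper's code) -- the MD5 ring, replica count, and server names are all my own assumptions:

```python
import bisect
import hashlib

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHash:
    def __init__(self, servers, replicas=100):
        # Place each server at many points on a hash ring to smooth the load.
        self.ring = sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(replicas))
        self.keys = [k for k, _ in self.ring]

    def server_for(self, url):
        """Map a URL to the first server clockwise from its hash point."""
        idx = bisect.bisect(self.keys, _h(url)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHash(["cache1", "cache2", "cache3"])
print(ring.server_for("http://cnn.example/logo.gif"))
# Adding or removing one cache only remaps a small fraction of the URLs.
```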
After Maggs checkout Kernighan (Score:1)
Re:Why Akamai does and does not use Linux (Score:1)
Re:Methods of Caching the Internet (Score:1)
The most cacheable data are the images, which are minimal on the sites that really matter. Now, if only there were support for separation of content and layout a la XML/XSLT: only the XML would have to be reloaded from the server, and the XSLT could be cached.
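A rough sketch of what that could look like, using the third-party lxml library; the URLs are made up, and this assumes the stylesheet rarely changes so it can be fetched once and cached:

```python
from lxml import etree
import urllib.request

# Hypothetical URLs: the stylesheet is fetched once and cached; only the
# (small) XML document is re-fetched on each visit.
XSLT_URL = "http://news.example/layout.xsl"
XML_URL = "http://news.example/headlines.xml"

cached_transform = etree.XSLT(etree.parse(urllib.request.urlopen(XSLT_URL)))

def render():
    fresh_xml = etree.parse(urllib.request.urlopen(XML_URL))
    return str(cached_transform(fresh_xml))   # layout applied locally, from cache
```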
Use your proxies, dammit! (Re:Methods of Caching.. (Score:1)
The problem is, everyone wants to look at the same content at the same time; under the current system, the server has to send out one copy of the data to each client that requests it, so if 1000 clients request it, the server has to send 1000 copies.
Have you set your connection proxy?
If not, you probably should. And everyone else out there too: the above is exactly what hierarchical proxy-cache servers were designed to prevent! As the name indicates, these servers will proxy your HTTP request (some other protocols can be used too) and cache the result. When another identical request comes in, it is served directly from the cache instead of contacting the server.
The proxy-cache servers are organized in a hierarchical fashion. So when you send a request, it does not matter if it is not currently in your proxy-cache: it may be stored in another cache higher in the hierarchy. The request will be sent upward, and only if it is really found nowhere between you and the target will the target server be contacted.
The result is: everyone wins!
In the situation you describe, if your 1000 clients are under 50 different ISPs, there would be only 50 requests to the server. And everyone (except for the first connected guy of each ISP) would browse much faster, since they get all the data directly from their ISP.
Note that some ISPs enforce the use of their proxy. That's a little bit radical, but if every ISP did that, the Slashdot effect would be a memory, and the net would be a better place...
Conclusion:
Check with your ISP to find out what proxy you should use. If you want to know more about proxy-caches, check out the docs for Squid [squid-cache.org], a popular proxy-cache server.
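As a concrete sketch, here's how a script would point its HTTP requests at a proxy-cache with Python's standard library; the proxy address is a placeholder, so check with your ISP for the real host and port:

```python
import urllib.request

# Route HTTP requests through your ISP's proxy-cache; the address below is a
# placeholder -- ask your ISP for the real host and port.
proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.net:3128"})
opener = urllib.request.build_opener(proxy)
page = opener.open("http://slashdot.org/").read()   # served from the cache when possible
```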
--SOMEBODY SET UP US THE PROXY !
Re:Methods of Caching the Internet (Score:2)
Digital Fountain [digitalfountain.com] seeks to solve this problem.
Re:well... (Score:1)
with 2,000ms ping times being quite normal, all but the most sparse websites take quite some time to load completely. again, i'll take the occasional dns round-robin over your sat uplink any day.
besides, you speak as if satellites don't fail. the fact is they do. and when one does, in the system you outline, you now have a single point of failure ...which is exactly what the internet avoids (unless the source itself is screwed).
oh, and ...i work in that room. =)
My .02,
Re:8000 servers? (Score:1)
My .02,
Re:questions? (Score:1)
Re:Images are not the only bandwidth hog... (Score:1)
I realize this... (Score:1)
questions? (Score:2)
If Akamai is serving images for the websites, doesn't that increase the download time (albeit not considerably over a theoretical, perfectly stable connection), as the end user is being "served" from multiple systems?
If I understood the portion of the interview about Akamai correctly, the system is only good for the servers. The end user is making multiple, simultaneous requests for the page from several different servers; this should (technically) introduce bottlenecks between the systems.
Of course, the practice is used all the time by DoubleClick and the other ad agencies, and page load time isn't too difficult to contend with (I assume) on a non-broadband connection, but when one introduces advertisements, downloading the images, and getting any server database calls from MULTIPLE servers, the backup is potentially paralyzing...
Re:Images are not the only bandwidth hog... (Score:2)
Nope. (Score:3)
Enjoy
dont shoot the messenger (Score:2)
Sure, Akamai does some neat stuff, but so does a company called Edgix [edgix.com], which does it via satellite to an ISP, bypassing the need to go through hops upon hops of information. What I found neat about Edgix's technology was (although this post sounds like a marketing ploy) that they sell caching servers which poll the most sought-after websites' content and then cache it hourly, daily, whatever. Then when someone looks something up, it pulls it directly off the ISP's server, which means faster content delivery.
But you don't see me interviewing their staff so they can flood an article while masquerading as an interview, do you?
Not only that, but it does this via a satellite-based mechanism, which means if Globix, UUNet, Exodus, and Level3 all blow up, you'll still get a cached Slashdot without broken routes and a slew of timeout errors.
Well... At least I got to see where he went to school though, such an informative interview.
Toy truck thieves still at large [antioffline.com]
well... (Score:2)
Different strokes for different folks I guess.
Now that you mention it though, I'd like to see how your solution would fly on a business trip on an airplane. Oh, those telco wires at 30,000 feet, how fast they zoom that data through, don't they?
Re:This is obviously a hoax (Score:2)
Re:Methods of Caching the Internet (Score:2)
Re:Methods of Caching the Internet (Score:2)
The only way to solve that is to have some way of verifying content, maybe a signature or something, but then you've got to have a third party signing everything. This is all aside from the problem of a publisher needing to modify a web page once it's released (a big one).
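One possible shape for that verification, sketched with a publisher-provided digest rather than a full signature; the URLs and the scheme are hypothetical:

```python
import hashlib
import urllib.request

# Hypothetical check: fetch the page body from a peer's cache, but fetch only a
# small digest from the publisher, and compare before trusting the cached copy.
PEER_COPY_URL = "http://10.0.0.7:8080/cache/cnn/index.html"
PUBLISHER_DIGEST_URL = "http://cnn.example/index.html.sha256"

body = urllib.request.urlopen(PEER_COPY_URL).read()
expected = urllib.request.urlopen(PUBLISHER_DIGEST_URL).read().decode().strip()
if hashlib.sha256(body).hexdigest() != expected:
    raise ValueError("cached copy does not match the publisher's digest")
```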
And of course, uplink bandwidth is very limited on the majority of DSL/Cable systems.
He's a Communist agent (Score:1)
My fellow Americans, such double agents for the Eastern Bloc have pervaded our society, as is evidenced by the Hanssen case, and now this new revelation. We must be ever vigilant against the Red Menace and stand up against the Commie bastards.
Thank you and God bless America.
--
George W. Bush
8000 servers? (Score:1)
500 locations and 8000 servers, that's about 16 servers in each. That doesn't look right. What I have been hearing is that Akamai has 3 servers per cluster.
Here is what a live Akamai cluster looks like. [meltzer.org]
Also, at the Content Delivery Networks conference, July 6-7, 2000, Barcelona, Spain [terena.nl], the study "The Measured Performance of Content Distribution Networks" showed that to achieve optimal performance you don't even need 1000 servers; a much smaller number will do the same job.
So what are those "8000" servers for?
Re:Akamai vs. MIT (Score:1)
-----------------
This is obviously a hoax (Score:3)
The article is an obvious attempt to obscure their real purpose: to establish a worldwide tic-tac-toe-solving distributed supercomputer.
Re:Wow. (Score:1)