This is basically the approach that most container systems use. A scratch space is mounted on top of the various container objects that is a partition on LVM. Interacting with the file system will only impact your locally allocated space.
Docker may be like jail() in a way, but true linux cgroups/namespaces are far more powerful. For one, they can be set on individual processes (including threads). So you can create a thread which has a different view of the filesystem than say the main thread. Sure, the attack vector exists to share information between them but now you can basically make one more hop for an attacker. You can make threads which have no network access, or make a thread which has no access to the process list on a system.
So picture using this with a web browser. You can make that crappy module run in a process which has no network access, a root file system that is empty (/var/empty or some such) and can not see any of the other processes on the system. Its only access to the outside world is through a SOCKS proxy passed in as a file descriptor. Even better this can be done with minimal system calls and no setup from the end user so you don't need any of the real infrastructure that jails require. Just recently they added user namespaces as well so uid "0" in a namespace isn't uid 0 on the host OS.
I love that you can harden a web server by having all the threads accept a "resolver" thread have no network access, and have all the threads except a logging thread have no file system access (or limited file system access), while also limiting the resolver thread to say 50M of memory, the main processing thread to 80% CPU and 12G of memory, and the logging thread to 10% CPU and 10k file system operations per second.. etc.
The per thread aspect of the whole setup is way cool, but the zero administrative overhead for a large chunk of it is even cooler. =)
This is a bit of a naive explanation.
Let me explain how a DDoS mitigation strategy works for many of the companies listed in the summary. They setup datacenters in 10, 15, or more places all hosting a proxy. Some of these solutions use DNS to route traffic around problems (GSLB) while others like CloudFlare use Anycast which is awesome and super hard to get right. Each of these services are typically setup with tons of bandwidth capacity, well over 10Gb, but often times into the 100Gb range. They also often have deals with upstream providers that can filter traffic at the edges meaning it never makes it onto the internet in the first place.
Since you servers are not exposed to the internet, and the ones are are have far, far more horsepower to deal with this than a DDoS will even manage from the client side they can easily just churn through the attack, discarding connections and never letting them hit your limited servers. This is how they can easily survive Anonymous style DDoS attacks.
The other thing is to ensure you have turned of every "feature" your load balancer is giving you. SSL termination at the LB, full session management, etc. All of these cost load balancer CPU which is easy to take advantage of, even if there is a DDoS mitigation system in front of your site. You can't just add a few more servers either. Adding capacity to a load balancer is nearly impossible to do mid-attack.
Even more interesting is that you can often times trick the crappy ddos software by doing things like excessively slow responses (tarpitting) making its loop take ages to try again. This is pretty much using the tactics of a DDoS directly against the attackers.
Another common tactic is to add attackers to a view in your bind config that resolves your hostname to 127.0.0.1 just for them. This works if you do not have long TTLs and they are using hostnames. If they are using direct IPs then you simply move your traffic to a second IP and drop the one they are attacking. Best case is if you can do this via BGP announcements so the traffic simply will fail to route and everybody wins.
And yes, I do this professionally but not for any commercial product.
Speaking as somebody that has done hardware qualifications and burn-in development at very large scale for companies you ahve heard of let me tell you the tools I use:
fio: The _BEST_ tool for raw drive performance and burnin testing. A couple of hours of random access will ensure the drive head can dance, then a full block by block walk through with checksum verification will ensure that all blocks are readable and writable.. I usually do 2 or 3 passes here. You can tell fio to reject drives that do not perform to a minimum standard. Very useful for finding functional yet not quite up to speed drives. The statistics produced here are awesome as well.. Something like 70 stats per device per test.
stressapptest: This is google's burn in tool and virtually the only one I have ever found that supports NUMA on modern dual socket machines. This is IMPORTANT as its easy to ignore issues that come up with the link between the CPUs. The various testing modes give you the ability to tear the machine to pieces which is awesome. Stressapptest also is the most power hungry test I have ever seen, including the intel Power testing suite that you have to jump through hoops to get.
Pair this with a pass of memtest and you get a really, really nice burn in system that can burtalize the hardware and give you scriptable systems for detecting failure.
The map does not appear to actually mark the areas of the country where it is completely impossible to setup service. In Idaho, where I grew up, there are huge tracts of government property with restrictions and limitations that make it impossible to have cell service, let alone 3G.
Craters of the Moon is one of the largest exposed lava rock flats in the world. If you go to Google maps and search for "idaho", you will see a huge black spot in the bottom right. The flow is actually much larger than that and its all one big preserve. Its impossible to run underground cables since its all basically solid rock, and running overhead wires is pretty damn challenging as well given the lack of roads.
The Frank Church wilderness area which makes up a large chunk of the middle of the state specifically bans wires and electricity, cell towers, wheels, and pretty much any other modern technology. There is no way it will have 3G coverage any time soon.
Montana has the Bob Marshal wilderness area, Wyoming has Yellowstone, California has Yosemite, etc.
Hell, even the south western part of Idaho is just a big flat desert with virtually no farms, roads, or people. Why should we worry about its 3g coverage?
Joe Stump wrote a post that is a perfect response to this insanity.
Why is it that all the people working at scale seems to be going with NoSQL solutions? Are all the devs at Google, Facebook, Twitter, Digg, Redit, etc total idiots or in fact is there a problem that they face that is actually real?
Anybody that sites Amazon, Walmart or any large retailer as an example of why SQL scales is missing the point. Retails have very few write operations compared to the read load. The vast majority of the load hits databases that serve reads and have a high tolerance for write latency. This is a field SQL is good at solving.
On the other hand, social sites that have massive cross user data ties and constant write updates where latency is very important don't fit this model that well. Sure, you can remove SQL replication from the mix, use independent instances of MySQL serving fractions of the overall site, with redundancy between them but if you do that you have functionally built a NoSQL data store. The concept isn't to get right of SQL, its to get rid of the relational aspect of data storage. You can no longer rely on all your data being available to a single SQL statement.
Being an operations guy though I should point out the number one failing of SQL in my world. If you assume that, on average, a machine will either crash or have some sort of hardware failure once a year and you consider a site with 1,000 machines then you see that nearly 3 machines will die every day. Even if you count on 2 years of continuous uptime that is over 1 a day. with 10,000 machines your failure rate is 27 per day, 100,000 machines is 273. This means that any database layer that requires a large number of machines has to build in a recovery layer. Clients need to know that a node is down, when it comes back it needs to have data uploaded to it.. etc. The NoSQL solutions like Cassandra manage this automatically. Trying to do this with MySQL becomes really complicated and you end up implementing all the same logic and constraints in NoSQL solutions anyways. I have seen this happen twice now.
We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise. -- Larry Wall