The essence comes down to two things. Neither is particularly complicated in principle, although getting it right can be a bit fiddly.
1) Detect attacking IPs.
HTTP flood DDoS bots aren't (at least not yet) smart enough to look and behave EXACTLY like people using web browsers. They do weird things like load web pages repeatedly while never loading images, running JavaScript, or loading CSS stylesheets. They make sequential requests from the same IP address - but with different user agents. They might load a web page that uses cookies - but never return the cookies that are set. Or they might return a cookie - but from a different source address or with a different user agent. They might send user agents that haven't been in widespread use in half a decade. They might not set the 'Referer' header, or some other header that a browser DOES set correctly. They probably don't follow HTTP redirects. What you are looking for is any behavior that distinguishes the good traffic from the bad traffic.
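To make that concrete, here's a minimal sketch of a couple of those checks in Python. It assumes a standard 'combined' format access log; the regex, thresholds, and function names are mine, not the actual script I used:

    # Two of the heuristics described above: one IP cycling through many
    # user agents, and page loads with no CSS/JS/image fetches at all.
    import re
    from collections import defaultdict

    # Combined-log line: IP, request line, status, size, referer, agent.
    LOG_RE = re.compile(
        r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
        r'\d+ \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    )
    ASSET = re.compile(r'\.(css|js|png|gif|jpe?g|ico)(\?|$)')

    def suspicious_ips(lines):
        agents = defaultdict(set)   # ip -> set of user agents seen
        pages = defaultdict(int)    # ip -> count of page requests
        assets = defaultdict(int)   # ip -> count of asset requests
        for line in lines:
            m = LOG_RE.match(line)
            if not m:
                continue
            ip = m.group('ip')
            agents[ip].add(m.group('agent'))
            if ASSET.search(m.group('path')):
                assets[ip] += 1
            else:
                pages[ip] += 1
        bad = set()
        for ip, n in pages.items():
            # A real browser doesn't rotate user agents mid-session.
            if len(agents[ip]) > 3:
                bad.add(ip)
            # Many page loads but zero stylesheets/scripts/images: a bot.
            if n > 20 and assets[ip] == 0:
                bad.add(ip)
        return bad

The other checks (missing Referer, ancient user agents, cookie mismatches) would hang off the same parsed fields in the same way.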
So I 'tailed' the web server log and analyzed it in one- to ten-minute chunks to detect abnormal accesses. Every detected address was added to a persistent database of blacklisted addresses.
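A rough sketch of how the chunked analysis and the persistent blacklist could fit together - the sqlite file, chunk length, and the placeholder rate check are all illustrative assumptions, not the originals:

    # Process the log in fixed-size time chunks; remember every address
    # ever flagged in a persistent database, so bans survive restarts.
    import sqlite3, time
    from collections import Counter

    DB = sqlite3.connect('/var/tmp/blacklist.db')
    DB.execute('CREATE TABLE IF NOT EXISTS banned'
               ' (ip TEXT PRIMARY KEY, first_seen REAL)')

    def flag_chunk(lines, max_hits=100):
        """Placeholder detector: any IP making too many requests in one
        chunk. The behavioral checks above would slot in here too."""
        hits = Counter(line.split(' ', 1)[0] for line in lines if line.strip())
        return {ip for ip, n in hits.items() if n > max_hits}

    def run(logfile, chunk_seconds=60):
        with open(logfile) as f:
            f.seek(0, 2)                  # start at end, like tail -f
            while True:
                time.sleep(chunk_seconds)
                for ip in flag_chunk(f.readlines()):
                    DB.execute('INSERT OR IGNORE INTO banned VALUES (?, ?)',
                               (ip, time.time()))
                DB.commit()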
2) Add the detected attacking addresses to an efficient firewall.
A naive firewall blacklist might try to just put each address in one big long list. This doesn't scale well beyond a couple of hundred attacking addresses. On the older machine I had, I used a 'divide and conquer' approach: I created a few hundred filter chains based on a /n subnet division of the attacking IP addresses. I then wrote a set of rules that divided incoming traffic into those chains based on which /n it belonged to. That made the number of rules any given packet had to traverse to filter n attacking IPs scale as roughly O(log n) rather than O(n). If I had had a more recent kernel I could have used a hashed map of addresses to take that down to O(1).
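As an illustration, here's roughly what generating that chain layout could look like, shown with a single level of division on the first two octets (/16). The chain names and the choice of /16 are mine; nesting the split over more levels is what gets toward the logarithmic traversal described:

    # Emit iptables commands that give each /16 its own sub-chain, so a
    # packet hits one dispatch rule plus one short sub-chain instead of
    # scanning the entire blacklist.
    def iptables_commands(banned_ips):
        cmds = ['iptables -N BLACKLIST',
                'iptables -I INPUT -p tcp --dport 80 -j BLACKLIST']
        by_subnet = {}
        for ip in banned_ips:
            a, b, _, _ = ip.split('.')
            by_subnet.setdefault(f'{a}.{b}', []).append(ip)
        for subnet, ips in sorted(by_subnet.items()):
            chain = 'BL_' + subnet.replace('.', '_')
            cmds.append(f'iptables -N {chain}')
            # Dispatch: only traffic from this /16 enters its sub-chain.
            cmds.append(f'iptables -A BLACKLIST -s {subnet}.0.0/16 -j {chain}')
            for ip in sorted(ips):
                cmds.append(f'iptables -A {chain} -s {ip} -j DROP')
        return cmds

    for cmd in iptables_commands({'10.1.2.3', '10.1.9.9', '192.0.2.7'}):
        print(cmd)

On a newer kernel the whole tree collapses to one rule: create a set with 'ipset create banned hash:ip', add addresses with 'ipset add banned <ip>', and match it with a single 'iptables -I INPUT -m set --match-set banned src -j DROP'. That's the O(1) hashed lookup mentioned above.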
After that it became a slow game of cat and mouse. The attacker would alter his attack to try to slip past the detection; whenever he managed to bypass the filters, I would update the detection software to key on something else he wasn't getting quite right. After about two weeks they quit attacking the web server.
The biggest issue, really, was that I was starting my defense from a 'standing start': I had to write all the needed scripts from scratch while the attack was still ongoing.