Except that with multiprocess concurrency (i.e. non-multithreaded Apache on Unix), you actually gain in a NUMA setup like the one Opterons have had from day 1. See, in the optimum case in a NUMA environment, the server process that handles a request gets an entire memory bus to itself. That scales far better than multithreading does unless the kernel duplicates memory across nodes, which AFAIR Linux doesn't do on a per-thread basis within a single address space.
This is why Opterons practically own the 4-socket x86 space: unlike with Intel's older "hub-style" busses, where every socket reaches memory through the same hub, on a NUMA system aggregate memory bandwidth goes up as sockets are added, because the number of memory busses goes up too.
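
To put rough numbers on that, here's a back-of-the-envelope sketch in Python. It's a toy model, not a benchmark: the function names are made up for illustration, the 6.4 GB/s per-bus figure is just an assumed placeholder, and it ignores remote-node accesses, cache effects and interconnect limits. All it shows is the bus-count argument: one bus per socket on NUMA versus one shared bus on a hub design.

```python
def aggregate_bandwidth_gbs(sockets, per_bus_gbs, numa):
    """Peak aggregate memory bandwidth under a deliberately crude model.

    numa=True:  every socket has its own memory bus, so buses == sockets.
    numa=False: all sockets share a single bus through a hub.
    """
    if sockets < 1:
        raise ValueError("need at least one socket")
    buses = sockets if numa else 1
    return buses * per_bus_gbs


def per_socket_share_gbs(sockets, per_bus_gbs, numa):
    """Bandwidth each socket gets if all sockets are equally busy."""
    return aggregate_bandwidth_gbs(sockets, per_bus_gbs, numa) / sockets


if __name__ == "__main__":
    PER_BUS_GBS = 6.4  # assumed, illustrative figure only
    for n in (1, 2, 4):
        print(
            f"{n} socket(s): "
            f"NUMA {aggregate_bandwidth_gbs(n, PER_BUS_GBS, True):.1f} GB/s total, "
            f"shared hub {aggregate_bandwidth_gbs(n, PER_BUS_GBS, False):.1f} GB/s total"
        )
```

In this model the per-socket share stays flat on NUMA as sockets are added, while on the shared hub it gets divided by the socket count. That's the best case, where each process only touches memory local to its own node.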