If we're talking about bulk builds, for any language, there is going to be a huge amount of locality of reference that matches well against caches. shared text RO, lots of shared files RO, stack use is localized (RW), process data is relatively localized (RW), and file writeouts are independent. Plus any decent scheduler will recognize the batch-like nature of the compile jobs and use relatively large switch ticks. For a bulk build the scheduler doesn't have to be very smart, it just needs to avoid moving processes around between cpus excessively so and be somewhat HW cache aware.
Data and stack will be different, but one nice thing about bulk builds is that there is a huge amount of sharing of the text (code) space. Here's an example of a bulk build relatively early in its cycle (so the C++ compiles aren't eating 1GB each like they do later in the cycle when the larger packages are being built):
Notice that nothing is blocked on storage accesses. The processes are either in a pure run state or are waiting for a child process to exit.
I've never come close to maxing out the memory BW on an Intel system, at least not with bulk builds. I have maxed out the memory BW on opteron systems but even there one still gets an incremental improvement with more cores.
The real bottleneck for something like the above is not the scheduler or the pegged cpus. The real bottleneck is the operating system which is having to deal with hundreds of fork/exec/run/exit sequences per second and often more than a million VM faults per second (across the whole system)... almost all on shared resources BTW, so it isn't an easy nut for the kernel to crack (think of what it means to the kernel to fork/exec/run/exit something like /bin/sh hundreds of times per second across many cpus all at the same time).
Another big issue for the kernel, for concurrent compiles, is the massive number of shared namecache resources which are getting hit all at once, particularly negative cache hits for files which don't exist (think about compiler include path searches).
These issues tend to trump basic memory BW issues. Memory bandwidth can become an issue, but it will mainly be with jobs which are more memory-centric (access memory more and do less processing / execute fewer instructions per memory access due to the nature of the job). Bulk compiles do not fit into that category.