At the other end of the design spectrum, Cavium's ThunderX has 48 ARMv8 cores (not hyperthreads) per die, and supports dual-socket configurations for up to 96 cores per board. Individually the cores are weaker than a Xeon's, but on some workloads (network routing, some database serving), they're pretty impressive in aggregate. That many physical cores also makes it easier to load-balance VMs in a hosted environment. This is especially good for the kind of workload where most clients are idle a lot of the time, but when they're busy they're very busy.
-Os frankly is of little interest to desktop developers. Heck, I spend quite a bit of my time on 8 bitters these days, and I think you're being pedantic.
You might want to tell Apple that, as they compile everything with -Os. It turns out that instruction cache pressure still matters, and matters a lot more if you're in the kind of environment where multiple applications are competing for space.
and I believe are its biggest contributors
I'm not 100% sure, but I think that Google passed Apple as the largest single contributor (incrementally, at least, not cumulatively) somewhere in the 3.5 to 3.7 time frame. A lot of the Apple compiler team has been busy with Swift.
I'd love to have a company adopt some of these, polish the UI a bit, and provide an Android phone that ships with them by default, instead of the Google stuff.
You might have mail.