We couldn't stop using shared objects without destroying the Firefox extensibility model.
Sure, PGO is supposed to solve that problem. Measurements have shown that GCC's PGO does not actually work that well: it either turns on optimizations too aggressively, increasing codesize and slowing the program, or it doesn't turn on enough optimizations in the code that truly matters.
Mozilla makefiles manually choose special optimization settings for certain files within the JS engine such as jsinterp.cpp (which is compiled with -O3). We're working on being able to support -fomit-frame-pointer but currently the crash reporting system would break if we did that, and at least for the Mozilla binaries crash reporting is more important.
The prelink program does not affect code generation. Declaring symbols hidden, on the other hand, actually does change code generation: it avoids PIC loads and GOT lookups for those symbols.
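To make the hidden-symbol point concrete, here's a rough sketch using GCC's visibility attribute. The function names are made up for illustration and are not from the Mozilla tree; the comments describe what GCC typically does when building a shared object with -fPIC on x86 Linux, and the exact output depends on compiler version and flags.

```c
/* Hypothetical names for illustration; not taken from the Mozilla tree. */

/* Default visibility: the symbol is exported from the shared object.
   In -fPIC code a call to it is typically routed through the PLT/GOT,
   because another module is allowed to interpose the symbol. */
int exported_add(int a, int b) {
    return a + b;
}

/* Hidden visibility: the symbol is not exported, so the compiler knows
   the definition in this shared object is the one that will be used and
   can emit a direct call with no GOT lookup. */
__attribute__((visibility("hidden")))
int internal_add(int a, int b) {
    return a + b;
}

/* Calls one function of each kind; the results are the same, only the
   generated call sequences differ. */
int sum3(int a, int b, int c) {
    return exported_add(internal_add(a, b), c);
}
```

Compiling this with something like `gcc -O2 -fPIC -shared -S` and comparing the two call sites in `sum3` is one way to see the difference.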
- The Windows ABI is cheaper: every relocated symbol in Linux is resolved at runtime by loading the PIC register and doing a GOT lookup. Windows avoids PIC code by loading the code at a "known" address and relocating it at startup only if it conflicts with another DLL.
- Mozilla code runs fastest when 99% of it is compiled for space savings, not "speed". Because of the sheer amount of code involved in a web browser, most of the code will be "cold". Tests have shown that at least on x86, processor caches perform much better if we compile 99% of our code optimizing for codesize and not raw execution time: this is very different than most compiler benchmarks. The MSVC profile-guided optimization system allows us to optimize that important 1% at very high optimization levels; the GCC profile-guided optimization system only really works within the confines of a particular optimization level such as -Os or -O3. In many cases using PGO with Linux produced much *worse* code!
- The GCC register allocator sucks, at least on register-starved x86: we've examined many cases where GCC does loads and saves that are entirely unnecessary, thus causing slowdowns.
Believe me, we'd really love to make Linux perform as well as Windows! We spent a lot of time in Firefox 3 with libxul reducing startup time by making symbols hidden and reducing the number of runtime relocations...
This is incorrect. There are optimizations that you can perform in a tracing JIT which you simply cannot perform in a static compiler, even a static compiler with global optimizations on!
For instance, one of the optimizations we will be working on in the SpiderMonkey tracer is escape analysis: if, during the course of a traced loop, a value which would normally be heap allocated goes out of scope without escaping the trace, you know you can place that value on the stack instead. Because it's a tracing JIT, this optimization works across multiple methods.
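As a hand-written illustration of what that transformation buys you (not how the tracer is implemented), here's the same loop in C before and after. The `Point` type and both function names are invented for the example.

```c
#include <stdlib.h>

/* Hypothetical temporary object, standing in for a value a JS program
   would normally allocate on the heap. */
typedef struct { double x, y; } Point;

/* Before: every iteration heap-allocates a temporary that never leaves
   the loop body. */
double sum_heap(int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        Point *p = malloc(sizeof *p);
        if (p == NULL) abort();
        p->x = i;
        p->y = 2.0 * i;
        total += p->x + p->y;
        free(p);
    }
    return total;
}

/* After: since the temporary does not escape, it can live on the stack
   (or entirely in registers), with no allocator calls in the loop. */
double sum_stack(int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        Point p = { i, 2.0 * i };
        total += p.x + p.y;
    }
    return total;
}
```

Both functions compute the same result; the second is the shape of code that escape analysis lets a JIT emit once it has proven the temporary never escapes the trace.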
Even standard compiler analyses such as common subexpression elimination and loop invariant hoisting can be much more effective in a tracing JIT than they can in a static compiler or even a per-method JIT: you know the side effects of everything that happens under the trace, and so you can know, across method boundaries, whether a traced loop is pure and whether loop counters are invariant or can be simplified.
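Here's a small sketch of the hoisting case, again written by hand in C with made-up names. In C the callee is known statically, so an optimizing compiler may well do this on its own; the point is that in JavaScript the target of the call is generally not known until runtime, and a trace that has recorded the callee's body is what lets the JIT prove it is pure.

```c
#include <stddef.h>

/* Hypothetical helper. Assume the trace has recorded its body and found
   no side effects, so its result depends only on `mode`. */
static int scale_for(int mode) {
    return mode * 2 + 1;
}

/* As written by the programmer: the call is repeated on every iteration
   even though `mode` never changes inside the loop. */
long sum_scaled_naive(const int *v, size_t n, int mode) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)v[i] * scale_for(mode);
    }
    return total;
}

/* After loop-invariant hoisting across the call boundary: the pure call
   is evaluated once, before the loop. */
long sum_scaled_hoisted(const int *v, size_t n, int mode) {
    const int scale = scale_for(mode);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)v[i] * scale;
    }
    return total;
}
```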