FYI stack allocation (the optimisation you refer to) is implemented in the JVM for some time already. It is capable of eliminating large numbers of allocations entirely on hot paths. Of course, there is a lot of memory overhead to all of this - the JVM has to do an escape analysis and it has to keep around bookkeeping data to let it unoptimize things.
For some reason they call this optimisation scalar replacement. I'm not sure why. In theory this can help close the gap a lot, because a big part of the reason GC is seen as slow is just because the languages that use it put so much pressure on the heap due to their library and language designs encouraging tons of tiny objects. If you can put them onto the stack then things can get much faster. I use some pretty large and complicated Java apps these days (like IntelliJ) and they seem to perform well, so perhaps things like this have turned the tide somewhat.