It is the absence of the qualification that requires justification.
Why would the Lisp compiler do well on 'small' examples, as the C compiler does, but underperform it on 'larger' ones? Perhaps there are reasons to expect such behaviour, but rather than answer a comp. sci. illiterate like me, if you really believe the authors of those papers have been making such serious errors, I think you should write your own paper. And if you haven't got the time or inclination to do that, perhaps you could at least explain your criticisms to these fellows: http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21 They appear to have used similar reasoning (toy examples and extrapolation), and it looks like they're embarking on a major project to make a better-performing R-like system with Common Lisp. But if Lisp really doesn't scale up well, that could of course turn out to be a futile ambition and a horrible waste of time.
but with excerpts in #3 like: ... I can only guess that ... the performance penalty of Lisp must have been significant.
Our QPX search engine is engineered for speed, speeds that must not be lower than using C and where huge amounts of data must not be bigger than packing them in C structs. Still, QPX is very complicated, and driven by individuals who write large bodies of code. Lisp allows us to define a wide variety of abstractions to manage the complexity, and at the same time we get the speed we want. Once QPX is compiled, one cannot easily tell the machine code from the machine code compiled from C.
(from the other ITA link)
Of course, you may have guessed right about ITA's case anyway (and undoubtedly there do exist situations in which Lisp is inadequate for performance reasons - even C is sometimes). But whatever reasons ITA have for using C and Java as well, there are no similar excerpts in #1 or #2 (or in other examples), and it seems to me they are hardly 'toy' examples.
I am not trying to knock Lisp
Sure - you're just knocking the papers I linked to - and it seems to me that is just as well, considering there are at least some anecdotal 'real world' counter-examples to your Lisp performance worry. That worry stems from your (possibly justified) criticism that extrapolating from those papers' findings is, at least logically, invalid. I have always found Lisp easily adequate performance-wise for my own use, but that's just anecdotal too, of course.