Maybe SSE helps algorithm A much more than it helps algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe the result depends strongly on the size of the source (there is an implicit assumption that every algorithm scales linearly with the size of the source; in practice some may and others may not).
In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.
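To make that concrete, here is a rough sketch of the kind of breakdown I mean: time compression and decompression separately, at several input sizes, instead of folding everything into one number. It uses Python's standard `zlib` purely as a stand-in codec (an assumption on my part, nothing to do with the benchmark being discussed), and the names `time_codec` and `sweep`, the sample text, and the sizes are all made up for illustration.

```python
import time
import zlib


def time_codec(data: bytes, level: int = 6) -> dict:
    """Time compression and decompression separately for one input."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    t1 = time.perf_counter()
    unpacked = zlib.decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == data  # round-trip sanity check
    return {
        "size": len(data),
        "ratio": len(packed) / len(data),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def sweep(sizes=(10_000, 100_000, 1_000_000)) -> list:
    """Run the same codec over several input sizes (hypothetical sizes)."""
    base = b"the quick brown fox jumps over the lazy dog. "
    results = []
    for n in sizes:
        # Build an input of exactly n bytes by repeating the sample text.
        data = (base * (n // len(base) + 1))[:n]
        results.append(time_codec(data))
    return results


if __name__ == "__main__":
    for row in sweep():
        print(row)
```

Comparing `compress_s` and `decompress_s` row by row shows whether either one grows linearly with `size`, which is exactly what a single combined score hides. Highly repetitive text like this is a best case for any codec, so treat the numbers as a demonstration of the method and not as a benchmark.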
The things you mention have always been left as an exercise for the reader.
What benchmark isn't tagged with qualifiers that explain what it does and doesn't mean?
Marketing literature in computing has always been littered with metrics that are completely useless unless you know how to interpret them in the context of what you actually want to do.