I would really, REALLY suggest you spend a little more time researching those other compressors you so easily dismiss as mere 'text stream' compressors; they are not.
For example, one of them also happens to hold the current record for lossless image compression.
It's all a matter of feeding them the right models, and I can guarantee that a good PPM or CM (context mixing) set of models will do much better than a week's worth
of model development - but of course the reason they WILL is that they take care of the downstream details. The work you have done in finding
context is exactly what they need.
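To make 'a set of models' concrete, here is a minimal sketch of the context-mixing idea - my own toy construction, not any real codec's internals (the two contexts, the mixer, and the learning rate are all illustrative assumptions), with the ideal code length -log2(p) standing in for the arithmetic coder:

```python
import math

def squash(x):   # stretch domain -> probability
    return 1.0 / (1.0 + math.exp(-x))

def stretch(p):  # probability -> stretch domain
    return math.log(p / (1.0 - p))

class Counter:
    """Adaptive bit model: one probability per context."""
    def __init__(self):
        self.t = {}
    def p1(self, ctx):
        n0, n1 = self.t.get(ctx, (1, 1))   # Laplace-style init, never 0 or 1
        return n1 / (n0 + n1)
    def update(self, ctx, bit):
        n0, n1 = self.t.get(ctx, (1, 1))
        self.t[ctx] = (n0 + (bit == 0), n1 + (bit == 1))

def cm_cost(bits):
    """Ideal compressed size, in bits, of a 0/1 sequence under two mixed models."""
    m1, m2 = Counter(), Counter()   # contexts: previous bit, previous two bits
    w1 = w2 = 0.0                   # mixer weights, trained online
    h1 = h2 = 0
    total = 0.0
    for bit in bits:
        s1, s2 = stretch(m1.p1(h1)), stretch(m2.p1(h2))
        p = squash(w1 * s1 + w2 * s2)               # mixed prediction
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        total += -math.log2(p if bit else 1.0 - p)  # what a coder would spend
        err = bit - p                               # gradient step on the mixer
        w1 += 0.02 * err * s1
        w2 += 0.02 * err * s2
        m1.update(h1, bit)
        m2.update(h2, bit)
        h1 = bit
        h2 = ((h2 << 1) | bit) & 3
    return total
```

Everything below cm_cost's first line - the mixing, the weight updates, the coding cost - is the 'downstream detail' a mature PPM/CM engine already does well; the contexts fed to the counters are the part your work supplies.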
Remember, there are three stages to compression, and using 'state deep within a video decoder that doesn't apply to text streams (like what above-neighbor color presence bits are set)' is the top level: finding context to model (the other two, roughly, being prediction from that context and entropy coding of the predictions). What I would suggest is that the decades of research into how best to utilise that context
could be of use... then again, perhaps you have done better than they can - and that is what testing against the corpus will show.
When it comes to lossless compression, there is no such thing as a text compressor, and there is no such thing as an exe compressor; there are just different
models of data, and different ways of using those models.
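As a toy illustration of that point (again my own sketch, not taken from any real codec): the coding loop below never changes, only the context function plugged into it does, and that alone is the whole difference between the 'weak' and the 'strong' compressor here:

```python
import math
from collections import defaultdict

def ideal_bits(data, context_of):
    """Ideal code length of `data` under an adaptive byte model keyed by context_of."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for i, b in enumerate(data):
        ctx = context_of(data, i)
        p = (counts[ctx][b] + 1) / (totals[ctx] + 256)  # Laplace smoothing
        bits += -math.log2(p)
        counts[ctx][b] += 1
        totals[ctx] += 1
    return bits

text = b"the quick brown fox jumps over the lazy dog " * 50
order0 = lambda d, i: ()                         # no context at all
order2 = lambda d, i: bytes(d[max(0, i - 2):i])  # previous two bytes as context

print(ideal_bits(text, order0) / 8, "bytes, order-0")
print(ideal_bits(text, order2) / 8, "bytes, order-2")
```

Swap in a context function that understands exe opcodes, or your decoder state, and the same loop becomes an 'exe compressor' or a 'video compressor'; nothing else needs to change.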
You are not the first, nor I suspect the last, to look at bitstream detokenisation and recompression in its many forms.
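For what it's worth, the shape of that approach fits in a few lines. The toy run-length token format below and zlib as the stand-in 'stronger coder' are purely my illustrative assumptions - a real recompressor would put dedicated PPM/CM models over the token sequence - but the invariant at the bottom is not negotiable:

```python
import zlib

def detokenise(stream):
    """Toy 'decoder': unpack a stream of (run length, value) byte pairs."""
    return [(stream[i], stream[i + 1]) for i in range(0, len(stream), 2)]

def retokenise(tokens):
    """Exact inverse: must reproduce the original stream byte for byte."""
    out = bytearray()
    for count, value in tokens:
        out += bytes((count, value))
    return bytes(out)

def recompress(stream):
    tokens = detokenise(stream)
    counts = bytes(t[0] for t in tokens)
    values = bytes(t[1] for t in tokens)
    # Regrouping into homogeneous planes is the 'modelling' step in this toy;
    # a real recompressor would code each token type with its own CM model.
    return zlib.compress(counts + values, 9)

def decompress(packed):
    raw = zlib.decompress(packed)
    half = len(raw) // 2
    return retokenise(list(zip(raw[:half], raw[half:])))

stream = bytes([3, 65, 5, 66, 2, 65] * 100)      # some (run, value) pairs
assert decompress(recompress(stream)) == stream  # bit-exact, or it is worthless
```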
If you don't read up on this, you are missing something that matters, for example:
But then perhaps you are aware of all that.
Don't get me wrong, 22% is VERY respectable on JPEG... but why not try to do better?