You'd distribute the base model (100GB to 5TB) once up front, since it's reusable across a lot of inputs. Then each individual download would just be the transformer plus the compressed data.
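
Roughly, I picture each download looking something like this (my own sketch, names made up, not anything from the paper):

    from dataclasses import dataclass

    @dataclass
    class CompressedFile:
        base_model_id: str      # which shared base model the receiver must already have
        adapter_weights: bytes  # the per-input transformer (or fine-tune delta)
        payload: bytes          # the arithmetic-coded data itself

        def download_size(self) -> int:
            # only these two travel per file; the shared base model is amortized
            return len(self.adapter_weights) + len(self.payload)
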
The texts they're testing on are 1GB chunks of the English-language Wikipedia dump, and compression is poor until the transformer itself gets very large, something like 100MB.
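
Back-of-envelope on why the transformer size matters at that scale (my own numbers for illustration, not results from the paper):

    def effective_ratio(payload_bytes: int,
                        model_bytes: int = 100_000_000,     # ~100MB transformer, per above
                        input_bytes: int = 1_000_000_000):  # ~1GB Wikipedia chunk
        # the shared base model is amortized away, but the per-file transformer
        # counts against every single input you compress
        return (model_bytes + payload_bytes) / input_bytes

    # e.g. if the payload alone compressed to a (hypothetical) 100MB,
    # shipping the transformer doubles the effective size:
    print(effective_ratio(payload_bytes=100_000_000))  # -> 0.2
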
They also have some weird comparisons of partially decoded data, and I don't get the point. Like, they show the intermediate output of gzip, which is basically text with the Burrows-Wheeler transform applied, and point out that it's not very comprehensible compared to some "equivalent" from their compression system, and it's like, no shit. So that part is pretty useless.