If Wine actually renders them itself and sends the resulting bitmaps to the X server, that will certainly consume more bandwidth than just sending the text and font info and letting X render it.
Mod parent very insightful.
Some programs used to pre-render a large bitmap full of commonly used letters: for example, a 2kx2k bitmap cut into squares, one per glyph. As letters are needed, an empty square is found and filled with the pre-rendered letter, anti-aliasing and all. After that, the program just tells X11 to copy and blend that square to the destination. The bitmap acts as a cache that doesn't have to be re-sent over the wire every time.
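Here's a minimal sketch of that technique over plain Xlib. The 32x32 cell size, the ASCII-only lookup table, and render_glyph_into_cell() are placeholders of mine rather than any real toolkit's API, and real anti-aliased blending would go through the XRender extension (XRenderComposite) instead of the opaque XCopyArea used here:

```c
#include <X11/Xlib.h>
#include <unistd.h>

#define ATLAS_SIZE    2048
#define CELL          32
#define CELLS_PER_ROW (ATLAS_SIZE / CELL)

typedef struct {
    Display *dpy;
    Pixmap   atlas;         /* server-side pixmap holding rendered glyphs */
    GC       gc;
    int      cell_of[128];  /* ASCII-only: glyph -> cell index, -1 = empty */
    int      next_cell;     /* no eviction in this sketch */
} GlyphCache;

/* Placeholder rasterizer: real code would upload a client-rendered
   (e.g. FreeType) bitmap with XPutImage; we just fill the cell. */
static void render_glyph_into_cell(GlyphCache *c, int ch, int x, int y)
{
    (void)ch;
    XFillRectangle(c->dpy, c->atlas, c->gc, x, y, CELL, CELL);
}

static void cache_init(GlyphCache *c, Display *dpy, Drawable ref, int depth)
{
    c->dpy   = dpy;
    c->atlas = XCreatePixmap(dpy, ref, ATLAS_SIZE, ATLAS_SIZE, depth);
    c->gc    = XCreateGC(dpy, c->atlas, 0, NULL);
    for (int i = 0; i < 128; i++)
        c->cell_of[i] = -1;
    c->next_cell = 0;
}

/* Draw one glyph: rasterize into a free cell on first use; every later
   use is a server-side copy, so no pixel data crosses the wire again. */
static void draw_glyph(GlyphCache *c, Drawable dst, int ch, int dx, int dy)
{
    int cell = c->cell_of[ch];
    if (cell < 0) {  /* cache miss: fill a cell once */
        cell = c->cell_of[ch] = c->next_cell++;
        render_glyph_into_cell(c, ch,
                               (cell % CELLS_PER_ROW) * CELL,
                               (cell / CELLS_PER_ROW) * CELL);
    }
    XCopyArea(c->dpy, c->atlas, dst, c->gc,
              (cell % CELLS_PER_ROW) * CELL,
              (cell / CELLS_PER_ROW) * CELL,
              CELL, CELL, dx, dy);
}

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) return 1;
    int scr = DefaultScreen(dpy);
    Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr), 0, 0,
                                     200, 80, 0, 0, WhitePixel(dpy, scr));
    XMapWindow(dpy, win);   /* real code would wait for an Expose event */

    GlyphCache cache;
    cache_init(&cache, dpy, win, DefaultDepth(dpy, scr));

    draw_glyph(&cache, win, 'A', 10, 10);  /* miss: rasterize + copy */
    draw_glyph(&cache, win, 'A', 50, 10);  /* hit: copy only */
    XFlush(dpy);
    sleep(2);

    XCloseDisplay(dpy);
    return 0;
}
```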
That worked well except when you get into letter combinations that shape differently depending on the surrounding letters. For example, the tail of a y may extend further under some script letters than others. In languages other than English, this happens far more often.
The enabling technology the article talks about could help in this area, if the X11 server were optimized to handle such bytecode interpretation internally. Then there would be no need to fill the bitmap cache with every combination of contextually shaped letters.
What some of these anonymous cowards don't realize is the size of the cache needed to store every possible Unicode character combination with all shapes and styles applied. Work that out and you'll see the cache method becomes useless.
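Rough numbers, just to make the point (my figures, not from the article): Unicode has well over 100,000 assigned code points, and one anti-aliased 32x32 glyph with 8-bit alpha is about 1 KB, so a single typeface at a single size already wants on the order of 100 MB of cache. Multiply that by sizes, weights, styles, and contextual forms and it's clearly hopeless.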