Late reply: Use something else


Journal divbyzero's Journal: Late reply: Use something else

Journal by divbyzero on Monday January 06, 2003 @01:57AM

This is a late reply (posted after the thread was archived) to "Use something else" under "Unicode and the Unix Console".

Certainly, scanning forward and backward by character is computationally less efficient in UTF-8 than in a fixed-width encoding. However, such scanning is not necessary when searching for tokens. Thanks to the "no false hits" design of UTF-8 (a validly encoded token can only match at a character boundary, never in the middle of another character), simple byte-for-byte comparison and scanning work perfectly. The same turns out to be true of nearly all string operations.
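As a rough illustration of the byte-for-byte search point, here is a minimal Python sketch. The helper names `byte_find` and `is_char_boundary` and the sample text are made up for this example; they are not from the original thread. It searches the raw UTF-8 bytes without decoding anything, then checks that the match lands on a character boundary.

```python
def byte_find(haystack: str, needle: str) -> int:
    """Find needle in haystack by comparing raw UTF-8 bytes.

    Returns the byte offset of the first match, or -1 if absent.
    No per-character decoding or scanning is involved.
    """
    return haystack.encode("utf-8").find(needle.encode("utf-8"))


def is_char_boundary(data: bytes, offset: int) -> bool:
    """True if offset does not point at a UTF-8 continuation byte (10xxxxxx)."""
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


# Hypothetical sample text mixing 1-, 2-, and 3-byte characters.
text = "naïve 日本語 text"
encoded = text.encode("utf-8")

offset = byte_find(text, "本語")
print(offset, is_char_boundary(encoded, offset))
```

Because lead bytes and continuation bytes occupy disjoint ranges, a needle that is itself valid UTF-8 cannot match starting in the middle of a multi-byte character, so the boundary check should never fail for a real hit.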

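The space issue is harder to answer. Whether it was intentional or not, the way the Unicode Consortium laid out the character table biases UTF-8 against CJK text: most CJK characters fall in the code point range that UTF-8 encodes in three bytes, versus two bytes in UTF-16 or in the legacy double-byte encodings. That is a real cost, given that those languages are used by a very large percentage of the people in the world.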
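To put rough numbers on the space issue, the following Python sketch compares encoded sizes for a few short strings. The sample strings and the `encoded_sizes` helper are made up for this illustration, and `utf-16-le` is used only so that no byte-order mark is counted.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample strings; any ASCII and CJK text shows the same pattern.
samples = {
    "English": "Hello, world",
    "Japanese": "こんにちは世界",
    "Chinese": "你好世界",
}

for label, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{label}: {len(text)} chars, "
          f"UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

ASCII text comes out at one byte per character in UTF-8, while the kana and ideographs here take three bytes each, which is 50% more than their two-byte UTF-16 form.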
