divbyzero's Journal: Late reply: Use something else
This is a late reply (posted after the thread was archived) to "Use something else" in the "Unicode and the Unix Console" discussion.
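Certainly, scanning forward and backward character by character is computationally less efficient in UTF-8 than in a fixed-width encoding, because the length of each character has to be worked out from its bytes. However, such scanning is not necessary when searching for tokens. Because of UTF-8's "no false hits" design (lead bytes and continuation bytes occupy disjoint ranges, so the encoding of one character never appears inside the encoding of another), simple byte-for-byte comparison and scanning work perfectly. This turns out to be true of nearly all common string operations, such as searching, splitting on a delimiter, and testing for equality.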
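To make both halves of that argument concrete, here is a minimal Python sketch (the helper names `is_continuation`, `prev_char_start`, and `find_all` are made up for illustration). Stepping backward one character has to skip over continuation bytes, which is the cost mentioned above, while a plain `bytes.find` search needs no character awareness at all: every hit it reports starts on a character boundary.

```python
def is_continuation(b: int) -> bool:
    # Continuation bytes are 10xxxxxx (0x80-0xBF); ASCII and lead bytes never are.
    return b & 0xC0 == 0x80


def prev_char_start(buf: bytes, i: int) -> int:
    """Offset where the character ending just before offset i begins.

    Assumes buf is valid UTF-8 and 0 < i <= len(buf).
    """
    i -= 1
    # Backward scanning steps over continuation bytes one at a time;
    # in a fixed-width encoding this would be a single subtraction.
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i


def find_all(buf: bytes, tok: bytes) -> list[int]:
    """Plain byte-for-byte search: every offset where tok occurs in buf."""
    hits, start = [], 0
    while (off := buf.find(tok, start)) != -1:
        hits.append(off)
        start = off + 1
    return hits


text = "naïve café, 日本語の日本"
buf = text.encode("utf-8")

# Every byte-level hit lands on a character boundary, so slicing there
# always decodes cleanly and begins with the token itself.
for off in find_all(buf, "日本".encode("utf-8")):
    assert not is_continuation(buf[off])
    assert buf[off:].decode("utf-8").startswith("日本")
```

Here `find_all(buf, "日本".encode("utf-8"))` gives `[14, 26]`, the byte offsets of the two real occurrences, with no spurious matches inside 語 or の.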
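The space issue is harder to answer. Whether intentionally or not, the choices the Unicode Consortium made when laying out the character table bias UTF-8 against CJK (Chinese, Japanese, and Korean) text: the CJK ideographs, kana, and Hangul syllables all sit above U+0800, so each takes three bytes in UTF-8 rather than two in UTF-16. This is so even though those languages are used by a very large percentage of the people in the world.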
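A rough way to see the size difference, assuming UTF-16 as the point of comparison (the post itself does not name one), is to encode a few sample strings both ways. The `utf8_len` helper is a hypothetical illustration of the standard UTF-8 length ranges, checked against Python's own encoder.

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 needs for one code point (standard RFC 3629 ranges)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3  # CJK ideographs, kana and Hangul syllables all fall here
    return 4


samples = {
    "English": "Unicode and the Unix console",
    "Japanese": "日本語のテキスト",
    "Chinese": "中文文本",
}

for name, s in samples.items():
    # Sanity check: the range table agrees with Python's built-in encoder.
    assert sum(utf8_len(ord(c)) for c in s) == len(s.encode("utf-8"))
    utf8 = len(s.encode("utf-8"))
    utf16 = len(s.encode("utf-16-le"))  # "-le" omits the byte-order mark
    print(f"{name}: {len(s)} chars, UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

This prints 28/28/56 for the English sample, 8/24/16 for Japanese, and 4/12/8 for Chinese (chars, UTF-8 bytes, UTF-16 bytes): UTF-8 halves the size of the English text but costs 50% more for the CJK samples.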