This is a late reply (after the thread was archived) to lay off the bong under Multithreading and recursion cannot be mixed in C?!.
I'd love to be corrected or enlightened if you have something constructive to say. Personal attacks don't really add anything to a discussion, especially if they're anonymous. I really would like to find a solution to the problem if it's not inherent to C.
Certainly scanning forward and backward by character is computationally less efficient in UTF-8 than in a fixed-width encoding. However, such scanning is not necessary when searching for tokens. Due to the "no false hits" design of UTF-8, simple byte-for-byte comparisons and scanning will work perfectly. This turns out to be true of nearly all string operations.
The space issue is harder to answer. Whether it was intentional or not, choices made by the Unicode Consortium when it laid out the character table work to bias UTF-8 against CJK, even though those languages are used by a very large percentage of the people in the world.
I just learned that you cannot safely mix recursion and multithreading when writing C programs. Here's why:
Most decent C compilers will attempt to do tail recursion elimination, although it is not required the way it is in Scheme. However, there are plenty of recursive algorithms, especially those which deal with trees, which are not tail recursive, and therefore will require the use of the stack.
Due to performance considerations, the C stack must be contiguous; it cannot be broken up into dynamically allocated chunks. If this was not the case, every non-local variable access, not to mention every call and every return, would require extra comparisons. Thus standard compilers keep it contiguous.
Now in the old, single-threaded world, you have only one stack per process. It starts at one end of the total virtual address space, the heap starts at the other end, and they grow towards one another. When they meet, which does not necessarily have to be in the exact middle, you've run out of memory, and no amount of paging will help (after all, this is the virtual address space we're talking about, not the physical one); refine your algorithm, or go out and buy a 64-bit machine.
In the new, multithreaded world, each thread needs to have its own stack. Because each one must be contiguous, and there are only two total "ends" of the linear virtual address space, there is no way to lay it out such that the stacks grow freely until the virtual address space is exhausted. Instead, at thread-creation time, you pre-allocate a certain size stack, and that's all you get for it, even if much of the address space is still free. Thus, because recursive algorithms make heavy use of the stack, they cannot be used on the tiny, artificially limited stacks of the multithreaded world.
What I wonder is if anybody has ever tried to write a preprocessor for C in which you could mark certain functions as recursive (or auto-detect it if you're feeling clever), and replace the use of the real stack with a heap-based, dynamically grown stack. You wouldn't want to do this for all functions, since it would inherently be less efficient, but it would remove the necessity of contiguous stacks and therefore solve the problem. I don't know if I have the patience to write this myself; one side of me appreciates the challenge, but another side says that there are lots of nice languages, Scheme and Java prominent among them, which don't require contiguous stacks, and are very happy mixing recursion and multithreading.
You can not win the game, and you are not allowed to stop playing. -- The Third Law Of Thermodynamics