People keep arguing that
Heck, there might be strings out there that will crash any Unicode library implementation; we just haven't found them yet because the search space is huge.
Let's see how
Nope; the 3 Hanzi characters didn't show at all, and only the à showed correctly in the second name. But everything looks correct in this second editing widget. This proves that
I see that the "Comment:" edit widget for this message still does contain the Hanzi and the marked 'u' and 'o' characters that went missing from the displayed text. So the damage is done after you hit the Submit button. There's no excuse for this. None of those characters has any special meaning to the code, and text containing them can't do any damage to anything. If damage happens, it's the fault of the crappy software handling the text, not the fault of the creator of the text. The right thing to do is to correct the crappy software. Damaging the text is simply idiotic, and interferes with the main reason (communication between literate people) that Unicode was invented.
(And we might note that a significant fraction of the users of the Internet now consists of people who communicate via Hanzi text, or Arabic, or any of the hundreds of other character sets that humanity uses to communicate. Damaging those folks' texts to avoid fixing your crappy software is a good way to tell them that you don't want them communicating with other people. This is rapidly becoming a commercially untenable position for people trying to "attract eyes" on the Net.)
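To make the point about "no special meaning to the code" concrete, here is a minimal sketch in Python of what a comment form could do instead of dropping characters. It is not how this site actually works; the function names `sanitize_comment` and `store_and_load` are made up for illustration, and the assumption is that the page is served as UTF-8. The only characters that matter to HTML are a handful of ASCII ones, so those are the only ones that need escaping.

```python
import html


def sanitize_comment(text: str) -> str:
    """Escape only the characters that are significant to HTML markup.

    html.escape rewrites &, <, > (and quotes when quote=True).
    Everything else, including Hanzi and accented letters, passes through.
    """
    return html.escape(text, quote=True)


def store_and_load(text: str) -> str:
    """Hypothetical storage round trip: encode to UTF-8 bytes and back."""
    return text.encode("utf-8").decode("utf-8")


comment = "José wrote <b>你好</b> & left"
safe = sanitize_comment(store_and_load(comment))
print(safe)  # José wrote &lt;b&gt;你好&lt;/b&gt; &amp; left
```

The markup gets neutralized and the text itself survives untouched, which is all the submitter was asking for.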