It's not that NSString itself is broken; it's that 99.99% of the time an NSString is one 16-bit code unit per glyph, so apps using it rarely test the case where a glyph takes two code units. A person writes an app that inserts a new character at a particular code-unit offset and it works 99.99% of the time, but if that offset happens to land in the middle of a multi-code-unit glyph, the program breaks.
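Here's a minimal sketch of that failure mode, assuming a recent Foundation (the emoji and the inserted "!" are just placeholder values): inserting at a fixed UTF-16 offset lands inside a surrogate pair and leaves an ill-formed string, while `rangeOfComposedCharacterSequenceAtIndex:` lets you snap the offset to a safe boundary first.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // "cafe" followed by U+1F600 (grinning face), which needs a
        // surrogate pair -- two UTF-16 code units -- in an NSString.
        NSString *text = @"cafe\U0001F600";
        NSLog(@"length = %lu", (unsigned long)text.length);   // 6, not 5

        // Naively inserting at index 5 lands between the two halves of
        // the surrogate pair, leaving a lone surrogate on each side of
        // the "!" -- an ill-formed string, likely garbled when displayed.
        NSMutableString *broken = [text mutableCopy];
        [broken insertString:@"!" atIndex:5];
        NSLog(@"broken: %@", broken);

        // rangeOfComposedCharacterSequenceAtIndex: reports the full range
        // of the character containing a given code unit, so the insertion
        // point can be snapped to a real character boundary.
        NSMutableString *fixed = [text mutableCopy];
        NSRange r = [text rangeOfComposedCharacterSequenceAtIndex:5];
        [fixed insertString:@"!" atIndex:r.location];
        NSLog(@"fixed: %@", fixed);
    }
    return 0;
}
```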
The documentation is no help. First off, it lies:
Conceptually, a CFString object represents an array of Unicode characters (UniChar) along with a count of the number of characters. The [Unicode] standard defines a universal, uniform encoding scheme that is 16 bits per character.
As we all should know, that's simply not true, though unfortunately a lot of people don't know better. Unicode is not a universal, uniform encoding scheme that is 16 bits per character. Even UTF-16 isn't that: code points outside the Basic Multilingual Plane take two 16-bit code units, a surrogate pair.
A string object presents itself as an array of Unicode characters. You can determine how many characters a string object contains with the length method and can retrieve a specific character with the characterAtIndex: method. These two “primitive” methods provide basic access to a string object.
characterAtIndex: returns a 16-bit integer, so it has no way to represent a Unicode character outside the Basic Multilingual Plane. And length is not the number of characters on the screen but the number of UTF-16 code units, which is a different thing, and a highly misleading one for programmers. Again, the two counts are the same 99.99% of the time, and the rare cases where they differ generally slip through testing. This is why UTF-16 is such a hazardous encoding to use.
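A quick way to see this for yourself (a minimal sketch; the G clef is just a convenient non-BMP example) is to hand NSString any character above U+FFFF and ask it the two "primitive" questions:

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic
        // Multilingual Plane, so UTF-16 encodes it as a surrogate pair.
        NSString *clef = @"\U0001D11E";

        // One visible character, but length reports two code units.
        NSLog(@"length = %lu", (unsigned long)clef.length);    // 2

        // characterAtIndex: returns a unichar (16 bits), so the best it
        // can do is hand back each surrogate half separately.
        NSLog(@"[0] = 0x%04X", [clef characterAtIndex:0]);     // 0xD834
        NSLog(@"[1] = 0x%04X", [clef characterAtIndex:1]);     // 0xDD1E
    }
    return 0;
}
```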
Yes, NSString is old, and that's part of the problem. It was made at a time when many thought Unicode was only ever going to be 16 bits. It hasn't aged well, and it's caused a lot of bugs over its lifetime. And now I'd bet that it, or something similar, has created a brand-new iPhone equivalent of WinNuke.
Programmers really need two types of strings, and only two, for the lion's share of tasks. One, binary strings, where a char is always 8 bits and operations can be optimized to heck and back. And two, Unicode strings, where a char always represents a whole Unicode character that you would display, and the count of characters is the count of display characters, and so forth. None of this "99.99% of the time it's one thing, but every so often it's another". That's asking for bugs.
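Foundation's closest approximation of that second type today is enumerating composed character sequences rather than indexing code units. A sketch of counting display characters that way, versus what length reports (the helper name DisplayCharacterCount and the sample string are mine, purely for illustration, and this is only a workaround, not the two-string-type design argued for above):

```objc
#import <Foundation/Foundation.h>

// Count user-visible characters by walking composed character sequences,
// Foundation's approximation of "the character you would display",
// instead of trusting the UTF-16 code-unit count that length returns.
static NSUInteger DisplayCharacterCount(NSString *s) {
    __block NSUInteger count = 0;
    [s enumerateSubstringsInRange:NSMakeRange(0, s.length)
                          options:NSStringEnumerationByComposedCharacterSequences
                       usingBlock:^(NSString *substring, NSRange substringRange,
                                    NSRange enclosingRange, BOOL *stop) {
        count++;
    }];
    return count;
}

int main(void) {
    @autoreleasepool {
        // "e" + combining acute accent, followed by an emoji.
        NSString *s = @"e\u0301\U0001F600";
        NSLog(@"length = %lu", (unsigned long)s.length);                       // 4 code units
        NSLog(@"display characters = %lu", (unsigned long)DisplayCharacterCount(s)); // 2
    }
    return 0;
}
```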