>> If C and C++ natively did UTF-8
> You mean, what Rust does.
Rust doesn't really do "native" UTF-8 any more than C does. Try getting a substring of characters 5 through 10 of a Rust String without knowing whether some of the characters before the tenth are non-ASCII Unicode code points.
I was a little surprised by how bad it is in that area. I know they're going for "as efficient as C", but c'mon man, strings using byte indexing?
There are a few ways to do it. The most common is to use the chars() method, which gives you an iterator over characters (Unicode scalar values, not grapheme clusters). So, for your example, something like "s.chars().skip(5).take(5).collect()", which yields the five characters at indices 5 through 9. If you really need to do heavy Unicode text manipulation (e.g. you're writing a text editor or something), you probably want to use some of the available crates, e.g. unicode-segmentation.
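To make the chars() approach concrete, here is a minimal sketch. The helper names char_substring and char_slice are made up for illustration; only chars(), char_indices(), skip(), take(), nth() and collect() come from the standard library. The second helper is one possible way to avoid the allocation by translating character positions into byte offsets first.

```rust
/// Hypothetical helper: copies `count` chars starting at char index `start`
/// into a new String. Walks the string from the front, so it is O(n).
fn char_substring(s: &str, start: usize, count: usize) -> String {
    s.chars().skip(start).take(count).collect()
}

/// Hypothetical helper: same result as a borrowed slice, no allocation.
/// It converts char positions to byte offsets, then slices by bytes.
fn char_slice(s: &str, start: usize, count: usize) -> &str {
    // Iterator over the byte offset at which each char begins.
    let mut offsets = s.char_indices().map(|(i, _)| i);
    // Byte offset of char `start`, or the end of the string if out of range.
    let begin = offsets.nth(start).unwrap_or(s.len());
    // Byte offset of char `start + count`, clamped to the end of the string.
    let end = if count == 0 {
        begin
    } else {
        offsets.nth(count - 1).unwrap_or(s.len())
    };
    &s[begin..end]
}
```

Both helpers clamp instead of panicking when the range runs past the end, which is a choice made for this sketch rather than anything Rust mandates. Note that they count chars, so a letter followed by a combining accent still counts as two.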
Clearly, as you say, this isn't what a lot of people would consider full, native support for UTF-8. Really doing it right would impose a heavy runtime penalty on the vast majority of simple string usage that doesn't need it, so Rust compromised: if you have a &str or a String in Rust, you know that what it contains is valid UTF-8 -- which means that when you create one you're paying the validation penalty, even if you don't need it... however, the penalties scale in an unsurprising way. When you create a string from bytes, the validation is an O(n) operation, but you have almost always already done O(n) work reading or copying those bytes, so the overall cost stays O(n). When you slice a string, the check only has to confirm that the start and end byte offsets fall on character boundaries, so it's O(1), as you would expect slicing to be. You might not naively expect slicing to panic over a character boundary, but you should already expect that it might panic with a bounds-checking error, so the fact that it might panic isn't surprising. And, of course, you can use the get() method to get None instead of a panic.
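A small sketch of where those costs and failure modes show up. The function names bytes_to_string and try_slice are hypothetical wrappers; String::from_utf8 and str::get are the real standard-library calls. Note that get() returns an Option, so a bad range gives None.

```rust
/// Hypothetical wrapper: validates the bytes as UTF-8 in one O(n) pass.
/// String::from_utf8 takes ownership of the Vec, so no copy is made here.
fn bytes_to_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Hypothetical wrapper: non-panicking slice by byte range.
/// Returns None if the range is out of bounds or if either end
/// falls in the middle of a multi-byte character.
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

For "héllo", the é occupies bytes 1 and 2, so try_slice(s, 0, 2) is None while try_slice(s, 1, 3) is Some("é"). Writing &s[0..2] directly would panic on the same input.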
Full native UTF-8 support would be a lot heavier. Many common string operations would be O(n) rather than O(1) -- including indexing! The APIs would be quite confusing to people accustomed to C-style strings, too, which is another cost. So, Rust doesn't do that. Instead, if you want the length of a string in Unicode characters, you use s.chars().count(). If you want a substring with character offsets you use s.chars().skip(n).take(m).collect(), or similar. These operations do not look like they're O(1), which is good, because they're not. They're also not nearly as slow/heavy as they look.
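The difference between the O(1) byte view and the O(n) character view can be sketched like this. The function names are invented for the example; len(), chars().count() and chars().nth() are standard-library calls.

```rust
/// Length in bytes: O(1), this is what s.len() reports.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Length in chars (Unicode scalar values): O(n), walks the whole string.
fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// The n-th char, if any: also O(n), since UTF-8 is variable-width
/// and the only way to find char n is to decode the chars before it.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}
```

For "héllo" this gives 6 bytes but 5 chars. An "e" followed by a combining acute accent (U+0301) is 3 bytes and 2 chars even though it renders as one character, which is the case where something like unicode-segmentation comes in.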
Like most compromises, this one makes no one really happy, and many people will disagree that it's the right choice. But I don't really see a better option, do you? Keep in mind that everything from device drivers and bare-metal microcontroller code to browsers and editors is included in the target space, and that having different wide and narrow string types has proven to be a bad idea.