Is there any easy way to tell where one grapheme cluster ends and another begins? With UTF-8, it's easy to count the bits to see where one codepoint begins and ends. I hope there is something equally simple for grapheme clusters. Or perhaps it's all complicated and is different for each language?
As I understand it, it comes down to table lookups. The details of full Unicode support are unfortunately not trivial, and there's a reason libraries like ICU are as big as they are.
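To give a feel for what "table lookups" means, here is a rough Python sketch. It is only an approximation under a stated assumption: a cluster is taken to be one base codepoint plus any combining marks (general category M) that follow it. The real rules cover far more cases (Hangul syllables, emoji sequences, regional indicators and so on), so do not treat this as a proper segmenter. The function name simple_clusters is made up for the example.

```python
import unicodedata

def simple_clusters(text):
    """Approximate grapheme clusters: a base codepoint followed by any
    combining marks (general category Mn, Mc, or Me).

    Simplification: this ignores most of the real segmentation rules.
    """
    clusters = []
    for ch in text:
        # The "table lookup": ask the Unicode database for the category.
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            # A mark attaches to the cluster that precedes it.
            clusters[-1] += ch
        else:
            # Anything else (or a mark with nothing before it) starts
            # a new cluster.
            clusters.append(ch)
    return clusters
```

Calling simple_clusters("e\u0301a") groups the "e" with its combining acute accent and leaves the "a" on its own. For anything serious, a library that implements the full rules is the safer route.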
Also, if I do accidentally split a grapheme cluster in two (while respecting codepoint boundaries), what will happen? If I attempt to display the two strings, can I expect a sensible result, or will the result be garbage?
As I understand it, the base character normally comes first, followed by the combining characters that modify it.
So if you cut the end off a string and the cut falls in the middle of a cluster, the last character may be missing some of its combining marks, but the string is likely to be otherwise OK.
If you cut the start off a string and the cut falls in the middle of a cluster, things get messier. You are left with combining characters at the start of the string and nothing for them to combine with. If you just ask a display library to display it, what happens is up to the library, but I expect the combiners will either not be displayed at all or be displayed with no base. If you append the cut string to another string, the combiners will combine with whatever was at the end of that string.
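A small Python sketch of both cases, assuming the text is stored in decomposed form (base letter followed by U+0301 COMBINING ACUTE ACCENT). The variable names are just for illustration.

```python
import unicodedata

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT (decomposed "café").
word = "cafe\u0301"

# Split on a codepoint boundary that falls inside the last cluster.
head, tail = word[:4], word[4:]

# The head has simply lost its accent: still readable text.
print(head == "cafe")

# The tail starts with a combining mark that has no base of its own.
print(unicodedata.combining(tail[0]) != 0)

# Appending the tail to another string attaches the accent to whatever
# character ends that string; here NFC composes "a" + U+0301 into U+00E1.
joined = "a" + tail
print(unicodedata.normalize("NFC", joined) == "\u00e1")
```

How the orphaned accent in tail is drawn on its own still depends on the renderer; the code only shows what the codepoints are doing.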
All in all, you will probably end up with something "ugly but usable".