I am trying to understand how to handle basic UTF-8 operations in C++.
Let's say we have this scenario: A user enters a name, it is limited to 10 letters (characters in the user's language, not bytes), and it is saved.
In ASCII this is straightforward:
    // ASCII
    char *input;                     // user input
    char buf[11];                    // 10 letters + terminating zero
    snprintf(buf, 11, "%s", input);
    buf[10] = 0;
    int len = strlen(buf);           // returns 10 (correct)
Now, how can the same be done in UTF-8? Suppose each character takes up to 4 bytes (e.g. Chinese).
    // UTF-8
    char *input;                     // user input
    char buf[41];                    // 10 letters * 4 bytes + terminating zero
    snprintf(buf, 41, "%s", input);  // ?? makes no sense: it limits by number of bytes, not letters
    int len = strlen(buf);           // returns the number of bytes, not letters (incorrect)
Can this be done with the standard sprintf / strlen? Are there replacements for these functions for use with UTF-8 (in PHP there was an mb_ prefix for such functions, IIRC)? If not, do I need to write them myself? Or should I approach the problem differently?
Note: I would prefer to avoid a wide-character solution...
EDIT: Limit it to the Basic Multilingual Plane only.
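To make the "write them myself" option concrete, here is a minimal sketch of what hand-written replacements might look like. The names `utf8_length` and `utf8_copy` are hypothetical (they are not standard or library functions), and the code assumes the input is already valid UTF-8. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so every byte that does not match that pattern starts a new code point.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts code points in a NUL-terminated UTF-8 string
// by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8.
std::size_t utf8_length(const char *s) {
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;  // lead byte or ASCII byte: a new code point starts here
        }
    }
    return count;
}

// Hypothetical helper: copies at most max_chars code points from src into dst
// (capacity dst_size bytes, including the terminator) without splitting a
// multi-byte sequence. Returns the number of code points copied.
// Assumption: the input is valid UTF-8.
std::size_t utf8_copy(char *dst, std::size_t dst_size,
                      const char *src, std::size_t max_chars) {
    if (dst_size == 0) {
        return 0;
    }
    std::size_t bytes = 0;  // bytes accepted so far
    std::size_t chars = 0;  // code points accepted so far
    while (src[bytes] != '\0' && chars < max_chars) {
        // Find the end of the next code point: one lead byte plus
        // any continuation bytes that follow it.
        std::size_t next = bytes + 1;
        while ((static_cast<unsigned char>(src[next]) & 0xC0) == 0x80) {
            ++next;
        }
        if (next >= dst_size) {
            break;  // no room left for this code point plus the terminator
        }
        bytes = next;
        ++chars;
    }
    std::memcpy(dst, src, bytes);
    dst[bytes] = '\0';
    return chars;
}
```

In the scenario above this would be called as `utf8_copy(buf, sizeof buf, input, 10)`. With the Basic Multilingual Plane restriction, each code point takes at most 3 bytes in UTF-8, so a 31-byte buffer would be enough. Note that this counts code points, which may still differ from what a user perceives as one letter (e.g. a base letter followed by a combining accent).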
Chris