How strlen counts Unicode in C

I'm curious how strlen counts multibyte Unicode characters in C.

Does it count each byte or each character (given that a character can consist of several bytes) up to the first '\0'?

+5
2 answers

strlen() counts the number of bytes until a \0 is encountered. This is true for all strings.

For Unicode, note that the return value of strlen() can be affected by a \0 byte that occurs inside a valid character rather than as the null terminator. With UTF-8 this is not a problem, because no valid character other than U+0000 (ASCII NUL) contains a \0 byte, but this may not be true for other encodings such as UTF-16 or UTF-32.
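
To make the byte-versus-character difference concrete, here is a minimal sketch that assumes the string is well-formed UTF-8. The helper name utf8_codepoints is hypothetical (it is not a standard library function); it counts code points by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the C standard library.
   Counts code points in a null-terminated string, assuming the
   input is well-formed UTF-8. Every byte that is NOT a continuation
   byte (10xxxxxx) starts a new code point. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the UTF-8 string "h\xC3\xA9llo" (the word "héllo"), strlen() reports 6 because "é" is stored as two bytes, while utf8_codepoints() reports 5. Note that code points are still not the same as user-visible characters when combining marks are involved.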

+6

strlen applies only to strings, which are null-terminated arrays of char. All multibyte encodings permitted inside strings have the property that they contain no internal null bytes, so strlen and the other str functions, such as strcat, work fine.
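
As a rough illustration of why the str functions keep working, the sketch below appends one UTF-8 string to another with strcat. The helper append_bytes is a hypothetical name introduced only for this example; it assumes dst already holds a null-terminated string and that dst_size is the total size of the buffer in bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only.
   Appends src to dst if the result, including the terminator,
   fits in dst_size bytes. Returns 1 on success, 0 otherwise.
   Sizes are measured in bytes, not characters. */
int append_bytes(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (used >= dst_size)
        return 0;                 /* dst is not a valid string in this buffer */
    if (need >= dst_size - used)
        return 0;                 /* no room for src plus the terminator */

    strcat(dst, src);             /* safe: multibyte text has no inner \0 */
    return 1;
}
```

Appending "\xC3\xAFve" to a buffer that holds "na" yields the 6-byte string "naïve". The size check has to be done in bytes, because that is the unit strlen and strcat work in.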

If by "Unicode" you mean wchar_t arrays, these may contain zero bytes, but that is not a problem there: none of the wchar_t elements before the terminator will be equal to zero. You should not apply the str functions to such arrays; they are not defined for them.
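
For wide strings the standard library provides wcslen in <wchar.h>, which counts wchar_t elements rather than bytes. The sketch below also counts the zero bytes that sit inside a wide string's elements, to show why a byte-oriented function would stop too early. The helper embedded_zero_bytes is a hypothetical name, and the exact count depends on sizeof(wchar_t), which differs between platforms.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper for illustration only.
   Counts the zero-valued bytes inside the elements of a wide string,
   not including the terminating L'\0'. */
size_t embedded_zero_bytes(const wchar_t *ws)
{
    const unsigned char *p = (const unsigned char *)ws;
    size_t total = wcslen(ws) * sizeof(wchar_t);  /* payload size in bytes */
    size_t zeros = 0;

    for (size_t i = 0; i < total; ++i) {
        if (p[i] == 0)
            ++zeros;
    }
    return zeros;
}
```

For L"abc", wcslen returns 3. On a typical platform where wchar_t is wider than one byte, each of those elements carries at least one zero byte, so embedded_zero_bytes returns a nonzero value even though no element equals L'\0'.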

+2

Source: https://habr.com/ru/post/1207509/

