Yes, the only null byte in UTF-8 is the encoding of code point 0, NUL. No other Unicode code point is encoded in UTF-8 with a zero byte anywhere in its sequence.
Possible code points and their UTF-8 encoding:
Range              Encoding                             Binary value
-----------------  -----------------------------------  --------------------------
U+000000-U+00007f  0xxxxxxx                             0xxxxxxx
U+000080-U+0007ff  110yyyxx 10xxxxxx                    00000yyy xxxxxxxx
U+000800-U+00ffff  1110yyyy 10yyyyxx 10xxxxxx           yyyyyyyy xxxxxxxx
U+010000-U+10ffff  11110zzz 10zzyyyy 10yyyyxx 10xxxxxx  000zzzzz yyyyyyyy xxxxxxxx
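To make the bit layout concrete, here is a minimal sketch of an encoder that follows the table row by row. The function name `utf8_encode` is made up for illustration, and rejecting surrogates (U+D800 to U+DFFF) is an assumption that goes beyond the table, matching what Python's built-in encoder does.

```python
def utf8_encode(cp):
    """Encode one code point by hand, one branch per table row."""
    # Assumption beyond the table: surrogates are rejected, as Python does.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110yyyxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110yyyy 10yyyyxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare against the built-in encoder for one code point per row.
for cp in (0x00, 0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```

Every lead byte and continuation byte in the multibyte branches is OR-ed with at least 0x80, so none of them can come out as zero.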
You can see that all non-zero ASCII characters are represented as themselves, while every byte of a multibyte sequence has its high bit set to 1, so none of those bytes can be zero.
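If you would rather check this than take it on trust, a brute-force scan is cheap. The sketch below uses Python's built-in encoder over every encodable code point; the function name `scan_utf8` is made up for illustration, and skipping surrogates is an assumption (Python refuses to encode them).

```python
def scan_utf8():
    """Scan all encodable code points for zero bytes and low bytes."""
    zero_byte_cps = []          # code points whose encoding contains 0x00
    low_byte_in_multibyte = []  # multibyte encodings with a byte below 0x80
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        encoded = chr(cp).encode("utf-8")
        if 0 in encoded:
            zero_byte_cps.append(cp)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            low_byte_in_multibyte.append(cp)
    return zero_byte_cps, low_byte_in_multibyte


zero_byte_cps, low_byte_in_multibyte = scan_utf8()
print(zero_byte_cps)          # [0]
print(low_byte_in_multibyte)  # []
```

Only code point 0 produces a zero byte, and no multibyte sequence contains a byte in the ASCII range.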
You may still need to make sure that your ASCII plaintext protocol does not handle non-ASCII characters badly (that is, every code point outside the ASCII range, all of which arrive as bytes with the high bit set).
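One way to guard against that is to look for high-bit bytes before handing data to the ASCII-only code. This is only a sketch; `first_non_ascii` is a hypothetical helper, and what you do once you find such a byte (reject, escape, pass through) depends on your protocol.

```python
def first_non_ascii(data):
    """Return the index of the first byte with the high bit set, or -1."""
    for i, b in enumerate(data):
        if b & 0x80:
            return i
    return -1


print(first_non_ascii(b"HELLO\r\n"))               # -1
print(first_non_ascii("caf\u00e9".encode("utf-8")))  # 3
```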
paxdiablo Aug 02 2018-11-11T00: 00Z