You can ignore Modified UTF-8 (MUTF-8) and just treat it as regular UTF-8. On the Python side, you can handle it like this:
- Convert the string to regular UTF-8 and store the bytes in a buffer.
- Write the length of the buffer (not the length of the string) as 2 bytes in big-endian binary.
- Write the entire buffer.
I have done this in PHP, and Java accepted the result without complaint (at least with Java 5).
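MUTF-8 is mainly used by JNI and similar interfaces. The difference from regular UTF-8 that matters here is the encoding of U+0000: regular UTF-8 uses 1 byte (0x00), while MUTF-8 uses 2 bytes (0xC0 0x80). First, U+0000 (the null character) should not normally appear in your text. Second, DataInputStream.readUTF() does not insist on the two-byte form, so it accepts either one. Note that MUTF-8 also encodes characters outside the Basic Multilingual Plane as surrogate pairs (two 3-byte sequences) rather than as 4-byte UTF-8 sequences, so this shortcut is only safe for text without such characters.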
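To make the U+0000 difference concrete, here is a minimal sketch. The helper name `to_mutf8_nul` is hypothetical (it is not part of any library), and it assumes the text contains no characters outside the Basic Multilingual Plane:

```python
def to_mutf8_nul(text):
    """Encode text as UTF-8, but write U+0000 the MUTF-8 way (0xC0 0x80).

    Hypothetical helper for illustration; assumes BMP-only text.
    """
    # In valid UTF-8 a 0x00 byte can only be the encoding of U+0000,
    # so a plain byte replacement is safe here.
    return text.encode('utf-8').replace(b'\x00', b'\xc0\x80')

plain = 'a\x00b'.encode('utf-8')
modified = to_mutf8_nul('a\x00b')
print(plain)     # b'a\x00b'
print(modified)  # b'a\xc0\x80b'
```

If your strings never contain U+0000, both encodings produce identical bytes, which is why plain UTF-8 works in practice.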
EDIT: The Python code would look like this:
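import struct

def writeUTF(data, text):
    # `data` is a list that collects byte chunks; `text` is the string to send.
    # (The parameter must not be named `str`, or the str(length) call below breaks.)
    utf8 = text.encode('utf-8')
    length = len(utf8)  # byte count, not character count
    data.append(struct.pack('!H', length))  # 2-byte big-endian length prefix
    fmt = '!' + str(length) + 's'
    data.append(struct.pack(fmt, utf8))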
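To sanity-check the framing on the Python side, a round trip can be sketched as below. The names `write_utf` and `read_utf` are hypothetical, and the explicit size check is my addition: it reflects the fact that a 2-byte length prefix cannot describe more than 65535 bytes.

```python
import struct

def write_utf(chunks, text):
    # Same framing as above: 2-byte big-endian byte count, then the UTF-8 bytes.
    utf8 = text.encode('utf-8')
    if len(utf8) > 0xFFFF:
        raise ValueError('encoded string exceeds 65535 bytes')
    chunks.append(struct.pack('!H', len(utf8)))
    chunks.append(utf8)

def read_utf(buf, offset=0):
    # Hypothetical Python-side reader mirroring the framing.
    # Returns the decoded text and the offset of the next unread byte.
    (length,) = struct.unpack_from('!H', buf, offset)
    start = offset + 2
    return buf[start:start + length].decode('utf-8'), start + length

chunks = []
write_utf(chunks, 'héllo')
payload = b''.join(chunks)
print(payload)            # b'\x00\x06h\xc3\xa9llo'
print(read_utf(payload))  # ('héllo', 8)
```

The length prefix is 6 rather than 5 because 'é' takes two bytes in UTF-8, which is exactly why the buffer length, not the string length, has to be written.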