Java Modified UTF-8 Strings in Python

I am interacting with a Java application through Python. I need to be able to create byte sequences that contain UTF-8 strings. Java uses a modified UTF-8 encoding in DataInputStream.readUTF(), which Python does not support (at least not yet).

Can someone point me in the right direction for creating Java modified UTF-8 strings in Python?

Update #1: For more about Java's modified UTF-8, see the readUTF method of the DataInput interface on line 550 here, or here in the Java SE docs.

Update #2: I'm trying to interact with a third-party JBoss web application that uses this modified UTF-8 format to read strings from POST requests by calling DataInputStream.readUTF (sorry for any confusion regarding the normal operation of Java UTF-8 strings).

Thanks in advance.

+3
4 answers

You can ignore Modified UTF-8 encoding (MUTF-8) and just treat it as UTF-8. On the Python side, you can handle it like this:

  • Convert the string to regular UTF-8 and store the bytes in a buffer.
  • Write the length of the buffer (not the length of the string) as a 2-byte big-endian binary value.
  • Write the entire buffer.

This approach worked for me from PHP, and Java accepted the result (at least in Java 5).

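MUTF-8 is mainly used in JNI. The difference from standard UTF-8 that matters here is the encoding of U+0000: standard UTF-8 uses 1 byte (0x00), while MUTF-8 uses 2 bytes (0xC0 0x80). First, U+0000 should not normally appear in text. Second, DataInputStream.readUTF() does not enforce the two-byte form, so it accepts either encoding.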
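To make the U+0000 difference concrete, here is a small Python 3 check. It is only a sketch; the variable names are illustrative and not part of the original answer.

```python
# Standard UTF-8 encodes U+0000 as a single zero byte.
standard = '\x00'.encode('utf-8')

# Modified UTF-8 uses the two-byte form 0xC0 0x80 instead.
modified = b'\xc0\x80'

# Python's strict UTF-8 decoder treats the two-byte form as invalid,
# which is why MUTF-8 input cannot always be decoded with 'utf-8'.
try:
    modified.decode('utf-8')
    accepted = True
except UnicodeDecodeError:
    accepted = False

print(standard, modified, accepted)
```

So the shortcut is safe when writing to Java, but bytes coming back from Java may need special handling if they can contain U+0000.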

EDIT: The Python code should look like this:

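import struct

def writeUTF(data, text):
    # 'text' avoids shadowing the built-in str(), which the original relied on below
    utf8 = text.encode('utf-8')
    length = len(utf8)
    # 2-byte big-endian length of the encoded bytes, not the character count
    data.append(struct.pack('!H', length))
    fmt = '!' + str(length) + 's'
    data.append(struct.pack(fmt, utf8))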
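A short usage sketch, assuming Python 3 and that the buffer is a plain list of byte chunks that gets joined before sending. The names `write_utf`, `read_utf`, and `chunks` are hypothetical; `read_utf` only mirrors the length-prefixed layout for a round-trip check and decodes standard UTF-8, not the full modified form.

```python
import io
import struct

def write_utf(chunks, text):
    # Same layout as described above: 2-byte big-endian byte length, then UTF-8 bytes.
    utf8 = text.encode('utf-8')
    chunks.append(struct.pack('!H', len(utf8)))
    chunks.append(utf8)

def read_utf(stream):
    # Hypothetical inverse, used here only to check the layout.
    (length,) = struct.unpack('!H', stream.read(2))
    return stream.read(length).decode('utf-8')

chunks = []
write_utf(chunks, 'héllo')
write_utf(chunks, 'wörld')
payload = b''.join(chunks)  # bytes ready to place in a request body

stream = io.BytesIO(payload)
first, second = read_utf(stream), read_utf(stream)
```

Note that the prefix counts encoded bytes: 'héllo' has five characters but six UTF-8 bytes, so its prefix is 0x0006.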
+4

, DataInput.readUTF, , ( ) Python.

, . , , , , , . Python, , , , , , . , UTF-8 .

+1


I found an implementation of this modified UTF-8 in the OpenJDK sources and translated it into Python. Here is the link for the object I created.
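The linked code is not reproduced on this page. As a rough idea of what such a translation might look like, here is a minimal Python 3 sketch written from the format described in the DataInput documentation, not from the answerer's code. The function names `encode_mutf8` and `write_utf` are assumptions. Besides U+0000, it also covers characters outside the Basic Multilingual Plane, which modified UTF-8 writes as two three-byte surrogate sequences instead of one four-byte sequence.

```python
import struct

def encode_mutf8(text):
    """Encode text as Java modified UTF-8, without the length prefix."""
    out = bytearray()
    # Java strings are sequences of UTF-16 code units, so encode unit by unit.
    utf16 = text.encode('utf-16-be', 'surrogatepass')
    for (unit,) in struct.iter_unpack('>H', utf16):
        if 0x0001 <= unit <= 0x007F:
            out.append(unit)                          # one byte
        elif unit <= 0x07FF:
            out.append(0xC0 | (unit >> 6))            # two bytes, includes U+0000
            out.append(0x80 | (unit & 0x3F))
        else:
            out.append(0xE0 | (unit >> 12))           # three bytes, includes surrogates
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)

def write_utf(text):
    """Return a 2-byte big-endian length followed by the modified UTF-8 bytes."""
    payload = encode_mutf8(text)
    if len(payload) > 0xFFFF:
        raise ValueError('encoded string does not fit a 2-byte length prefix')
    return struct.pack('>H', len(payload)) + payload
```

For plain BMP text without U+0000, the output is identical to standard UTF-8, which is why the simpler approach in the accepted answer works in most cases.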

0

Source: https://habr.com/ru/post/1717127/

