Fast binary data conversion in Python

What is the fastest way to convert a binary string of data to a numeric value in Python?

I am using struct.unpack_from(), but I am hitting its performance limit.

Context: The input stream is mixed binary and ASCII data. The ASCII data conversion is done in C through ctypes. Implementing the unpacking in C through ctypes gave performance similar to unpack. My guess is that the call overhead is too large a factor. I was hoping to find a native, C-like coercion method (however un-Pythonic). Most likely, all of this code will need to be moved to C.

The stream is in network byte order (big-endian), and the machine is little-endian. Conversion example:

    import struct
    network_stream = struct.pack('>I', 0x12345678)
    (converted_int,) = struct.unpack_from('>I', network_stream, 0)

I'm less concerned with handling the stream format than with the general case of binary conversion, and with whether there is any alternative to unpack at all. For example, socket.ntohl() requires an int, and int() will not convert a binary string of data.
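As a point of comparison for the "alternative to unpack" part: on Python 3.2 and later, int.from_bytes() converts a bytes object to an integer directly. The sketch below assumes Python 3 and reuses the names from the example above. It only shows that the two calls agree on the value and makes no claim about which is faster.

```python
import struct

# Same 4-byte big-endian value as in the example above.
network_stream = struct.pack('>I', 0x12345678)

# int.from_bytes (Python 3.2+) takes the raw bytes and an explicit byte order.
# The slice selects the 4 bytes that make up the integer.
converted_int = int.from_bytes(network_stream[0:4], byteorder='big')

# Cross-check against struct.unpack_from.
(expected,) = struct.unpack_from('>I', network_stream, 0)
```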

Thanks for your suggestions!

2 answers

The speed problem probably does not lie in the implementation of struct.unpack_from() itself, but in everything else Python has to do: dictionary lookups, object creation, function calls, and other bookkeeping. You can speed things up a bit by eliminating one of those dictionary lookups: import unpack_from directly rather than fetching it from the struct module each time:

    $ python -m timeit -s "import struct; network_stream = struct.pack('>I', 0x12345678)" "(converted_int,) = struct.unpack_from('>I', network_stream, 0)"
    1000000 loops, best of 3: 0.277 usec per loop
    $ python -m timeit -s "import struct; from struct import unpack_from; network_stream = struct.pack('>I', 0x12345678)" "(converted_int,) = unpack_from('>I', network_stream, 0)"
    1000000 loops, best of 3: 0.258 usec per loop
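A related variation on the same idea, offered as a sketch rather than a measured result: struct.Struct compiles the format string once, and binding its unpack_from method to a name removes the attribute lookup as well. The name unpack_uint32 is made up for this illustration.

```python
import struct

network_stream = struct.pack('>I', 0x12345678)

# Compile the '>I' format once and bind the method to a plain name.
# Each call then skips the module attribute lookup and does not need
# to resolve the format string again.
unpack_uint32 = struct.Struct('>I').unpack_from

# The format is already baked in, so only the buffer and offset are passed.
(converted_int,) = unpack_uint32(network_stream, 0)
```

Any gain is likely to be small for the same reason as above: most of the cost is in the call and in creating the result objects, not in the lookup.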

However, if you have a lot of parsing logic that forces you to unpack one number at a time and prevents you from unpacking a whole array of data in bulk, it will not matter much which function you call to do it. You probably need to write that entire inner loop in a language with less overhead, such as C.
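When the layout does allow it, bulk unpacking looks roughly like the sketch below. The payload of eight unsigned 32-bit integers is invented for the illustration, and struct.iter_unpack requires Python 3.4 or later.

```python
import struct

# Hypothetical payload: eight big-endian unsigned 32-bit integers.
values = list(range(8))
payload = struct.pack('>8I', *values)

# One call unpacks the whole block, so the per-call overhead is paid once
# instead of once per number.
count = len(payload) // 4
unpacked = struct.unpack_from('>%dI' % count, payload, 0)

# struct.iter_unpack (Python 3.4+) walks a buffer of equally sized records.
# Each item is a tuple, here holding a single integer.
records = [value for (value,) in struct.iter_unpack('>I', payload)]
```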


Based on my experience, you are correct that the code will need to be moved to C. As you discovered, the various tools for binary conversion (struct and ctypes, for example) have roughly the same performance.

Cython is the easiest way to generate a C extension for Python.

Another simple approach is to abandon CPython entirely in favor of PyPy, which can generate high-quality, low-level code using its tracing JIT.

A more complex but more direct approach is to write a simple C extension. It's not fun, but it's not difficult.


Source: https://habr.com/ru/post/901579/

