Any efficient way to read data from a large binary file?

I need to process dozens of gigabytes of data in a single binary file. Each entry in the data file has a variable length.

So the file looks like this:

<len1><data1><len2><data2>..........<lenN><dataN>

The data contains integers, pointers, double values, and so on.

I found that Python cannot handle this situation well. There is no problem if I read the entire file into memory; that part is fast. But the struct package does not seem suitable performance-wise: it gets almost stuck unpacking the bytes.
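Roughly, my reading loop looks like the sketch below (simplified; the 4-byte little-endian length prefix is just for illustration, the real records hold more fields):

import struct

with open("data.bin", "rb") as f:
    while True:
        header = f.read(4)
        if len(header) < 4:
            break                      # end of file
        (length,) = struct.unpack("<I", header)
        payload = f.read(length)
        # ... unpack the individual fields out of payload with struct ...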

Any help is appreciated.

Thanks.

+3
6 answers

struct and array are fine for the low-level details, and may be all you need if you only have to read all of the data sequentially. Depending on details you don't mention, buffer, mmap, or even ctypes may also help. If pure Python is still too slow, a small Cython-coded helper, or an existing library (in C, C++, Fortran, ...) interfaced to Python, can give the extra performance needed to handle this huge file.
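For instance, a rough sketch of the mmap route (still assuming a 4-byte little-endian length prefix, which may not match the real format): struct.unpack_from decodes straight out of the mapped file, without copying each record into a new bytes object first.

import mmap
import struct

with open("data.bin", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = 0
        while offset < len(mm):
            # Decode the length prefix directly from the mapped file.
            (length,) = struct.unpack_from("<I", mm, offset)
            offset += 4
            record = mm[offset:offset + length]
            offset += length
            # ... decode the fields of `record` here ...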

But there are some odd points here: how can a data file contain pointers, which are intrinsically about addressing memory? Are they perhaps "offsets" instead, and if so, how exactly are they based and encoded? Are your needs more advanced than plain sequential reading (e.g., random access), and if so, can you do a first "indexing" pass to collect the offset of each record into a compact auxiliary file? (Such a file of offsets would be a natural fit for array.) What is the distribution of record lengths and contents that makes up the "dozens of gigabytes"? Etc., etc.

(By the way, since you say you can easily read the whole file into memory, that implies a 64-bit box with many tens of GB of RAM!) So it is well worth careful work to optimize the handling of this data, but we can't help much with that unless we know enough detail to do so!-)

+4

Have a look at the array module, in particular array.fromfile. Two notes:

Since the records are variable length, you cannot pull the whole file in with a single fromfile call; read it record by record instead.

array.fromfile raises EOFError when there are not enough items left in the file, so wrap the read in a try-except to detect the end of the data.
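A minimal sketch of that approach, assuming purely for illustration that each record is a 4-byte count followed by that many 8-byte doubles:

from array import array

def read_records(path):
    # Yield one array of doubles per record; stop cleanly at end of file.
    with open(path, "rb") as f:
        while True:
            header = array("I")          # 32-bit unsigned count prefix
            try:
                header.fromfile(f, 1)
            except EOFError:
                return                   # clean end of file
            payload = array("d")         # the record's doubles
            payload.fromfile(f, header[0])
            yield payload

for record in read_records("data.bin"):
    pass  # process each record here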

+2

ctypes can handle this. You can define a structure such as:

from ctypes import Structure, c_uint32, sizeof, addressof, memmove

class foo(Structure):
    _fields_ = [("myint", c_uint32)]

bar = foo()

and then fill it directly from a block read out of the file:

block = file.read(sizeof(bar))
memmove(addressof(bar), block, sizeof(bar))

Since your lenN fields vary, you would read the length first and then pick (or build) the matching structure for the record that follows. This avoids the per-field pack() and unpack() calls, which seem to be where your time is going.
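A rough sketch of how that might look for the variable-length records; the Header/Payload layout here is invented just to show the mechanics:

from ctypes import Structure, c_uint32, c_double, sizeof, memmove, addressof

class Header(Structure):
    _fields_ = [("length", c_uint32)]          # bytes of payload that follow

class Payload(Structure):
    # Note: real formats may need _pack_ = 1 to control alignment padding.
    _fields_ = [("myint", c_uint32),
                ("value", c_double)]

with open("data.bin", "rb") as f:
    while True:
        block = f.read(sizeof(Header))
        if len(block) < sizeof(Header):
            break                              # end of file
        hdr = Header()
        memmove(addressof(hdr), block, sizeof(Header))
        body = f.read(hdr.length)
        rec = Payload()
        memmove(addressof(rec), body, min(len(body), sizeof(Payload)))
        # ... use rec.myint, rec.value ...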

+2

You could try the bitstring module.

Unlike struct, the data is stored internally as a plain byte array, and the Bits-style stream classes let you read and interpret values from it as you go rather than unpacking everything field by field.

Something like this:

from bitstring import ConstBitStream

s = ConstBitStream(filename='your_file')
while s.pos != s.length:
    # Read a byte and interpret it as an unsigned integer
    length = s.read('uint:8')
    # Read 'length' bytes and get them back as a Python bytes object
    data = s.read(length*8).bytes
    # Now do whatever you want with the data

When you create it from a filename like this, the file is memory-mapped rather than read into memory, so it should cope with very large files.

You can also slice it without reading through the rest of the file; for example, s[-800:] gives you the final 100 bytes.

+2

Another option: load the parsed records into an in-memory sqlite3 database.

import sqlite3
conn = sqlite3.connect(":memory:")

Then you can use SQL queries to process the data.

Also look at generators (to produce records lazily) and iterators (to consume them one at a time), so you never hold more than one record in memory.
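A sketch of how the pieces could fit together; the table layout and the read_records generator are made up for illustration:

import sqlite3
import struct

def read_records(path):
    # Illustrative: 4-byte length prefix, then an int and a double per record.
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                return
            (length,) = struct.unpack("<I", header)
            payload = f.read(length)
            yield struct.unpack_from("<id", payload)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (myint INTEGER, value REAL)")
# executemany consumes the generator lazily, one record at a time.
conn.executemany("INSERT INTO records VALUES (?, ?)", read_records("data.bin"))

for row in conn.execute("SELECT myint, value FROM records WHERE value > 0"):
    print(row)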

+1

PyTables, which is built on top of the HDF5 library, could also be a good fit:

It works more or less like a hierarchical database, where you can store many tables with typed columns. Have a look at it.
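A minimal sketch of what that could look like; the column layout is an assumption for illustration:

import tables

class Record(tables.IsDescription):
    myint = tables.UInt32Col()
    value = tables.Float64Col()

# Create an HDF5 file with one table and append rows to it.
with tables.open_file("records.h5", mode="w") as h5:
    table = h5.create_table("/", "records", Record)
    row = table.row
    for myint, value in [(1, 0.5), (2, 1.5)]:   # stand-in for the parsed records
        row["myint"] = myint
        row["value"] = value
        row.append()
    table.flush()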

+1