Any efficient way to read data from a large binary file?

I need to process dozens of gigabytes of data in a single binary file. Each entry in the data file has a variable length.

So the file looks like this:

<len1><data1><len2><data2>..........<lenN><dataN>

The data contains integers, pointers, double values, and so on.

I found that Python cannot handle this situation well. There is no problem if I read the entire file into memory; that part is fast. But the struct package does not seem suitable performance-wise: it gets almost stuck unpacking the bytes.
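Roughly, my reading loop looks like the sketch below (simplified; the 4-byte little-endian length prefix is just for illustration, the real records hold more fields):

import struct

with open("data.bin", "rb") as f:
    while True:
        header = f.read(4)
        if len(header) < 4:
            break                      # end of file
        (length,) = struct.unpack("<I", header)
        payload = f.read(length)
        # ... unpack the individual fields out of payload with struct ...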

Any help is appreciated.

Thanks.

+3
6 answers

struct and array are fine for the low-level details, and may be all you need if you only have to read all of the data sequentially. Depending on details you don't mention, buffer, mmap, or even ctypes may also help. If pure Python is still too slow, a small Cython-coded helper, or an existing library (in C, C++, Fortran, ...) interfaced to Python, can give the extra performance needed to handle this huge file.
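For instance, a rough sketch of the mmap route (still assuming a 4-byte little-endian length prefix, which may not match the real format): struct.unpack_from decodes straight out of the mapped file, without copying each record into a new bytes object first.

import mmap
import struct

with open("data.bin", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = 0
        while offset < len(mm):
            # Decode the length prefix directly from the mapped file.
            (length,) = struct.unpack_from("<I", mm, offset)
            offset += 4
            record = mm[offset:offset + length]
            offset += length
            # ... decode the fields of `record` here ...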

But there are some odd points here: how can a data file contain pointers, which are intrinsically about addressing memory? Are they perhaps "offsets" instead, and if so, how exactly are they based and encoded? Are your needs more advanced than plain sequential reading (e.g., random access), and if so, can you do a first "indexing" pass to collect the offset of each record into a compact auxiliary file? (Such a file of offsets would be a natural fit for array.) What is the distribution of record lengths and contents that makes up the "dozens of gigabytes"? Etc., etc.

(By the way, since you say you can easily read the whole file into memory, that implies a 64-bit box with many tens of GB of RAM!) So it is well worth careful work to optimize the handling of this data, but we can't help much with that unless we know enough detail to do so!-)

+4

Have a look at the array module, in particular array.fromfile. Two notes:

Since the records are variable length, you cannot pull the whole file in with a single fromfile call; read it record by record instead.

array.fromfile raises EOFError when there are not enough items left in the file, so wrap the read in a try-except to detect the end of the data.
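A minimal sketch of that approach, assuming purely for illustration that each record is a 4-byte count followed by that many 8-byte doubles:

from array import array

def read_records(path):
    # Yield one array of doubles per record; stop cleanly at end of file.
    with open(path, "rb") as f:
        while True:
            header = array("I")          # 32-bit unsigned count prefix
            try:
                header.fromfile(f, 1)
            except EOFError:
                return                   # clean end of file
            payload = array("d")         # the record's doubles
            payload.fromfile(f, header[0])
            yield payload

for record in read_records("data.bin"):
    pass  # process each record here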

+2

ctypes can handle this. You can define a structure such as:

from ctypes import Structure, c_uint32, sizeof, addressof, memmove

class foo(Structure):
    _fields_ = [("myint", c_uint32)]

bar = foo()

and then fill it directly from a block read out of the file:

block = file.read(sizeof(bar))
memmove(addressof(bar), block, sizeof(bar))

Since your lenN fields vary, you would read the length first and then pick (or build) the matching structure for the record that follows. This avoids the per-field pack() and unpack() calls, which seem to be where your time is going.
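A rough sketch of how that might look for the variable-length records; the Header/Payload layout here is invented just to show the mechanics:

from ctypes import Structure, c_uint32, c_double, sizeof, memmove, addressof

class Header(Structure):
    _fields_ = [("length", c_uint32)]          # bytes of payload that follow

class Payload(Structure):
    # Note: real formats may need _pack_ = 1 to control alignment padding.
    _fields_ = [("myint", c_uint32),
                ("value", c_double)]

with open("data.bin", "rb") as f:
    while True:
        block = f.read(sizeof(Header))
        if len(block) < sizeof(Header):
            break                              # end of file
        hdr = Header()
        memmove(addressof(hdr), block, sizeof(Header))
        body = f.read(hdr.length)
        rec = Payload()
        memmove(addressof(rec), body, min(len(body), sizeof(Payload)))
        # ... use rec.myint, rec.value ...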

+2

You could try the bitstring module.

Unlike struct, the data is stored internally as a plain byte array, and the Bits-style stream classes let you read and interpret values from it as you go rather than unpacking everything field by field.

Something like this:

from bitstring import ConstBitStream

s = ConstBitStream(filename='your_file')
while s.pos != s.length:
    # Read a byte and interpret it as an unsigned integer
    length = s.read('uint:8')
    # Read 'length' bytes and get them back as a Python bytes object
    data = s.read(length*8).bytes
    # Now do whatever you want with the data

When you create it from a filename like this, the file is memory-mapped rather than read into memory, so it should cope with very large files.

You can also slice it without reading through the rest of the file; for example, s[-800:] gives you the final 100 bytes.

+2

Another option: load the parsed records into an in-memory sqlite3 database.

import sqlite3
conn = sqlite3.connect(":memory:")

Then you can use SQL queries to process the data.

Also look at generators (to produce records lazily) and iterators (to consume them one at a time), so you never hold more than one record in memory.
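A sketch of how the pieces could fit together; the table layout and the read_records generator are made up for illustration:

import sqlite3
import struct

def read_records(path):
    # Illustrative: 4-byte length prefix, then an int and a double per record.
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                return
            (length,) = struct.unpack("<I", header)
            payload = f.read(length)
            yield struct.unpack_from("<id", payload)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (myint INTEGER, value REAL)")
# executemany consumes the generator lazily, one record at a time.
conn.executemany("INSERT INTO records VALUES (?, ?)", read_records("data.bin"))

for row in conn.execute("SELECT myint, value FROM records WHERE value > 0"):
    print(row)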

+1

PyTables, which is built on top of the HDF5 library, could also be a good fit:

It works more or less like a hierarchical database, where you can store many tables with typed columns. Have a look at it.
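A minimal sketch of what that could look like; the column layout is an assumption for illustration:

import tables

class Record(tables.IsDescription):
    myint = tables.UInt32Col()
    value = tables.Float64Col()

# Create an HDF5 file with one table and append rows to it.
with tables.open_file("records.h5", mode="w") as h5:
    table = h5.create_table("/", "records", Record)
    row = table.row
    for myint, value in [(1, 0.5), (2, 1.5)]:   # stand-in for the parsed records
        row["myint"] = myint
        row["value"] = value
        row.append()
    table.flush()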

+1