The code below should do the trick. First, it opens the file and decompresses it with lzma, and then it uses struct to unpack the binary data.
import lzma
import struct
import pandas as pd

def bi5_to_df(filename, fmt):
    chunk_size = struct.calcsize(fmt)
    data = []
    with lzma.open(filename) as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                data.append(struct.unpack(fmt, chunk))
            else:
                break
    df = pd.DataFrame(data)
    return df
The most important thing is to know the correct format. I googled around and tried some guesses, and '>3i2f' (or '>3I2f') works pretty well. (That is big-endian: 3 ints followed by 2 floats. The format you suggest, 'i4f', does not produce sensible floats, regardless of whether it is read as big- or little-endian.) For the struct module and its format syntax, see the docs.
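To see what that format string means in practice, here is a minimal sketch that packs one record by hand and unpacks it again. The sample values are taken from the comparison output further down; the second unpack only illustrates, on this one hand-made record, why an 'i4f'-style layout gives unusable floats.

```python
import struct

# '>' = big-endian with standard sizes, '3I' = three unsigned 32-bit ints,
# '2f' = two 32-bit floats
FMT = '>3I2f'

record_size = struct.calcsize(FMT)
print(record_size)  # 20 bytes per tick

# Build one record by hand and read it back.
raw = struct.pack(FMT, 3581, 131966, 131945, 1.5, 1.5)
good = struct.unpack(FMT, raw)
print(good)  # (3581, 131966, 131945, 1.5, 1.5)

# Reading the same 20 bytes as one int followed by four floats treats the
# two integer price fields as floats, which yields tiny denormal numbers.
wrong = struct.unpack('>i4f', raw)
print(wrong)
```

Since each record is 20 bytes, the decompressed file length should be a multiple of 20 if the format guess is right.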
df = bi5_to_df('13h_ticks.bi5', '>3i2f')
df.head()
Out[177]:
      0       1       2     3     4
0   210  110218  110216  1.87  1.12
1   362  110219  110216  1.00  5.85
2   875  110220  110217  1.00  1.12
3  1408  110220  110218  1.50  1.00
4  1884  110221  110219  3.94  1.00
Update
To compare the output of bi5_to_df with https://github.com/ninety47/dukascopy , I compiled and ran test_read_bi5 from there. The first lines of output:
time, bid, bid_vol, ask, ask_vol
2012-Dec-03 01:00:03.581000, 131.945, 1.5, 131.966, 1.5
2012-Dec-03 01:00:05.142000, 131.943, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.202000, 131.943, 1.5, 131.964, 2.25
2012-Dec-03 01:00:05.321000, 131.944, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.441000, 131.944, 1.5, 131.964, 1.5
And bi5_to_df on the same input file gives:
bi5_to_df('01h_ticks.bi5', '>3I2f').head()
Out[295]:
      0       1       2     3    4
0  3581  131966  131945  1.50  1.5
1  5142  131964  131943  1.50  1.5
2  5202  131964  131943  2.25  1.5
3  5321  131964  131944  1.50  1.5
4  5441  131964  131944  1.50  1.5
So everything seems to be fine (the ninety47 code just reorders the columns).
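Judging from the two outputs above, the raw columns appear to be: milliseconds since the start of the hour, ask, bid, ask volume, bid volume, with prices stored as integers scaled by 1000 for this instrument. A minimal sketch of turning one raw row into the ninety47-style layout follows. The helper name label_row and its parameters hour_start and point are my own, not part of either code base; the hour itself is not stored in the rows, so it has to be supplied by the caller, and the divisor of 1000 is only inferred from this one file.

```python
from datetime import datetime, timedelta

def label_row(row, hour_start, point=1000):
    """Turn one raw tuple into named fields (hypothetical helper).

    Assumes the column order (ms, ask, bid, ask_vol, bid_vol) inferred
    from the comparison above, and that prices are integers scaled by
    `point`.
    """
    ms, ask, bid, ask_vol, bid_vol = row
    return {
        'time': hour_start + timedelta(milliseconds=ms),
        'bid': bid / point,
        'bid_vol': bid_vol,
        'ask': ask / point,
        'ask_vol': ask_vol,
    }

# First row of the 01h_ticks.bi5 output shown above.
first = label_row((3581, 131966, 131945, 1.5, 1.5),
                  datetime(2012, 12, 3, 1, 0))
print(first)
```

For the first row this gives 01:00:03.581 with bid 131.945 and ask 131.966, which matches the first line printed by test_read_bi5.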
Also, it is probably more appropriate to use '>3I2f' instead of '>3i2f' (i.e. unsigned int instead of int).
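The two format codes only disagree once the top bit of a field is set, which is why both formats gave the same numbers for the files above. A small sketch with a hand-made 4-byte value (not taken from any real file):

```python
import struct

# Four bytes with the top bit set: signed and unsigned readings differ.
high = b'\xff\xff\xff\xfe'
as_signed = struct.unpack('>i', high)[0]
as_unsigned = struct.unpack('>I', high)[0]
print(as_signed, as_unsigned)  # -2 4294967294

# Values below 2**31 decode identically under both codes.
low = struct.pack('>I', 131966)
same = struct.unpack('>i', low)[0] == struct.unpack('>I', low)[0]
print(same)  # True
```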