I use mmap / read + BZ2_bzDecompress to sequentially decompress a large file (29 GB). I need to parse the uncompressed XML data, but only small pieces of it, so it seemed more efficient to decompress and parse sequentially than to unzip the entire file (400 GB uncompressed) and parse it afterwards. Interestingly, the decompression part alone is already very slow: while the bzip2 shell command manages a bit over 52 MB per second (I did several runs of timeout 10 bzip2 -c -k -d input.bz2 > output and divided the size of the produced file by 10), my program cannot even do 2 MB/s, slowing down after a few seconds to 1.2 MB/s.
The file I'm trying to process consists of several bz2 streams, so I check the return value of BZ2_bzDecompress for BZ_STREAM_END, and if that occurs, I use BZ2_bzDecompressEnd( strm ); and BZ2_bzDecompressInit( strm, 0, 0 ) to restart with the next stream, in case the file has not been fully processed yet. I also tried without BZ2_bzDecompressEnd, but that didn't change anything (and I can't really see in the documentation how to handle multiple streams correctly).
The file is mmap'ed beforehand. I tried different combinations of flags there as well, currently MAP_RDONLY, MAP_PRIVATE, with madvise set to MADV_SEQUENTIAL | MADV_WILLNEED | MADV_HUGEPAGE (I check the return value, and madvise does not report any problems; I'm on a Debian installation with a Linux 3.2x kernel, which has hugepage support).
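To make the mapping setup concrete, here is a rough sketch of a read-only, private mapping with sequential-access hints. The helper name map_file_sequential is hypothetical, and the sketch uses PROT_READ for the protection argument, which is an assumption about what the read-only flag above stands for. As far as I can tell from the madvise man page, the advice argument is a single value rather than a bit mask, so the sketch issues one call per hint instead of OR-ing them; the hints are treated as best effort.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes madvise() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and hint sequential
 * access. Returns NULL on failure; on success *len_out is the file size. */
const char *map_file_sequential(const char *path, size_t *len_out)
{
    struct stat st;
    size_t len;
    void *p;
    int fd;

    assert(path != NULL && len_out != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    len = (size_t)st.st_size;

    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    /* One advice value per call; failures are ignored because the
     * hints are optional. */
    (void)madvise(p, len, MADV_SEQUENTIAL);
    (void)madvise(p, len, MADV_WILLNEED);

    *len_out = len;
    return (const char *)p;
}
```

The returned pointer is released with munmap(ptr, len) once the file has been processed.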
When profiling, I made sure that, apart from some counters for measuring speed and a print statement that is limited to once every n iterations, nothing else is run. Also, this is on a modern multi-core server processor where all the other cores are idle, and it is bare metal, not virtualized.
Any ideas on what I might be doing wrong / what I could do to improve performance?
Update: thanks to James Chong's suggestion, I tried swapping mmap() for read(), and the speed is still the same. So it seems mmap() is not the problem (either that, or mmap() and read() share an underlying problem).
Update 2: I thought that perhaps the malloc / free calls made in bzDecompressInit / bzDecompressEnd were the cause, so I set the bzalloc / bzfree members of the bz_stream struct to a custom implementation that only allocates memory the first time and does not free it unless a flag is set (passed via the opaque parameter = strm.opaque). It works perfectly fine, but again the speed did not increase.
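A sketch of what such a caching allocator could look like, for anyone who wants to reproduce the experiment. The names bz_pool, pool_alloc and pool_free are made up for this illustration and this is not the exact code I ran; only the two function signatures are taken from the bzalloc / bzfree members of bz_stream. Blocks are kept in a small fixed table and handed out again on the next Init, and they are only really freed once the release flag is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 8   /* assumed to be enough for one decompressor */

struct pool_slot { void *ptr; size_t size; int in_use; };

/* Hypothetical pool, passed to libbz2 through strm.opaque. */
struct bz_pool {
    struct pool_slot slot[POOL_SLOTS];
    int release;       /* when non-zero, pool_free really frees */
};

/* Matches bzalloc: void *(*)(void *opaque, int items, int size). */
void *pool_alloc(void *opaque, int items, int size)
{
    struct bz_pool *pool = opaque;
    struct pool_slot *empty = NULL;
    size_t need;
    int i;

    assert(pool != NULL && items >= 0 && size >= 0);
    need = (size_t)items * (size_t)size;

    for (i = 0; i < POOL_SLOTS; i++) {
        struct pool_slot *s = &pool->slot[i];
        if (s->ptr != NULL && !s->in_use && s->size >= need) {
            s->in_use = 1;           /* reuse a cached block */
            return s->ptr;
        }
        if (s->ptr == NULL && empty == NULL)
            empty = s;
    }
    if (empty == NULL)
        return NULL;                 /* table full: caller sees a memory error */

    empty->ptr = malloc(need ? need : 1);
    if (empty->ptr == NULL)
        return NULL;
    empty->size = need;
    empty->in_use = 1;
    return empty->ptr;
}

/* Matches bzfree: void (*)(void *opaque, void *addr). */
void pool_free(void *opaque, void *addr)
{
    struct bz_pool *pool = opaque;
    int i;

    if (pool == NULL || addr == NULL)
        return;
    for (i = 0; i < POOL_SLOTS; i++) {
        struct pool_slot *s = &pool->slot[i];
        if (s->ptr == addr) {
            s->in_use = 0;           /* keep the block for the next stream */
            if (pool->release) {
                free(s->ptr);
                s->ptr = NULL;
                s->size = 0;
            }
            return;
        }
    }
}
```

It is wired in before BZ2_bzDecompressInit with strm.bzalloc = pool_alloc; strm.bzfree = pool_free; strm.opaque = &pool; where pool is a zero-initialized struct bz_pool. The release flag is set before the final BZ2_bzDecompressEnd so that the blocks in use at that point are returned to the system.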
Update 3: I also tried fread() instead of read(), and the speed still remains the same. I also tried different numbers of bytes per read and different sizes for the decompressed-data buffer, with no change.
Update 4: Reading speed is definitely not the problem, since I was able to achieve about 120 MB/s in sequential reading using only mmap().