Organizing files in a tar.bz2 file using Python

I have about 200,000 text files stored in a single bz2 file. The problem is that scanning the bz2 file to extract the data I need is very slow: it has to read through the whole archive to find the one file I am looking for. Is there any way to speed this up?

Also, I was thinking about organizing the files inside the tar.bz2 so that I would know where to look. Is there a way to organize the files placed in a bz2 archive?

Additional information / Edit: I need to query the compressed file for each text file. Is there a better compression method that supports such a large number of files and still compresses as thoroughly?

+3
2 answers

Do you need to use bzip2? Its documentation makes it quite clear that it is not designed to support random access. Perhaps you should use a compression format that more closely matches your requirements. The good old Zip format supports random access, although it may compress somewhat worse.
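To illustrate the random-access point, here is a minimal sketch using Python's standard `zipfile` module. The file names, the sample contents, and the helper names `build_archive` and `read_member` are made up for the example; it assumes the files are UTF-8 text. A Zip archive keeps a central directory, so a single member can be located and decompressed without touching the others.

```python
import io
import zipfile


def build_archive(files, compression=zipfile.ZIP_DEFLATED):
    """Pack a {name: text} mapping into an in-memory Zip archive.

    Each member is compressed on its own, which is what makes
    per-file access possible later.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def read_member(archive_bytes, name):
    """Return the text of one member, looked up by name.

    Only the requested member is decompressed.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name).decode("utf-8")


# Hypothetical sample data standing in for the real text files.
sample = {f"doc_{i:06d}.txt": f"contents of file {i}\n" for i in range(1000)}
data = build_archive(sample)
print(read_member(data, "doc_000421.txt"))
```

Passing `zipfile.ZIP_BZIP2` as `compression` keeps bzip2 as the per-member codec, though because each file is compressed separately the total size will usually be larger than one solid tar.bz2.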

+6

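Bzip2 compresses in large blocks (by default, I believe, 900 KiB). One approach that would speed up scanning the tar file considerably, at some cost in compression ratio, is to compress each file individually and then tar the results together. That is essentially what Zip-format files do (although with zlib rather than bzip2). You could then read the tar index (the list of members) and decompress only the file you are looking for.

However, I don't think most tar tools expect compressed files inside the archive (I don't know about Python's tar module, it may handle this fine). So you would probably have to do the final decompression step yourself.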
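One way to keep both tar and bzip2 while still getting per-file access is to compress each file on its own and store the compressed results in an uncompressed tar. The following is only a sketch under that assumption; the helper names `build_tar_of_bz2` and `read_tar_member`, the `.bz2` member suffix, and the sample data are invented for illustration. Python's `tarfile` module does not decompress such members for you, so the sketch calls `bz2.decompress` explicitly.

```python
import bz2
import io
import tarfile


def build_tar_of_bz2(files):
    """Store each text as its own bzip2-compressed member of a plain tar."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, text in files.items():
            payload = bz2.compress(text.encode("utf-8"))
            info = tarfile.TarInfo(name=name + ".bz2")
            info.size = len(payload)  # tar needs the member size up front
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_tar_member(tar_bytes, name):
    """Find one member by name and decompress only that member."""
    # mode="r:" opens the tar itself as uncompressed.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        member = tar.extractfile(name + ".bz2")
        return bz2.decompress(member.read()).decode("utf-8")


# Hypothetical sample data standing in for the real text files.
sample = {f"doc_{i:06d}.txt": f"contents of file {i}\n" for i in range(1000)}
tar_bytes = build_tar_of_bz2(sample)
print(read_tar_member(tar_bytes, "doc_000421.txt"))
```

Looking a member up by name still walks the tar headers, but on an uncompressed tar that means seeking from header to header rather than decompressing everything in between. For very small files, per-file bzip2 overhead may noticeably hurt the overall compression ratio.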

0

Source: https://habr.com/ru/post/1759896/

