Any recommendation on how best to store this data for fast random access?

Imagine you have a file system tree:

root/AA/aadata
root/AA/aafile
root/AA/aatext
root/AB/abinput
root/AB/aboutput
root/AC/acinput
...

Only about 10 million files, each about 10 KB in size. Together they essentially form a key-value store, with the keys split across folders to keep directory sizes manageable (the file system would die if I put 5 million files in one folder).
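For reference, the sharding scheme implied by the example tree can be expressed as a small helper. The two-letter, upper-cased prefix is an assumption read off the example paths, not something the question states explicitly:

```python
import os

def shard_path(root: str, key: str) -> str:
    """Map a key like 'aboutput' to root/AB/aboutput.

    Assumes the shard folder is the upper-cased first two
    characters of the key, as in the example tree above.
    """
    return os.path.join(root, key[:2].upper(), key)
```

For example, `shard_path("root", "aadata")` yields `root/AA/aadata` on a POSIX system.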

Now we need:

  • archive this tree into one large file (archiving should be reasonably fast but still give a good compression ratio, so 7z is too slow)

  • random access into that large file must be very fast: when I need the contents of root/AB/aboutput, I have to be able to read it very quickly

Redis is not an option: 10 million files at 10 KB each is roughly 100 GB, so the data has to live on disk, ideally an SSD (or even a plain HDD).

As one option, squashfs comes to mind; the alternative is simply keeping the tree on a regular file system such as EXT3, EXT4, or NTFS.

Compressing each file separately with zlib is possible, but 10 KB files are too small to compress well in isolation; to get a decent ratio they would need to share context, for example a common dictionary or block-level compression. Is there a ready-made solution for this, or do I have to roll my own container format?
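One concrete way to improve the ratio on small files with zlib is a preset dictionary, which zlib supports natively via the `zdict` parameter. This is a minimal sketch; the dictionary contents here are made up for illustration — in practice you would build the dictionary from samples of the real files:

```python
import zlib

# Hypothetical shared dictionary: bytes assumed to be common
# across many of the small files. Build yours from real samples.
SHARED_DICT = b'{"status": "ok", "payload": ' * 32

def compress_with_dict(data: bytes) -> bytes:
    c = zlib.compressobj(level=6, zdict=SHARED_DICT)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob: bytes) -> bytes:
    # The reader must use the exact same dictionary.
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return d.decompress(blob) + d.flush()

small = b'{"status": "ok", "payload": [1, 2, 3]}'
assert decompress_with_dict(compress_with_dict(small)) == small
```

Inputs that share structure with the dictionary turn into back-references instead of literals, which is where the ratio gain on tiny files comes from.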


My suggestion: write the files into one container yourself. Concatenate the file contents into fixed-size blocks (for example, 8196 bytes each), compress each block independently, and keep an index that maps a key to its block offset, its offset within the block, and its length. To read a file you look up the key in the index, seek to its block, decompress only that block, and slice out the bytes.

Larger blocks give a better compression ratio but slower random reads; smaller blocks do the opposite, so the block size is a tuning knob. The index itself is small relative to the data and can be kept in memory (for example, in a hash table).
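A minimal sketch of such an indexed, block-compressed container in Python. The block size, the JSON index format appended at the end of the file, and the function names are all assumptions for illustration, not a definitive format:

```python
import json
import struct
import zlib

BLOCK_TARGET = 64 * 1024  # group small files into ~64 KiB blocks (assumption)

def pack(files: dict, out_path: str) -> None:
    """Write {key: bytes} into one archive: concatenated compressed
    blocks, then a JSON index, then the 8-byte index offset."""
    index = {}            # key -> [block_off, comp_len, in_block_off, size]
    blocks, current, cur_keys = [], b"", []

    def flush():
        nonlocal current, cur_keys
        if not current:
            return
        comp = zlib.compress(current)
        off = sum(len(b) for b in blocks)  # file offset of this block
        for k, (in_off, size) in cur_keys:
            index[k] = [off, len(comp), in_off, size]
        blocks.append(comp)
        current, cur_keys = b"", []

    for key, data in files.items():
        cur_keys.append((key, (len(current), len(data))))
        current += data
        if len(current) >= BLOCK_TARGET:
            flush()
    flush()

    with open(out_path, "wb") as f:
        for b in blocks:
            f.write(b)
        index_off = f.tell()
        f.write(json.dumps(index).encode())
        f.write(struct.pack("<Q", index_off))

def read_one(path: str, key: str) -> bytes:
    """Random access: read the index, then decompress only one block."""
    with open(path, "rb") as f:
        f.seek(-8, 2)  # footer holds the index offset
        (index_off,) = struct.unpack("<Q", f.read(8))
        f.seek(index_off)
        index = json.loads(f.read()[:-8])  # strip the 8-byte footer
        block_off, comp_len, in_off, size = index[key]
        f.seek(block_off)
        block = zlib.decompress(f.read(comp_len))
        return block[in_off:in_off + size]
```

A real implementation would load the index once and keep it in memory across reads, exactly as the answer suggests, rather than re-reading it on every lookup.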


Source: https://habr.com/ru/post/1531009/

