Compression is all about removing redundancy from the data. Unfortunately, it is unlikely that the redundancy will be distributed with monotonous evenness throughout the file, and that is about the only scenario in which you could expect both compression and fine-grained random access.
However, you can get close to random access by maintaining an external list, built during compression, that records the correspondence between chosen points in the uncompressed data stream and their locations in the compressed data stream. You would obviously have to choose a method where the translation scheme between the source stream and its compressed version does not vary with the location in the stream (i.e., no LZ77 or LZ78; instead, you would probably want to go for Huffman or byte-pair encoding). Obviously, this incurs a lot of overhead, and you would have to decide how you want to trade off between the storage space needed for the "bookmark points" and the processor time needed to decompress the stream from a bookmark point to get the data you are actually looking for on that read.
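To make the bookmark idea concrete, here is a minimal sketch (my illustration, not part of the original answer). Rather than using a location-independent coder, it achieves the same effect by compressing fixed-size blocks independently, so that every bookmark is a clean restart point; zlib is used purely for illustration, and `BLOCK_SIZE`, `compress_with_index`, and `read_at` are hypothetical names.

```python
import bisect
import zlib

BLOCK_SIZE = 64 * 1024  # random-access granularity; an assumed tuning knob


def compress_with_index(data: bytes):
    """Compress in independent blocks; return (compressed, bookmarks).

    bookmarks[i] = (uncompressed_offset, compressed_offset) of block i.
    Because every block is a self-contained zlib stream, decoding a block
    never depends on the bytes that precede it in the file.
    """
    bookmarks, out = [], bytearray()
    for off in range(0, len(data), BLOCK_SIZE):
        bookmarks.append((off, len(out)))
        out += zlib.compress(data[off:off + BLOCK_SIZE])
    return bytes(out), bookmarks


def read_at(compressed: bytes, bookmarks, pos: int, length: int) -> bytes:
    """Random-access read: decompress only the blocks covering the range."""
    # Find the last bookmark whose uncompressed offset is <= pos.
    i = bisect.bisect_right([u for u, _ in bookmarks], pos) - 1
    skip = pos - bookmarks[i][0]
    result = bytearray()
    while len(result) < skip + length and i < len(bookmarks):
        c_start = bookmarks[i][1]
        c_end = bookmarks[i + 1][1] if i + 1 < len(bookmarks) else len(compressed)
        result += zlib.decompress(compressed[c_start:c_end])
        i += 1
    return bytes(result[skip:skip + length])
```

The trade-off the answer describes shows up directly here: a smaller `BLOCK_SIZE` means more bookmark entries (more storage) but less wasted decompression per read, and vice versa.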
As for random-access writing... that is all but impossible. As already noted, compression is about removing redundancy from the data. If you try to replace data that could be, and was, compressed because it was redundant with data that does not have the same redundancy, it simply will not fit.
However, depending on how much random-access writing you expect to do, you might be able to simulate it by maintaining a sparse matrix representing all the data written to the file after compression. On every read, you would check the matrix to see whether you are reading an area you wrote to after compression; if not, you would go to the compressed file for the data.
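One way that overlay could look, again only as a sketch: a per-byte dict stands in for the sparse matrix (a real implementation would likely use interval maps for efficiency), layered over the hypothetical block-index reader from the previous example.

```python
class OverlayFile:
    """Copy-on-write overlay over a compressed file (sketch, assuming the
    `read_at` / bookmark scheme from the earlier example)."""

    def __init__(self, compressed: bytes, bookmarks):
        self.compressed = compressed
        self.bookmarks = bookmarks
        self.overlay = {}  # sparse map: absolute offset -> written byte value

    def write_at(self, pos: int, data: bytes) -> None:
        # Writes never touch the compressed data; they land in the overlay.
        for i, b in enumerate(data):
            self.overlay[pos + i] = b

    def read_at(self, pos: int, length: int) -> bytes:
        # Fetch the underlying bytes, then patch in any overlaid writes.
        base = bytearray(read_at(self.compressed, self.bookmarks, pos, length))
        for i in range(len(base)):
            if pos + i in self.overlay:
                base[i] = self.overlay[pos + i]
        return bytes(base)
```

The design choice here matches the answer's point: the compressed file stays immutable, and the overlay grows with the volume of post-compression writes, so the approach only pays off when such writes are rare.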
afeldspar Nov 04 '08 at 4:35