Since lossless compression works better on some regions of the data than others, if you store compressed data in blocks of some convenient length BLOCKSIZE, then even though each block holds exactly the same number of compressed bytes, some blocks will expand to a much longer piece of plaintext than others.
You might look at "Compression: A Key for Next-Generation Text Retrieval Systems" by Nivio Ziviani, Edleno Silva de Moura, Gonzalo Navarro, and Ricardo Baeza-Yates in Computer magazine, November 2000: http://doi.ieeecomputersociety.org/10.1109/2.881693
Their decompressor takes 1, 2 or 3 whole bytes of compressed data and decompresses them (using a list of words) into a whole word. You can search the compressed text directly for words or phrases, which turns out to be even faster than searching the uncompressed text.
Their decompressor lets you point to any word in the text with an ordinary (byte) pointer and start decompressing immediately from that point.
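As a rough illustration of why that works: with word-level codes, searching for a phrase in the compressed text is just searching for the concatenation of its codes. (The authors use a byte-oriented code of 1 to 3 bytes per word; the toy sketch below uses fixed two-byte codes, like the scheme described further down, and the lexicon and names here are made up purely for the example.)

    import struct

    # Toy lexicon; in practice it would be built from the text itself (see below).
    lexicon = ["The", " quick", " brown", " fox", " jumps"]
    code_of = {w: struct.pack(">H", i) for i, w in enumerate(lexicon)}

    compressed = b"".join(code_of[w] for w in ["The", " quick", " brown", " fox"])

    # Searching for a phrase is just searching for the concatenation of its
    # codes; no decompression is needed.  A real search would only accept
    # matches at even offsets, so it never matches across two codes.
    needle = code_of[" quick"] + code_of[" brown"]
    hit = compressed.find(needle)
    print(hit // 2)   # -> 1, the index of the word where the phrase starts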
You can give every word a unique two-byte code, since you probably have fewer than 65,000 unique words in your text. (The KJV Bible has nearly 13,000 unique words.) Even if there are more than 65,000 words, it is quite simple to assign the first 256 two-byte codes to the 256 possible bytes, so you can spell out any word that is not in the lexicon of the 65,000 or so "most frequent words and phrases". (The compression gained by packing frequent words and phrases into two bytes is usually worth the occasional "expansion" of spelling out a word at two compressed bytes per letter.) There are many ways to pick a lexicon of "frequent words and phrases" that will give adequate compression. For example, you could tweak an LZW compressor to dump the "phrases" it uses more than once into a lexicon file, one line per phrase, and run it over all your data. Or you could arbitrarily chop your uncompressed data into 5-byte phrases and put them in the lexicon file, one line per phrase. Or you could chop your uncompressed data into actual English words and put each word, including the space at the beginning of the word, into the lexicon file, then use "sort --unique" to eliminate duplicate words. (Is picking the perfect "optimal" lexicon still considered NP-hard?)
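Here is a minimal sketch of that two-byte-code scheme, assuming plain ASCII text and the "English word plus its leading space" way of building the lexicon; the names build_lexicon, compress and decompress and the big-endian code layout are my own illustrative choices, not taken from the paper.

    import re
    import struct

    ESCAPE_CODES = 256                   # codes 0..255 spell out one raw byte each
    MAX_WORDS = 65536 - ESCAPE_CODES     # remaining codes name whole words/phrases

    def build_lexicon(text):
        # Chop the text into words, each keeping the space in front of it,
        # then deduplicate (same effect as `sort --unique`).
        words = re.findall(r"\s?\S+", text)
        return sorted(set(words))[:MAX_WORDS]

    def compress(text, lexicon):
        code_of = {w: i + ESCAPE_CODES for i, w in enumerate(lexicon)}
        out = bytearray()
        for word in re.findall(r"\s?\S+", text):
            if word in code_of:
                out += struct.pack(">H", code_of[word])
            else:
                # Word not in the lexicon: spell it out, two compressed
                # bytes per character, using the 256 escape codes.
                out += b"".join(struct.pack(">H", b) for b in word.encode("ascii"))
        return bytes(out)

    def decompress(data, lexicon):
        parts = []
        for (code,) in struct.iter_unpack(">H", data):
            parts.append(chr(code) if code < ESCAPE_CODES else lexicon[code - ESCAPE_CODES])
        return "".join(parts)

Round-tripping decompress(compress(text, lex), lex) gives back the original text, since the escape codes can spell out anything missing from the lexicon.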
Store the lexicon at the beginning of your huge compressed file, pad it out to some convenient BLOCKSIZE, and then store the compressed text, a series of two-byte "words", from there to the end of the file. Presumably the searcher will read this lexicon once and keep it in some quick-to-decode form in RAM during decompression, to speed up translating a "two-byte code" into its "variable-length phrase". My first draft would start with a simple one-line-per-phrase list, but you might later switch to storing the lexicon in a more compact form using some sort of incremental encoding or zlib.
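A sketch of that file layout, under the same assumptions as above; the NUL padding and the write_archive / read_lexicon names are hypothetical, chosen only to make the example concrete (it assumes lexicon lines never contain NUL bytes).

    BLOCKSIZE = 4096

    def write_archive(path, lexicon, compressed_text):
        # Lexicon first, one phrase per line, padded with at least one NUL
        # byte up to a BLOCKSIZE boundary; then the 2-byte codes to EOF.
        header = "\n".join(lexicon).encode("ascii") + b"\n"
        padding = BLOCKSIZE - len(header) % BLOCKSIZE
        with open(path, "wb") as f:
            f.write(header + b"\0" * padding + compressed_text)

    def read_lexicon(path):
        # Read the lexicon once and keep it in RAM as a plain list, so a
        # 2-byte code is decoded with a single list lookup.  Returns the
        # list and the file offset where the compressed codes start.
        header = bytearray()
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCKSIZE)
                header += block
                if not block or b"\0" in block:   # reached the padded block (or EOF)
                    break
        lexicon = header.rstrip(b"\0").decode("ascii").splitlines()
        return lexicon, len(header)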
You can pick any random even byte offset into the compressed text and start decompressing from there. I do not think it is possible to make a compressed file format with finer-grained random access than that.
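For example, reusing read_lexicon and decompress from the sketches above, random access is just a seek to an even offset (decompress_from and the file name are hypothetical):

    def decompress_from(path, lexicon, text_start, byte_offset, n_words):
        # byte_offset is measured from the start of the compressed text and
        # must be even, since every code is exactly 2 bytes wide; nothing
        # before that point has to be read or decoded.
        assert byte_offset % 2 == 0
        with open(path, "rb") as f:
            f.seek(text_start + byte_offset)
            return decompress(f.read(2 * n_words), lexicon)

    lexicon, text_start = read_lexicon("huge.wcz")   # read once, kept in RAM
    print(decompress_from("huge.wcz", lexicon, text_start, byte_offset=1000, n_words=20))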