The boundaries of practical compression and Kolmogorov complexity

Question

The boundaries of practical compression and Kolmogorov complexity

I consider the use of compression as a way to measure the relationship of a document to the body of documents. However, I found a strange result when using bzip2; len (compress (corpus))> len (compress (corpus + new_document)). Should this be the case with a practical compression algorithm, and theoretically this is possible if you look at the complexity of Kolmogorov's data? (the idea is to use a compression algorithm to approximate data complexity)

+3

algorithm complexity-theory compression

Vegard Jan 13 '11 at 10:48

source share

2 answers

, .

, , , .

, : x y, x - y. , ,

x = "asdfasdfasdfasdfasdfasdfasdfasdfasdf"

y = "asdfasdfasdfasdfasdfasdfasdfasdfasdf23452345234523452344523452452345234524345234"

, D - y. (I.e. K (y) = | D |)

x | ", D 46 " |, D, ( ).

x, , K (x) <= K (y) + log (| y | - | x |)

, , .

(N.b.: RLE , RLE Turing, .)

+1

biziclop 13 . '11 21:05

Nick Fortescue · Accepted Answer · 2011-01-13T16:55:39+0000

Yes, this should be the case with a practical compression algorithm, and theoretically possible with Kolmogorov complexity . The easiest way to explain why this is an example.

Assume the following:

your document separator ,
corpus are documents abc, def, abc, def, abc, def, abc,
new document - def
your compression algorithm (or the kolmogorov description language) simply allows you to repeat using a prefix with repeat count followed by |(this is a run-length encoding option )

Then:

compress (corpus) = "3 | abc, def, 1 | abc"
compress (corpus + new_document) = "4 | abc, def,"

, compress(corpus) compress(corpus+new_document). , , , , . , bzip2, , .

, . , , , , , .

The boundaries of practical compression and Kolmogorov complexity

More articles: