To answer my question, Shannon Fano must be “good enough” for such a task. If you've never done anything in the field of data compression, I suggest avoiding Huffman coding (or specialized versions of arithmetic coding).
According to this , SF meets your space / time requirements. I suggest implementing something like first . Pseudocode is defined as follows:
1: begin 2: count source units 3: sort source units to non-decreasing order 4: SF-SplitS 5: output(count of symbols, encoded tree, symbols) 6: write output 7: end 8: 9: procedure SF-Split(S) 10: begin 11: if (|S|>1) then 12: begin 13: divide S to S1 and S2 with about same count of units 14: add 1 to codes in S1 15: add 0 to codes in S2 16: SF-Split(S1) 17: SF-Split(S2) 18: end 19: end
Only if you fully understand SF (or you used similar algorithms earlier), I would suggest moving on to more stringent arithmetic coding methods. I recently implemented SF for the information class theory, and some of its parts seemed unintuitive and strange until I hugged it. On paper, it looks simple, but (like many other algorithms) that can cheat.
If you don’t get extra “style points,” I’ll personally go to Shannon Fano.
source share