Java loop and offset

In short, I'm trying to create Huffman codes from a canonical Huffman list. Essentially, the following two loops should be executed and generate a binary string. Code:

for (int i = 1; i <= 17; i++) { for (int j = 0; j < input.length; j++) { if (input[j] == i) { result.put(allocateCode(i, j), j); //update a hashmap huffCode += (1 << (17 - i)); //Update the huffman code } } } 

In fact, the code should look for all codes of length 1 and generate a Huffman code for each. So, for example, lengths 1 must go (in order): 0, 1. And the lengths of three will go 100, 101, 110.

The allocateCode function simply returns a string that shows the result, the first run creates this:

 Huffman code for code 2 is: 0 (0) length was: 1 Huffman code for code 6 is: 10 (2) length was: 2 Huffman code for code 0 is: 1100 (12) length was: 4 Huffman code for code 3 is: 1101 (13) length was: 4 Huffman code for code 4 is: 1110 (14) length was: 4 Huffman code for code 7 is: 11110 (30) length was: 5 Huffman code for code 1 is: 111110 (62) length was: 6 Huffman code for code 5 is: 111111 (63) length was: 6 

This is correct, and the correct Huffman codes have been created. However, running it on a second array of lengths creates the following:

 Huffman code for code 1 is: 0 (0) length was: 1 Huffman code for code 4 is: 1 (1) length was: 1 Huffman code for code 8 is: 100 (4) length was: 3 Huffman code for code 9 is: 100 (4) length was: 3 Huffman code for code 13 is: 101 (5) length was: 3 Huffman code for code 16 is: 1011000 (88) length was: 7 Huffman code for code 10 is: 10110001 (177) length was: 8 Huffman code for code 2 is: 101100011 (355) length was: 9 Huffman code for code 3 is: 101100011 (355) length was: 9 Huffman code for code 0 is: 1011001000 (712) length was: 10 Huffman code for code 5 is: 1011001000 (712) length was: 10 Huffman code for code 6 is: 1011001001 (713) length was: 10 Huffman code for code 7 is: 10110010011 (1427) length was: 11 Huffman code for code 14 is: 10110010011 (1427) length was: 11 Huffman code for code 17 is: 10110010100 (1428) length was: 11 Huffman code for code 19 is: 10110010100 (1428) length was: 11 Huffman code for code 18 is: 101100101010000 (22864) length was: 15 

As you can see, the same code is generated several times, examples are code 8 and 9, and codes 2 and 3.

I think my problem is nested loops, however I cannot understand why it works fine for one run and does not work for another.

I could just skip something obvious, but I don't see it for a search.

Any advice is appreciated.

thanks

UPDATE

Coming back through my code, it seems that I really made a small mistake when reading the data in the first place, so I got the wrong Huffman codes !!

+4
source share
1 answer

The first two codes in your second example are one in length, which leaves no other possible codes after these first two. All prefix patterns were used.

Your code should contain the number of available remaining codes to detect erroneous input. Just decrease the counter for each code and double the number of samples each time you go to the next length greater than the current one. (Make sure that you double even if there are no such codes, for example, if you switch from codes of length 3 to codes of length 5, double the number for codes of length 4, even if they are not.) Run the account twice for the length of one code.

If this score is ever negative, you have a mistake and you can stop right there. It is not possible to assign codes to this set of lengths.

If at the end of the process the counter is not zero, then you have an incomplete code. This may or may not be a bug depending on your application. This means that the code is not optimal, and fewer bits could be used to encode these characters.

+1
source

Source: https://habr.com/ru/post/1480326/


All Articles