The algorithm in the solution is an exhaustive depth-first search with a pair of optimizations. Unfortunately, the solution's description does not quite explain this.
An exhaustive search means that we try to enumerate every possible combination of bits. Depth-first means that we first try setting all the bits to one, then set the last bit to zero, then the second-last, then both the last and the second-last, and so on.
The first optimization is pruning: we backtrack as soon as we find that a parity constraint is already violated. So, for example, when we begin our search and reach the end of the first row, we check whether that row has even (zero) parity. If it does not, we do not continue deeper; we stop, backtrack, and try setting the last bit in the row to zero.
The second optimization is DP-like, because we cache partial results and reuse them. This exploits the fact that, from the problem's point of view, different search paths can converge on the same logical state. What is the logical state of the search? The solution's description begins to explain it ("begins" being the key word). The trick is that, at any given point in the search, the minimum number of additional bit flips does not depend on the exact contents of the entire sudoku board, but only on the state of the various parities we need to track (see "Further explanation" below). There are 27 parities to track (9 columns, 9 rows and 9 3x3 boxes). Moreover, we can optimize some of them away. As the search proceeds, the parity of every row above the current one will always be even, and the parity of every row below, not yet touched by the search, does not change, so we only track the parity of the current row. By the same logic, the boxes above and below the current band are ignored, and we only need to track the 3 "active" boxes.
Therefore, instead of 2^9 * 2^9 * 2^9 = 134,217,728 states, we have only 2^9 * 2^1 * 2^3 = 8,192 states. Unfortunately, we need a separate cache for each depth level in the search. So we multiply by the 81 possible search depths to find that we need an array of size 663,552. Borrowing from templatetypedef:
```
int memo[9][9][1<<9][1<<3][2];
         ^  ^    ^     ^   ^
         |  |    |     |   |
row -----+  |    |     |   |
col --------+    |     |   |
column parity ---+     |   |
box parity ------------+   |
current row parity --------+
```

`1<<9` simply means 2^9, given how integers and bit shifts work.
Further explanation: due to the way parity works, flipping a bit always flips its 3 corresponding parities. Therefore, all permutations of sudoku boards that have the same parities can be solved with the same bit-flip pattern. The `solve` function answers the question: "Assuming you can only flip bits from the cell at (x, y) onward, what is the minimum number of bit flips to make the board legal?" All sudoku boards with the same parities will give the same answer. The search algorithm considers many permutations of the board. It mutates them starting from the top, counts how many bits it has flipped so far, and then asks `solve` to find out how many more are needed. If `solve` has already been called with the same (x, y) and the same parities, we can simply return the cached result.
The confusing part is the code that actually performs the search and update:
```cpp
/* Try setting the cell to 1. */
ref = !A[r][c] + solve(r, c + 1, mc ^ 1 << c, mb ^ 1 << c / 3, !p);

/* Try setting the cell to 0. */
ref = min(ref, A[r][c] + solve(r, c + 1, mc, mb, p));
```
It could be written more clearly as:
```cpp
bool areWeFlipping = A[r][c] == 1;
int nAdditionalFlipsIfCellIs0 = (areWeFlipping ? 1 : 0) + solve(r, c + 1, mc, mb, p); // Continue the search

areWeFlipping = A[r][c] == 0;
// At the start, we assume the sudoku board is all zeroes, and therefore the column parity is all even.
// With each additional cell, we update the column parity with the value of that cell. In this case, we assume it to be 1.
int newMc = mc ^ (1 << c);       // Update the parity of column c. ^ (1 << c) means "flip the bit denoting the parity of column c"
int newMb = mb ^ (1 << (c / 3)); // Update the parity of the 'active' box (c/3) (i.e., if we're in column 5, we're in box 1)
int newP = p ^ 1;                // Update the current row parity
int nAdditionalFlipsIfCellIs1 = (areWeFlipping ? 1 : 0) + solve(r, c + 1, newMc, newMb, newP); // Continue the search

ref = min(nAdditionalFlipsIfCellIs0, nAdditionalFlipsIfCellIs1);
```
Personally, I would implement the two sides of the search as "flip" and "don't flip". This makes the algorithm conceptually easier to understand. The second paragraph above would then read: "Depth-first means that we first try not to flip any bits, then flip the last bit, then the second-last, then both the last and the second-last, and so on." In addition, before starting the search, we would need to pre-calculate the values of "mc", "mb" and "p" for our board, instead of passing 0.
```cpp
int nAdditionalFlipsIfDontFlip = 0 + solve(r, c + 1, mc, mb, p);

int newMc = mc ^ (1 << c);
int newMb = mb ^ (1 << (c / 3));
int newP = p ^ 1;
int nAdditionalFlipsIfFlip = 1 + solve(r, c + 1, newMc, newMb, newP);

ref = min(nAdditionalFlipsIfDontFlip, nAdditionalFlipsIfFlip);
```
However, this change does not affect performance.
UPDATE
The most amazing thing is that the key to the algorithm's speed is that the memoization array ends up quite sparse. At each depth level, there are usually 512 (sometimes 256 or 128) states in use (out of 8,192). Moreover, there is always exactly one state per column parity; the box and row parities don't seem to matter! Omitting them from the memoization array improves performance by another 30x. But can we prove that this is always true?