annotation
Hi, suppose you have two different independent 64-bit binary matrices A and T ( T is a transposed version of itself, using the transposed version of the matrix allows you to work on rows T rather than columns that are super-cool during multiplication for binary arithmetic) and you want to multiply these matrices, only that the matrix multiplication result is truncated to 64 bits, and if you yield value larger than the value of 1 in a specific cell of the matrix, the resulting cell matrix is soda press 1 otherwise 0
Example
AT 00000001 01111101 01010100 01100101 10010111 00010100 10110000 00011000 <-- This matrix is transposed 11000100 00111110 10000011 10101111 11110101 11000100 10100000 01100010
Binary and Traditional Multiplication Results:
Binary Traditional 11000100 11000100 11111111 32212121 11111111 32213421 11111111 21112211 11101111 22101231 11001111 11001311 11111111 54213432 11001111 11001211
Question
How do you multiply these matrices as described above in most effective questions?
PS
I tried to use binary and (i.e., the & operator) instead of doing multiplication by individual bits, in which case I had to prepare the data for multiplication:
ulong u; u = T & 0xFF; u = (u << 00) + (u << 08) + (u << 16) + (u << 24) + (u << 32) + (u << 40) + (u << 48) + (u << 56);
Now, executing binary and over two integers A and u , this will lead to the following:
A u RC 00000001 01111101 00000001 1 01010100 01111101 01010100 3 10010111 01111101 00010101 3 10110000 01111101 00110000 2 11000100 01111101 01000100 2 10000011 01111101 00000001 1 11110101 01111101 01110101 5 10100000 01111101 00100000 1
In the above example, R contains the result of multiplying bits of A by u , and to get the final value, we must sum all the bits in the string. Note that column C contains values equal to the values found in the first column of the resulting multiplication of the Traditional matrix above. The problem is that during this step I have to work with individual bits, which, in my opinion, are a suboptimal approach, I read http://graphics.stanford.edu/~seander/bithacks.html looking for a way to do this in parallel but no luck if anyone knows how to “flatten” and “combine” the values located in column R , the result is a 64-bit matrix, I would appreciate if you drop me a few rows,
Thanks,
Edit
With great thanks to David Eisenstat, the final algorithm will look like this:
var A = ...; var T = ...; // T == transpose(t), t is original matrix, algorithm works with transposed matrix var D = 0x8040201008040201UL; U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D);
The following code snippet:
public static void Main (string[] args){ ulong U; var Random = new Xor128 (); var timer = DateTime.Now; var A = Random.As<IUniformRandom<UInt64>>().Evaluate(); var T = Random.As<IUniformRandom<UInt64>>().Evaluate(); var steps = 10000000; for (var i = 0; i < steps; i++) { ulong r = 0; var d = 0x8040201008040201UL; U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56); U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); } Console.WriteLine (DateTime.Now - timer); var m1 = new Int32[8,8]; var m2 = new Int32[8,8]; var m3 = new Int32[8,8]; for (int row = 0; row < 8; row++) { for (int col = 0; col < 8; col++) { m1 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1); m2 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1); m3 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1); } } timer = DateTime.Now; for (int i = 0; i < steps; i++) { for (int row = 0; row < 8; row++) { for (int col = 0; col < 8; col++) { var sum = 0; for (int temp = 0; temp < 8; temp++) { sum += m1 [row, temp] * m2 [temp, row]; } m3 [row, col] = sum; } } } Console.WriteLine (DateTime.Now - timer); }
Shows me the following results:
00:00:02.4035870 00:00:57.5147150
And it's a 23x performance improvement on Mac OS X / Mono, thanks everyone