Background
I need to implement the Cannon Algorithm , which is a parallel algorithm for multiplying matrices square in dimension, and the dimension is divided by the square root of the number of processors. I wrote this code that works fine, but in life, without crashing, it's only half the story. It does not multiply A x B by the new matrix C, right. Here is the code, please help me and ask me to help me decide what I'm doing wrong. As it should be obvious, this is homework.
Code
void shift_left(datatype** mat, int s, int row, int n, int amount) { datatype* temp_buffer = malloc(sizeof(datatype) * n); for(int col = 0; col < n; col++) { datatype temp = mat[row][(col+amount)%s]; temp_buffer[(col+amount)%s] = mat[row][col]; temp_buffer[col] = temp; } memcpy(mat[row], temp_buffer, n); free(temp_buffer); } void shift_up(datatype** mat, int s, int col, int n, int amount) { datatype* temp_buffer = malloc(sizeof(datatype) * n); for(int row = 0; row < n; row++) { datatype temp = mat[(row+amount)%s][col]; temp_buffer[(row+amount)%s] = mat[row][col]; temp_buffer[row] = temp; } memcpy(&mat[0][col], temp_buffer, n); free(temp_buffer); } void cannon_mul(int p_sqrt,datatype** a, datatype** b, datatype** c, int n) { int i = 0, j = 0, k = 0; int s = p_sqrt; for(i = 0; i < (s-1); i++) { shift_left(a, s, i, s-1, i);
What do I consider wrong?
My guess is that the bias is wrong or I miss a substantial part of the algorithm. My original shift function did not use a temporary buffer, so I decided to use a temporary buffer this time, but that did not change the situation. I could show some sampling result if that helps, but the result is not even remotely close to the desired result. The good news is fast :)
results
1.48 0.14 9.47 8.99 8.06 0.06 6.68 1.04 4.44 7.50 7.26 8.87 2.21 6.27 2.12 7.91 0.65 5.24 0.45 4.94 0.47 4.13 1.87 2.25 6.83 1.52 6.41 9.14 9.22 8.91 7.34 2.70 6.78 2.78 3.51 4.95 5.27 0.85 9.51 6.82 0.28 6.73 0.70 8.88 7.14 9.09 2.36 5.38 6.43 9.00 7.13 6.71 6.92 9.81 5.13 9.35 7.50 5.16 4.68 3.62 1.30 6.26 4.55 4.27 0.51 2.23 3.19 8.75 6.57 9.07 7.49 6.41 1.04 7.78 7.16 2.78 2.25 6.23 9.42 0.32 3.21 3.60 2.04 2.93 4.29 3.88 2.78 8.01 4.57 6.47 7.52 3.77 0.63 5.97 7.32 4.90 9.63 4.90 8.46 1.90
multiplying the above matrix, with my code produces the following:
2.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 50.81 0.00 0.00 0.00 0.00 87.51 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
A sequential program produces this result:
163.41 212.17 144.32 227.10 251.03 205.60 245.63 277.33 368.99 334.11 257.85 230.82 203.60 314.08 246.02 240.12 228.37 197.90 264.38 228.24 234.13 272.10 110.75 294.84 263.16 242.07 209.54 316.13 339.23 260.51 185.33 215.59 192.26 283.31 270.80 208.38 265.08 291.49 312.24 319.73 313.23 301.95 182.04 348.11 283.20 337.49 266.54 284.57 355.28 281.07 293.25 323.29 281.35 393.92 325.24 313.62 313.48 342.95 418.37 401.91 255.88 238.25 122.17 254.52 243.58 204.49 217.69 273.03 314.89 214.45 219.26 239.07 200.18 309.98 262.21 242.68 190.02 245.85 297.96 308.56 209.03 213.11 126.24 266.48 233.88 199.33 193.28 228.92 277.50 202.27 210.31 264.67 227.59 337.79 261.40 250.35 225.77 295.00 331.92 352.17
It is important to note that I show the relevant parts of my program, if you feel that I need to show more, please do this and I will provide more code. Finally, why is the Homework tag missing?
Edit
One person noted that the buffer was small and absent in the amount of a stupid error and is now fixed. I tried, and I get the same result, so apparently the problem has nothing to do with this. Hopefully in 2 days I will be able to discover generosity to seduce someone, at least let me know what the problem is. Its a mistake that I cannot debug, and I understand that I must recognize this algorithm to be zero to zero. I rely on web sources that do very little to increase my understanding.
Edit 2
Tried to use calloc for zero buffer allocation, but it does not change the results. So strange, but thanks for what he said; I forgot that memory does not allocate nulls.
Edit 3
I tried this:
void shift_left(datatype** mat, int s, int row, int n, int amount) { datatype* temp_buffer = calloc(n, sizeof(datatype) * n); for(int col = 0; col < n; col++) { /* temp_buffer[(col+amount)%s] = mat[row][col]; temp_buffer[col] = mat[row][(col+amount)%s]; */ temp_buffer[(col+amount)%s] = 0; temp_buffer[col] = 0; } memcpy(mat[row], temp_buffer, sizeof(datatype) * n); //free(temp_buffer); } void shift_up(datatype** mat, int s, int col, int n, int amount) { datatype* temp_buffer = calloc(n, sizeof(datatype) * n); for(int row = 0; row < n; row++) { /* temp_buffer[(row+amount)%s] = mat[row][col]; temp_buffer[row] = mat[(row+amount)%s][col]; */ temp_buffer[(row+amount)%s] = 0; temp_buffer[row] = 0; } memcpy(&mat[0][col], temp_buffer, sizeof(datatype) * n); free(temp_buffer); }
And surprisingly the same result. When it should print all zero, since I commented on the code and replaced it with zero. My guess memcpy does not work.
Change 4
I confirmed that memcpy is the culprit. But why donβt I know that I am at a dead end, if this helps, the data type is just an alias for the double professor who wrote this for some odd reason, since it really does not help to make the code more readable.
But if I myself resolve this, I will be happy to demonstrate a solution to my problem.