Gun Algorithm Implementation

Background

I need to implement the Cannon Algorithm , which is a parallel algorithm for multiplying matrices square in dimension, and the dimension is divided by the square root of the number of processors. I wrote this code that works fine, but in life, without crashing, it's only half the story. It does not multiply A x B by the new matrix C, right. Here is the code, please help me and ask me to help me decide what I'm doing wrong. As it should be obvious, this is homework.

Code

void shift_left(datatype** mat, int s, int row, int n, int amount) { datatype* temp_buffer = malloc(sizeof(datatype) * n); for(int col = 0; col < n; col++) { datatype temp = mat[row][(col+amount)%s]; temp_buffer[(col+amount)%s] = mat[row][col]; temp_buffer[col] = temp; } memcpy(mat[row], temp_buffer, n); free(temp_buffer); } void shift_up(datatype** mat, int s, int col, int n, int amount) { datatype* temp_buffer = malloc(sizeof(datatype) * n); for(int row = 0; row < n; row++) { datatype temp = mat[(row+amount)%s][col]; temp_buffer[(row+amount)%s] = mat[row][col]; temp_buffer[row] = temp; } memcpy(&mat[0][col], temp_buffer, n); free(temp_buffer); } void cannon_mul(int p_sqrt,datatype** a, datatype** b, datatype** c, int n) { /* 2D matrices and n^2 sized only!*/ int i = 0, j = 0, k = 0; int s = p_sqrt; for(i = 0; i < (s-1); i++) { shift_left(a, s, i, s-1, i); // Skew matrix a } for (i = 0; i < (s-1); i++) { shift_up(b, s, i, s-1, i); // Skew matrix b } for(k = 0; k < (s-1); k++) { for(i = 0; i < (s-1); i++) { for(j = 0; j < (s-1); j++) { c[i][j] += a[i][j]*b[i][j]; shift_left(a, s, i, s-1, 1); shift_up(b, s, i, s-1, 1); } } } } 

What do I consider wrong?

My guess is that the bias is wrong or I miss a substantial part of the algorithm. My original shift function did not use a temporary buffer, so I decided to use a temporary buffer this time, but that did not change the situation. I could show some sampling result if that helps, but the result is not even remotely close to the desired result. The good news is fast :)

results

 1.48 0.14 9.47 8.99 8.06 0.06 6.68 1.04 4.44 7.50 7.26 8.87 2.21 6.27 2.12 7.91 0.65 5.24 0.45 4.94 0.47 4.13 1.87 2.25 6.83 1.52 6.41 9.14 9.22 8.91 7.34 2.70 6.78 2.78 3.51 4.95 5.27 0.85 9.51 6.82 0.28 6.73 0.70 8.88 7.14 9.09 2.36 5.38 6.43 9.00 7.13 6.71 6.92 9.81 5.13 9.35 7.50 5.16 4.68 3.62 1.30 6.26 4.55 4.27 0.51 2.23 3.19 8.75 6.57 9.07 7.49 6.41 1.04 7.78 7.16 2.78 2.25 6.23 9.42 0.32 3.21 3.60 2.04 2.93 4.29 3.88 2.78 8.01 4.57 6.47 7.52 3.77 0.63 5.97 7.32 4.90 9.63 4.90 8.46 1.90 

multiplying the above matrix, with my code produces the following:

 2.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 50.81 0.00 0.00 0.00 0.00 87.51 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 

A sequential program produces this result:

 163.41 212.17 144.32 227.10 251.03 205.60 245.63 277.33 368.99 334.11 257.85 230.82 203.60 314.08 246.02 240.12 228.37 197.90 264.38 228.24 234.13 272.10 110.75 294.84 263.16 242.07 209.54 316.13 339.23 260.51 185.33 215.59 192.26 283.31 270.80 208.38 265.08 291.49 312.24 319.73 313.23 301.95 182.04 348.11 283.20 337.49 266.54 284.57 355.28 281.07 293.25 323.29 281.35 393.92 325.24 313.62 313.48 342.95 418.37 401.91 255.88 238.25 122.17 254.52 243.58 204.49 217.69 273.03 314.89 214.45 219.26 239.07 200.18 309.98 262.21 242.68 190.02 245.85 297.96 308.56 209.03 213.11 126.24 266.48 233.88 199.33 193.28 228.92 277.50 202.27 210.31 264.67 227.59 337.79 261.40 250.35 225.77 295.00 331.92 352.17 

It is important to note that I show the relevant parts of my program, if you feel that I need to show more, please do this and I will provide more code. Finally, why is the Homework tag missing?

Edit

One person noted that the buffer was small and absent in the amount of a stupid error and is now fixed. I tried, and I get the same result, so apparently the problem has nothing to do with this. Hopefully in 2 days I will be able to discover generosity to seduce someone, at least let me know what the problem is. Its a mistake that I cannot debug, and I understand that I must recognize this algorithm to be zero to zero. I rely on web sources that do very little to increase my understanding.

Edit 2

Tried to use calloc for zero buffer allocation, but it does not change the results. So strange, but thanks for what he said; I forgot that memory does not allocate nulls.

Edit 3

I tried this:

 void shift_left(datatype** mat, int s, int row, int n, int amount) { datatype* temp_buffer = calloc(n, sizeof(datatype) * n); for(int col = 0; col < n; col++) { /* temp_buffer[(col+amount)%s] = mat[row][col]; temp_buffer[col] = mat[row][(col+amount)%s]; */ temp_buffer[(col+amount)%s] = 0; temp_buffer[col] = 0; } memcpy(mat[row], temp_buffer, sizeof(datatype) * n); //free(temp_buffer); } void shift_up(datatype** mat, int s, int col, int n, int amount) { datatype* temp_buffer = calloc(n, sizeof(datatype) * n); for(int row = 0; row < n; row++) { /* temp_buffer[(row+amount)%s] = mat[row][col]; temp_buffer[row] = mat[(row+amount)%s][col]; */ temp_buffer[(row+amount)%s] = 0; temp_buffer[row] = 0; } memcpy(&mat[0][col], temp_buffer, sizeof(datatype) * n); free(temp_buffer); } 

And surprisingly the same result. When it should print all zero, since I commented on the code and replaced it with zero. My guess memcpy does not work.

Change 4

I confirmed that memcpy is the culprit. But why don’t I know that I am at a dead end, if this helps, the data type is just an alias for the double professor who wrote this for some odd reason, since it really does not help to make the code more readable.

But if I myself resolve this, I will be happy to demonstrate a solution to my problem.

+6
source share
2 answers

The copy is too small.

 // memcpy(mat[row], temp_buffer, n); memcpy(mat[row], temp_buffer, n * sizeof(datatype) ); 

Same for memcpy(&mat[0][col], temp_buffer, n);

+2
source

You have what looks to me like a "one by one" error.

Your cannon_mul has the following loops:

 for(i = 0; i < (s-1); i++) for (i = 0; i < (s-1); i++) for(k = 0; k < (s-1); k++) for(i = 0; i < (s-1); i++) for(j = 0; j < (s-1); j++) 

However, in my opinion, the corresponding source material in http://www.cs.berkeley.edu/~demmel/cs267/lecture11/lecture11.html#link_5 , the pseudocode says:

 for all (i=0 to s-1) for all (i=0 to s-1) for k=0 to s-1 for all (i=0 to s-1, j=0 to s-1) 

which, I am sure, means that the loop variables should take the value s-1 in the last iteration of the loop. Your loop variables never take the value s-1 .

Try changing your loops to <= (s-1) and see if that helps.

0
source

Source: https://habr.com/ru/post/985146/


All Articles