Cache poisoning error for deep nested loop

I am writing code for a mathematical method (Incomplete Cholesky), and I have run into a curious problem. See the following simplified code.

    for(k=0;k<nosUnknowns;k++) {
        //Pieces of code
        for(i=k+1;i<nosUnknowns;i++) {
            // more code
        }
        for(j=k+1;j<nosUnknowns;j++) {
            for(i=j;i<nosUnknowns;i++) {
                //Some more code
                if(xOk && yOk && zOk) {
                    if(xDF == 1 && yDF == 0 && zDF == 0) {
                        for(row=0;row<3;row++) {
                            for(col=0;col<3;col++) {
                                // All 3x3 static arrays
                                // This is the line
                                statObj->A1_[row][col] -= localFuncArr[row][col];
                            }
                        }
                    }
                }
            }//Inner loop i ends here
        }//Inner loop j ends here
    }//outer loop k ends here

For context

statObj is an object containing several 3x3 static double arrays. I create statObj with a call to new, then populate the arrays inside it using some math functions. One of these arrays is A1_. The value of nosUnknowns is about 3000. localFuncArr is a double array that was generated earlier by a matrix multiplication.
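To make the setup above concrete, here is a minimal sketch of what the object and the marked update might look like. The names StatObj and subtractBlock are hypothetical, and the sketch assumes "static" means fixed-size member arrays; the question shows neither the class definition nor any declarations.

```cpp
#include <cstddef>

// Hypothetical layout: the real class holds several 3x3 double arrays,
// but only A1_ is named in the question.
struct StatObj {
    double A1_[3][3];
};

// Subtracts a 3x3 block from A1_ element by element,
// which is what the marked line does inside the row/col loops.
inline void subtractBlock(StatObj* statObj, const double (&localFuncArr)[3][3]) {
    for (std::size_t row = 0; row < 3; ++row) {
        for (std::size_t col = 0; col < 3; ++col) {
            statObj->A1_[row][col] -= localFuncArr[row][col];
        }
    }
}
```

With this layout, A1_ is 72 contiguous bytes, so each call reads and writes the same nine doubles.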

Now this is my problem:

  • When I use the line as shown in the code, the code runs very slowly: about 245 seconds for the whole function.

  • When I comment out the marked line, the code is very fast. It takes about 6 seconds.

  • When I replace the marked line with the following line: localFuncArr[row][col] += 3.0 , the code again runs at the same speed as in the second case above.

Clearly, something about accessing statObj->A1_ makes the code run slowly.

My question(s):

  • Is cache poisoning the reason this happens?

  • If so, what can be changed in terms of array initialization / object initialization / loop unrolling or, for that matter, any form of code optimization that can speed it up?

Any insight into this from experienced people is much appreciated.

EDIT: Changed the description to be more detailed and to correct some of the points mentioned in the comments.

1 answer

If the conditions are mostly true, your line of code runs up to 3000 x 3000 x 3000 x 3 x 3 times, which is about 243 billion times. Depending on your hardware architecture, 245 seconds can be a very reasonable time (about one update every 2 cycles, assuming a 2 GHz processor). In any case, nothing in the code points to cache poisoning.
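The arithmetic behind this estimate can be checked with a short sketch. It assumes both if conditions are always true, and the function names fullRangeUpdates and triangularUpdates are made up for illustration. The first function takes every loop over its full range, which gives an upper bound. The second follows the loop bounds as written in the question (j starts at k+1, i starts at j), which gives a smaller count.

```cpp
#include <cstdint>

// Upper bound: k, j and i each run over 0..n-1, with 9 updates per (k, j, i).
std::uint64_t fullRangeUpdates(std::uint64_t n) {
    return n * n * n * 9;
}

// Count for the bounds shown in the question:
// k in [0, n), j in [k+1, n), i in [j, n), with 9 updates per (k, j, i).
std::uint64_t triangularUpdates(std::uint64_t n) {
    std::uint64_t triples = 0;
    for (std::uint64_t j = 1; j < n; ++j) {
        triples += j * (n - j);  // j choices of k, (n - j) choices of i
    }
    return triples * 9;
}
```

For n = 3000 the upper bound is 243,000,000,000 updates, while the triangular bounds give 40,499,995,500. At 2 GHz, 245 seconds is 490 billion cycles, so that is roughly 2 cycles per update under the upper bound and roughly 12 under the triangular count.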


Source: https://habr.com/ru/post/949694/