Why is my loop slower when I remove code from it?

When I remove the tests that compute the minimum and maximum from the loop, the execution time is actually longer than with the tests in place. How is this possible?

Edit: After running more tests, it seems that the execution time is not constant: the same code can run in 9 seconds or in 13 seconds ... So it was just a coincidence, one that keeps repeating until you run enough tests ...

Some information:

  • Run time with the min/max tests: 9 seconds
  • Run time without the min/max tests: 13 seconds
  • CFLAGS=-Wall -O2 -fPIC -g
  • gcc 4.4.3, 32-bit

The section to be removed is now marked in the code.

One guess: bad cache interaction?

    void FillFullValues(void)
    {
        int i,j,k;
        double X,Y,Z;
        double p,q,r,p1,q1,r1;
        double Ls,as,bs;
        unsigned long t1, t2;

        t1 = GET_TICK_COUNT();
        MinLs = Minas = Minbs = 1000000.0;
        MaxLs = Maxas = Maxbs = 0.0;
        for (i=0;i<256;i++)
        {
            for (j=0;j<256;j++)
            {
                for (k=0;k<256;k++)
                {
                    X = 0.4124*CielabValues[i] + 0.3576*CielabValues[j] + 0.1805*CielabValues[k];
                    Y = 0.2126*CielabValues[i] + 0.7152*CielabValues[j] + 0.0722*CielabValues[k];
                    Z = 0.0193*CielabValues[i] + 0.1192*CielabValues[j] + 0.9505*CielabValues[k];
                    p = X * InvXn;
                    q = Y;
                    r = Z * InvZn;
                    if (q>0.008856)
                    {
                        Ls = 116*pow(q,third)-16;
                    }
                    else
                    {
                        Ls = 903.3*q;
                    }
                    if (q<=0.008856)
                    {
                        q1 = 7.787*q+seiz;
                    }
                    else
                    {
                        q1 = pow(q,third);
                    }
                    if (p<=0.008856)
                    {
                        p1 = 7.787*p+seiz;
                    }
                    else
                    {
                        p1 = pow(p,third);
                    }
                    if (r<=0.008856)
                    {
                        r1 = 7.787*r+seiz;
                    }
                    else
                    {
                        r1 = pow(r,third);
                    }
                    as = 500*(p1-q1);
                    bs = 200*(q1-r1);
                    //
                    // cast on short int for reducing array size
                    //
                    FullValuesLs[i][j][k] = (char) (Ls);
                    FullValuesas[i][j][k] = (char) (as);
                    FullValuesbs[i][j][k] = (char) (bs);

                    //// Remove this and get slower code
                    if (MaxLs<Ls)
                        MaxLs = Ls;
                    if ((abs(Ls)<MinLs) && (abs(Ls)>0))
                        MinLs = Ls;
                    if (Maxas<as)
                        Maxas = as;
                    if ((abs(as)<Minas) && (abs(as)>0))
                        Minas = as;
                    if (Maxbs<bs)
                        Maxbs = bs;
                    if ((abs(bs)<Minbs) && (abs(bs)>0))
                        Minbs = bs;
                    //// End of Remove
                }
            }
        }
        TRACE(_T("LMax = %f LMin = %f\n"),(MaxLs),(MinLs));
        TRACE(_T("aMax = %f aMin = %f\n"),(Maxas),(Minas));
        TRACE(_T("bMax = %f bMin = %f\n"),(Maxbs),(Minbs));
        t2 = GET_TICK_COUNT();
        TRACE(_T("WhiteBalance init : %lu ms\n"), t2 - t1);
    }
+4
2 answers

I think the compiler is trying to unroll the inner loop, because by removing the min/max tests you remove the dependency between iterations. For some reason this does not help in your case, perhaps because the loop body is too large and needs too many registers to be unrolled efficiently.

Try turning unrolling off and post the results again.

If that turns out to be the cause, I suggest you file a performance bug against gcc.

PS. I think you can combine the if (q>0.008856) and if (q<=0.008856) tests into a single branch.

+2

Perhaps it is the cache, perhaps it is an unrolling problem. There is only one way to answer this question: look at the generated code (for example, using the -S option). Perhaps you can post it, or spot the difference yourself by comparing the two versions.

EDIT: Now that you have found out it was just a measurement effect, I can only recommend (or better, command ;-) ) that whenever you want run-time numbers, you ALWAYS run the code in some kind of loop and average the results. It is best to do this outside of your program (in a shell script), so that your cache is not already filled with the right data.
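As a rough illustration of the "loop and average" part, here is an in-process sketch in C. It is not the shell-script variant recommended above, so the caches stay warm between runs. The names `timing_stats`, `time_runs` and `sample_work` are hypothetical, and `sample_work` is only a stand-in for a function such as FillFullValues:

```c
#include <time.h>

/* Hypothetical result type: mean, fastest and slowest run, in seconds. */
typedef struct {
    double mean;
    double min;
    double max;
} timing_stats;

/* Runs `work` `runs` times and measures the CPU time of each run with clock(). */
timing_stats time_runs(void (*work)(void), int runs)
{
    timing_stats st = { 0.0, 0.0, 0.0 };
    double total = 0.0;
    int i;

    if (runs <= 0)
        return st;

    for (i = 0; i < runs; i++) {
        clock_t start = clock();
        double secs;

        work();
        secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        total += secs;
        if (i == 0 || secs < st.min)
            st.min = secs;
        if (i == 0 || secs > st.max)
            st.max = secs;
    }
    st.mean = total / runs;
    return st;
}

/* Stand-in workload; the volatile sink keeps the loop from being optimized away. */
volatile double sink;

void sample_work(void)
{
    double s = 0.0;
    int i;

    for (i = 0; i < 1000000; i++)
        s += i * 0.5;
    sink = s;
}
```

Calling `time_runs(sample_work, 10)` and printing `min`, `mean` and `max` shows how far individual runs spread, which is the kind of variation (9 seconds versus 13 seconds) that misled the original comparison.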

+1

Source: https://habr.com/ru/post/1347933/

