Code duplication optimization

My problem is how best to handle long heavy loops that can take parameters. Consider the following method:

void HeavyLoop(byte* startingAddress, bool secondaryModification) { for (int i = 0; i < 10000000; i++) { byte* b = startingAddress + i; *b+= 1; if (secondaryModification) *b+= 2; } } 

This method will do what I want, but I use 10,000,000 unnecessary if inside the loop.

If I wrote the same method like this:

 void HeavyLoop(byte* startingAddress, bool secondaryModification) { if (secondaryModification) { for (int i = 0; i < 10000000; i++) { byte* b = startingAddress + i; *b+= 1; *b+= 2; } } else { for (int i = 0; i < 10000000; i++) { byte* b = startingAddress + i; *b+= 1; } } } 

I would get the same result, although the entire loop code should be duplicated. It doesn’t really matter if we are talking about one parameter, but when you have 4 independent parameters, I would have to write 16 different versions of the loop.

What is the β€œright” decision in such cases? If it were a language like Python, I could just dynamically build a function to handle the loop. Is there something similar in C ++?

Needless to say, code is just an example, not an actual case. Please do not give solutions related to *b+=1 as such. I am starting C ++, so forgive me if there is a simple solution that I do not know about. I also apologize if there are syntax errors, I do not have a compiler available at the moment.

Edit: The problem is related to operations that can be pre-calculated not outside the loop.

+4
source share
5 answers

You can implement a loop as a template; the template argument is a compile-time constant, so optimization should remove unwanted code when it is false. Then you need a wrapper that allows you to call the right specialization based on the runtime value:

 template <bool secondaryModification> void HeavyLoop(byte* startingAddress) { for (int i = 0; i < 10000000; i++) { byte* b = startingAddress + i; *b+= 1; if (secondaryModification) *b+= 2; } } void HeavyLoop(byte* startingAddress, bool secondaryModification) { if (secondaryModification) { HeavyLoop<true>(startingAddress); } else { HeavyLoop<false>(startingAddress); } } 

During compilation, both versions of the template will be played (one of which contains *b+=2; and the other does not, and does not run the test while the argument is running); they must then be built into the wrapper function to generate exactly the same code as your second example, but without the need to duplicate the source code.

+15
source

EDIT: in order to better orient what the OP requires and still delete it, this post has been carefully edited.

Of course, I assume you profiled this, and it was shown as a hot spot ... right?

In fact, I'm sure you did not. And that you seriously underestimate your compiler.

For example, here is your code compiled using LLVM:

 void f1(char*); void f2(char*); void loop(char* c, int n, int sm) { for (int i = 0; i < n; ++i) { if (sm) f1(c); else f2(c); } } 

What gives:

 define void @loop(i8* %c, i32 %n, i32 %sm) nounwind uwtable { %1 = icmp sgt i32 %n, 0 br i1 %1, label %.lr.ph, label %._crit_edge .lr.ph: ; preds = %0 %2 = icmp eq i32 %sm, 0 br i1 %2, label %3, label %5 ; <label>:3 ; preds = %3, %.lr.ph %i.01.us = phi i32 [ %4, %3 ], [ 0, %.lr.ph ] tail call void @f2(i8* %c) nounwind %4 = add nsw i32 %i.01.us, 1 %exitcond = icmp eq i32 %4, %n br i1 %exitcond, label %._crit_edge, label %3 ; <label>:5 ; preds = %5, %.lr.ph %i.01 = phi i32 [ %6, %5 ], [ 0, %.lr.ph ] tail call void @f1(i8* %c) nounwind %6 = add nsw i32 %i.01, 1 %exitcond2 = icmp eq i32 %6, %n br i1 %exitcond2, label %._crit_edge, label %5 ._crit_edge: ; preds = %5, %3, %0 ret void } 

Even if you don't know IR LLVM, just follow the "sm" variable:

 .lr.ph: ; preds = %0 %2 = icmp eq i32 %sm, 0 br i1 %2, label %3, label %5 

The compiler generated two different loops (starting with <label>:3 and <label>:5 respectively, and select once for the whole loop that will be executed at the beginning of the selection.

This is a pretty well-known compiler trick: Loop Invariant Code Motion (and derivation), so why bother manually? If it's worth it, the compiler will do it!

+10
source

One method for this kind of problem that works in both C and C ++ is to use a built-in function. For C ++, you can simply use the template function (actually the same solution, but a bit more elegant).

Here's the built-in solution for C / C ++:

 inline void HeavyLoop_inline(byte* startingAddress, bool secondaryModification) { for (int i = 0; i < 10000000; i++) { byte* b = startingAddress+ i; *b+= 1; if (secondaryModification) *b+= 2; } } void HeavyLoop(byte* startingAddress, bool secondaryModification) { if (secondaryModification) { HeavyLoop_inline(startingAddress, TRUE); } else { HeavyLoop_inline(startingAddress, FALSE); } } 

The reason this works (and is efficient) is because the secondaryModification value that is passed to the inline function is a compile-time constant, so the compiler is able to optimize any dead code for every call. Then you get two "specialized" versions of the function.


Notes

Depending on the compiler you use, you can take additional steps to ensure that the inline function is inline. For instance. for gcc you can add __attribute__ ((always_inline)) .

Note that some compilers will perform this optimization of re-factoring the loop without any intervention, so first check your generated code before trying to outwit the compiler.

+9
source

Why not:

 void HeavyLoop(byte * startingAddress, bool secondaryModification) { int const incr = secondaryModification ? 3 : 1; for (byte * b = startingAddress, * const end = startingAddress + 10000000; b != end; ++b) { *b += incr; } } 

You can, of course, put whatever you like into the definition of incr .


A madman can even write *(b++) += incr in a loop incrementer. The best way for lovers of C secret syntax would be the following:

 byte * b = startingAddress, * const end = startingAddress + 10000000; while (b != end) { *(b++) += incr; } 
+1
source

EDIT: given the new text in OP "The problem is with operations that cannot be precomputed outside the loop."

If an operator cannot be precomputed outside the loop, then by definition it must be computed in the loop, and anything other than the source code with verification inside the loop simply confuses the code. I assume that since you are asking this question, there is one more detail that you are not showing us.

Original answer:

First, just write it in an obvious way until profiling shows you that it is too slow. But in this case, you can evaluate the expression in front:

 void HeavyLoop(byte* startingAddress, bool secondaryModification) { const int val = 1 + (secondaryModification ? 2 : 0); for (int i = 0; i < 10000000; i++) { byte* b = startingAddress + i; *b += val; } } 
-1
source

Source: https://habr.com/ru/post/955402/


All Articles