Trying to speed up the loop with some special construct (including manually cutting and pasting the code) is not recommended. Don't do it; most likely it will "un-optimize" the execution speed.
In every C++ implementation I've ever come across (MSVC 6.0, 2003, 2005, 2010, various versions of GCC, various versions of Diab), there is absolutely zero, sorry, I didn't stress that enough, ZERO, time involved in allocating a loop counter variable, assuming any other variables were allocated for the function in which the loop counter is declared. For a simple loop that makes no function calls, the loop counter may never even make it out to memory; it can be held entirely in a single CPU register for its whole lifetime. Even if it is stored in memory, it will be on the runtime stack, and the space for it (and any other local variables) is claimed all at once in a single operation, which takes no more and no less time whatever the number of variables allocated on the stack. Local variables, such as a loop counter, are allocated on the stack, and stack allocations are CHEAP CHEAP CHEAP, as opposed to heap allocations.
An example of a loop counter variable allocated on the stack:
for (int i=0; i<50; ++i) { .... }
Another example of a loop counter variable allocated on the stack:
int i = 0; for (; i<50; ++i) { .... }
An example of a loop counter variable allocated on the heap (do not do this, this is stupid):
int* ip = new int; for (*ip=0; *ip<50; ++(*ip)) { .... } delete ip;
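To make the three variants above concrete, here is a minimal sketch with the elided loop bodies filled in by a simple running sum. The function names and the summing body are my own assumptions, chosen only so there is something to compare:

```cpp
#include <cassert>

// Counter declared in the for statement: a local variable, so it lives
// on the stack or, quite possibly, only in a register.
int sum_scoped_counter() {
    int total = 0;
    for (int i = 0; i < 50; ++i) { total += i; }
    return total;
}

// Counter declared just before the loop: same kind of storage, wider scope.
int sum_outer_counter() {
    int total = 0;
    int i = 0;
    for (; i < 50; ++i) { total += i; }
    return total;
}

// Counter on the heap: same result, but it typically costs a new/delete
// pair plus a pointer dereference on every access.
int sum_heap_counter() {
    int total = 0;
    int* ip = new int;
    for (*ip = 0; *ip < 50; ++(*ip)) { total += *ip; }
    delete ip;
    return total;
}

// All three compute 0 + 1 + ... + 49, so they must agree.
void check_counters_agree() {
    assert(sum_scoped_counter() == sum_outer_counter());
    assert(sum_outer_counter() == sum_heap_counter());
}
```

Calling check_counters_agree() from main should pass silently. The results are the same in all three cases; only where the counter is stored differs.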
Now, to address the idea of optimizing a loop by manually copying and pasting the body instead of using a loop and a counter:
What you are planning to do is a manual form of loop unrolling. Loop unrolling is an optimization that compilers sometimes use to reduce the overhead associated with a loop. Compilers can do this most easily when the number of iterations is known at compile time (i.e., the number of iterations is a constant, even if that constant involves a calculation based on other constants); unrolling a loop completely requires it. In some cases the compiler may decide that unrolling is worthwhile, but often it will not unroll the loop completely. For instance, in your example the compiler might determine that there is a speed advantage in unrolling the loop from 50 iterations down to only 10 iterations with 5 copies of the loop body each. There will still be a loop variable, but instead of 50 loop counter comparisons, the code now only has to make 10.
This is a trade-off, because 5 copies of the loop body take up 5 times as much space in the instruction cache. Loading those extra copies of the same instructions forces the cache to evict that many instructions that were already there and that you might have wanted to keep there. In addition, loading the 4 additional copies of the loop body from main memory takes much longer than simply fetching already loaded instructions from the cache, which is what happens when the loop is not unrolled at all.
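As a rough illustration of the "10 iterations with 5 copies of the body" case, here is what that transformation looks like if written by hand. The body() function and the array it fills are hypothetical stand-ins for your loop body, and this is only a sketch of the shape of the code, not what any particular compiler actually emits:

```cpp
#include <array>
#include <cassert>

// Hypothetical loop body, invented only for this illustration.
inline void body(std::array<int, 50>& data, int i) { data[i] = i * 2; }

// Rolled form: 50 iterations, one copy of the body.
void fill_rolled(std::array<int, 50>& data) {
    for (int i = 0; i < 50; ++i) {
        body(data, i);
    }
}

// Unrolled by 5: 10 iterations, 5 copies of the body per iteration.
// This works without a remainder loop only because 50 is divisible by 5.
void fill_unrolled_by_5(std::array<int, 50>& data) {
    for (int i = 0; i < 50; i += 5) {
        body(data, i);
        body(data, i + 1);
        body(data, i + 2);
        body(data, i + 3);
        body(data, i + 4);
    }
}

// Both forms must leave the array in the same state.
void check_same_result() {
    std::array<int, 50> a{};
    std::array<int, 50> b{};
    fill_rolled(a);
    fill_unrolled_by_5(b);
    assert(a == b);
}
```

Both versions produce identical data; the unrolled one just trades fewer counter comparisons for a larger block of instructions, which is exactly the cache cost described above.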
So, in general, it is often more beneficial to keep just one copy of the loop body and leave the loop logic in place (i.e., don't unroll the loop at all).