The optimization is possible only if the loop body does not refer to the loop variable. In that case, if the lower bound of the loop is zero, the compiler will reverse the loop so that it counts down.
If the loop variable is never referenced by the loop body, then the compiler is justified in implementing the loop however it likes. All it needs to do is execute the loop body as many times as the loop bounds require. In fact, the compiler would be perfectly justified in optimizing away the loop variable altogether.
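As a source-level illustration of that freedom, here is a minimal sketch with hypothetical function names (CountUp and CountDown are not part of the original program). Neither body reads i, so counting 0 to 11 and counting 12 down to 1 are indistinguishable to the caller: each runs the body 12 times.

```pascal
program LoopReversalSketch;
{$APPTYPE CONSOLE}

// Hypothetical helpers for illustration only.

function CountUp: Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to 11 do
    Inc(Result);            // the body never reads i
end;

function CountDown: Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 12 downto 1 do
    Inc(Result);            // same body, opposite direction
end;

begin
  Writeln(CountUp);         // 12
  Writeln(CountDown);       // 12
end.
```

This only shows why the rewrite is legal; what the compiler actually emits for either function is a separate question.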
Consider this program:
{$APPTYPE CONSOLE}

procedure Test1;
var
  i: Integer;
begin
  for i := 0 to 11 do
    Writeln(0);
end;

procedure Test2;
var
  i: Integer;
begin
  for i := 0 to 11 do
    Writeln(i);
end;

begin
  Test1;
  Test2;
end.
The body of Test1 is compiled into this code by the XE7 32-bit Windows compiler, with release options:
Project1.dpr.9: for i := 0 to 11 do
00405249 BB0C000000 mov ebx, $0000000c
Project1.dpr.10: Writeln(0);
0040524E A114784000 mov eax, [$00407814]
00405253 33D2 xor edx, edx
00405255 E8FAE4FFFF call @Write0Long
0040525A E8D5E7FFFF call @WriteLn
0040525F E800DBFFFF call @_IOTest
Project1.dpr.9: for i := 0 to 11 do
00405264 4B dec ebx
00405265 75E7 jnz $0040524e
The compiler runs the loop downwards, as is evident from the use of dec. Note that the loop termination test is performed using jnz, with no need for a cmp. This is because dec performs an implicit comparison with zero.
The documentation for dec states the following:
Flags Affected
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.
The ZF flag is set if and only if the result of the dec instruction is zero. And ZF is what determines whether or not jnz branches.
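The flag behaviour can be observed directly with a small inline-assembler sketch. This assumes the 32-bit Delphi compiler and its default register calling convention (the parameter arrives in EAX, a Boolean result is returned in AL). DecReachesZero is a hypothetical name, not something from the RTL.

```pascal
program DecFlagSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper, 32-bit Delphi only. Assumes the default register
// calling convention: Value arrives in EAX, the Boolean result leaves in AL.
function DecReachesZero(Value: Integer): Boolean;
asm
  dec eax      // sets ZF exactly when Value - 1 = 0
  setz al      // copy ZF into the result: 1 if ZF is set, otherwise 0
end;

begin
  Writeln(DecReachesZero(2));
  Writeln(DecReachesZero(1));
end.
```

Under those assumptions the first call should report False and the second True, which is the same condition jnz tests at the bottom of the loop.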
The code emitted for Test2 is:
Project1.dpr.17: for i := 0 to 11 do
0040526D 33DB xor ebx, ebx
Project1.dpr.18: Writeln(i);
0040526F A114784000 mov eax, [$00407814]
00405274 8BD3 mov edx, ebx
00405276 E8D9E4FFFF call @Write0Long
0040527B E8B4E7FFFF call @WriteLn
00405280 E8DFDAFFFF call @_IOTest
00405285 43 inc ebx
Project1.dpr.17: for i := 0 to 11 do
00405286 83FB0C cmp ebx, $0c
00405289 75E4 jnz $0040526f
Notice that the loop variable is now increasing, and that we have an extra cmp instruction that is executed on every iteration of the loop.
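The two control-flow shapes can be mimicked in Pascal source to make the difference easier to see. This is only a sketch with hypothetical names (DownShape, UpShape); the comments map each statement to the corresponding instruction in the 32-bit listings, and the compiler is of course free to emit something different for this code.

```pascal
program LoopShapeSketch;
{$APPTYPE CONSOLE}

// Hypothetical functions that only mirror the control flow of the listings.

function DownShape: Integer;
var
  Counter: Integer;
begin
  Result := 0;
  Counter := 12;            // mov ebx, $0000000c
  repeat
    Inc(Result);            // stands in for the loop body
    Dec(Counter);           // dec ebx
  until Counter = 0;        // jnz alone: ZF from dec already answers this
end;

function UpShape: Integer;
var
  Counter: Integer;
begin
  Result := 0;
  Counter := 0;             // xor ebx, ebx
  repeat
    Inc(Result);            // stands in for the loop body
    Inc(Counter);           // inc ebx
  until Counter = 12;       // cmp ebx, $0c, then jnz
end;

begin
  Writeln(DownShape);       // 12
  Writeln(UpShape);         // 12
end.
```

Both shapes run the body 12 times. The only difference is that the upward version has to compare against a non-zero limit on every pass.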
It may be interesting to note that the 64-bit Windows compiler does not include this optimization. For Test1 it produces this:
Project1.dpr.9: for i := 0 to 11 do
00000000004083A5 4833DB xor rbx, rbx
Project1.dpr.10: Writeln(0);
00000000004083A8 488B0D01220000 mov rcx, [rel $00002201]
00000000004083AF 4833D2 xor rdx, rdx
00000000004083B2 E839C3FFFF call @Write0Long
00000000004083B7 4889C1 mov rcx, rax
00000000004083BA E851C7FFFF call @WriteLn
00000000004083BF E86CB4FFFF call @_IOTest
00000000004083C4 83C301 add ebx, $01
Project1.dpr.9: for i := 0 to 11 do
00000000004083C7 83FB0C cmp ebx, $0c
00000000004083CA 75DC jnz Test1 + $8
I am not sure why this optimization has not been implemented in the 64-bit compiler. My guess is that it has a negligible effect in real-world cases, and that the designers chose not to spend effort implementing it for the 64-bit compiler.