Why is the loop faster the second time?


I was originally comparing the performance of built-in D arrays and plain pointers, but I ran into a different problem. For some reason, if I run two identical loops one after the other, the second one always runs faster.

Here is the code:

import std.stdio : writeln;
import std.datetime : StopWatch;
import core.stdc.stdlib : malloc, free;

void main()
{
    immutable N = 1_000_000_000;
    StopWatch sw;

    uint* ptr = cast(uint*)malloc(uint.sizeof * N);

    sw.start();
    for (uint i = 0; i < N; ++i)
        ptr[i] = 1;
    sw.stop();
    writeln("the first for loop time: ", sw.peek().msecs(), " msecs");
    sw.reset();

    sw.start();
    for (uint i = 0; i < N; ++i)
        ptr[i] = 2;
    sw.stop();
    writeln("the second for loop time: ", sw.peek().msecs(), " msecs");
    sw.reset();

    free(ptr);
}

After compiling and running with dmd -release -O -noboundscheck -inline test.d -of=test && ./test it prints:

the first for loop time: 1253 msecs
the second for loop time: 357 msecs

I was not sure whether this was due to D or to dmd, so I rewrote the code in C++:

#include <iostream>
#include <chrono>
#include <cstdlib>  // malloc, free

int main()
{
    const unsigned int N = 1000000000;

    unsigned int* ptr = (unsigned int*)malloc(sizeof(unsigned int) * N);

    auto start = std::chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < N; ++i)
        ptr[i] = 1;
    auto finish = std::chrono::high_resolution_clock::now();
    auto milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(finish-start);
    std::cout << "the first for loop time: " << milliseconds.count() << " msecs" << std::endl;

    start = std::chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < N; ++i)
        ptr[i] = 2;
    finish = std::chrono::high_resolution_clock::now();
    milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(finish-start);
    std::cout << "the second for loop time: " << milliseconds.count() << " msecs" << std::endl;

    free(ptr);
}

and g++ -O3 test.cpp -o test && ./test gives a similar result:

the first for loop time: 1029 msecs
the second for loop time: 349 msecs



c++ performance for-loop d

Betelgeyser 29 . '18 13:02


Kozzi11 · Accepted Answer · 2018-03-29T15:21:51+0000

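uint* ptr = cast(uint*)malloc(uint.sizeof * N); does not actually allocate the memory up front; the pages only become backed by real memory when they are first written to. That is why the first loop is slower: it pays for the operating system mapping each page on first access, while the second loop writes to memory that is already mapped. You can test this: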

import core.stdc.stdlib : malloc, free;

void main()
{
    immutable N = 1_000_000_000;
    uint* ptr = cast(uint*)malloc(uint.sizeof * N);

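    // Write to the same (last) element over and over: only one page is ever touched.
    foreach (_; 0 .. 100)
        for (uint i = 0; i < N; ++i)
            ptr[N-1] = 1;

    // Up to this point almost no memory has actually been allocated.
    // The loop below touches every page for the first time.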
    for (uint i = 0; i < N; ++i)
        ptr[i] = 2;

    free(ptr);
}

@Eljay
