How can I demonstrate cache misses?

Inspired by Meyers, I read about the computer cache and wanted to run an experiment demonstrating the effects mentioned. Here is what I tried:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    typedef uint8_t data_t;

    const uint64_t max = (uint64_t)1<<30;
    const unsigned cycles = 1000;
    const uint64_t step = 63;  // also tried 64

    volatile data_t acu = 0;
    volatile data_t *arr = malloc(sizeof(data_t) * max);
    if (arr == NULL) {  // a 1 GiB allocation can fail
        perror("malloc");
        return 1;
    }
    for (uint64_t i = 0; i < max; ++i)
        arr[i] = ~i;

    for(unsigned c = 0; c < cycles; ++c)
        for (uint64_t i = 0; i < max; i += step)
            acu += arr[i];

    printf("%llu\n", (unsigned long long)max);  // %lu is not portable for uint64_t

    return 0;
}

And then I simply ran gcc --std=c99 -O0 test.c && time ./a.out. I checked, and my processor's cache lines are 64 bytes long. By setting step = 64, I tried to cause cache misses more often than with step = 63.

However, step = 63 actually runs a little faster. I suspect that I am a “victim” of prefetching, because my RAM is read sequentially.

How can I improve my array-traversal example to demonstrate the cost of cache misses?


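With step = 63 you still land on a new cache line on almost every access: the first access hits byte 0 of a line, the next hits byte 63 of the same line, and the following ones hit bytes 62, 61, ... of each subsequent line. Instead, compare step = 1 (mostly cache hits) with step = 64 (a miss on every access), and adjust max so that both runs perform a comparable number of accesses.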
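
The arithmetic behind this can be checked without any timing. Below is a minimal sketch that only models which cache line each accessed byte falls in; it does not simulate a real cache or the prefetcher. The helper names new_line_accesses and total_accesses are hypothetical, and LINE_SIZE assumes the 64-byte lines reported in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size in bytes (the question reports 64). */
#define LINE_SIZE 64u

/* Total number of accesses made by: for (i = 0; i < max; i += step). */
uint64_t total_accesses(uint64_t max, uint64_t step)
{
    assert(step > 0);  /* step == 0 would never terminate */
    return (max + step - 1) / step;
}

/* Count the accesses of the same loop that land on a different cache
 * line than the previous access. The very first access always counts.
 * This only models line indices (offset / LINE_SIZE), nothing more. */
uint64_t new_line_accesses(uint64_t max, uint64_t step)
{
    assert(step > 0);

    uint64_t count = 0;
    uint64_t prev_line = UINT64_MAX;  /* sentinel: no line touched yet */

    for (uint64_t i = 0; i < max; i += step) {
        uint64_t line = i / LINE_SIZE;
        if (line != prev_line) {
            ++count;
            prev_line = line;
        }
    }
    return count;
}
```

For a 4096-byte array (64 lines), new_line_accesses(4096, 63) gives 64 out of total_accesses(4096, 63) = 66 accesses, so step = 63 moves to a new line almost as often as step = 64 (64 out of 64). With step = 1 only 64 of the 4096 accesses start a new line, which is why that pair makes the clearer comparison.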


Source: https://habr.com/ru/post/1624479/

