Writing code to populate the L1 or L2 cache

How can we systematically write code to load data into our L1 or L2 cache?

I'm specifically trying to fill the L1 cache of my system for some further analysis. Any suggestions are welcome, whether they involve writing assembly or plain C. Pointers to related articles on this topic would be even more useful.

+4
3 answers

The cache holds whatever was accessed most recently. To fill the cache, just access the data, or in this case the instructions. Fill a block of memory with no-op instructions (plus an instruction at the end that returns or closes the loop) and jump to it.

The hard part is keeping the data there once it is loaded. You cannot access anything outside the 32K data set (or whatever size your cache is) while your test is running.
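For the data side, "just access the data" can be sketched in C roughly as follows. This is a minimal illustration, not a guaranteed way to fill any particular cache: the 32 KB size comes from the figure above, the 64-byte line size is an assumption, and the names `touch_lines` and `alloc_cache_sized` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed geometry: adjust both values to the target CPU. */
#define CACHE_BYTES (32u * 1024u)  /* the 32K figure mentioned above */
#define LINE_BYTES  64u            /* assumption: 64-byte cache lines */

/* Read one byte from every cache line of buf so that each line is loaded.
 * The volatile qualifier keeps the compiler from dropping the reads.
 * Returns the number of lines touched. */
size_t touch_lines(const volatile uint8_t *buf, size_t bytes, size_t line)
{
    assert(buf != NULL && line > 0);
    size_t touched = 0;
    for (size_t off = 0; off < bytes; off += line) {
        (void)buf[off];   /* volatile read: the load is actually issued */
        touched++;
    }
    return touched;
}

/* Allocate a cache-sized buffer aligned to the cache size and write to it
 * once so the memory is initialized and backed by real pages.
 * Returns NULL on failure; the caller releases it with free(). */
uint8_t *alloc_cache_sized(void)
{
    uint8_t *buf = aligned_alloc(CACHE_BYTES, CACHE_BYTES);
    if (buf != NULL)
        memset(buf, 0, CACHE_BYTES);
    return buf;
}
```

Calling `touch_lines(buf, CACHE_BYTES, LINE_BYTES)` on a buffer from `alloc_cache_sized()` walks 512 lines. Whether they all stay resident depends on everything else the program (and the OS) touches in the meantime, which is exactly the hard part described above.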

I can’t imagine what you get from artificially filling the cache and then keeping it filled with the same data set, but there you go.

+3

You will need to find out the associativity of your processor's cache and its replacement policy. I can't come up with a general solution to this problem that would work on all the processors I have worked with. Even caches advertised as fully associative with an LRU replacement policy do not behave quite that way in practice, and it can be very difficult to find a memory access pattern that fills the cache completely.
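To see why associativity matters, it helps to work out which set an address lands in. The sketch below assumes plain modulo indexing on the address, which is the textbook model; real processors may hash the index bits, so treat the result as a first approximation. The function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of sets = total cache size / (ways * line size). */
size_t num_sets(size_t cache_bytes, size_t ways, size_t line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    return cache_bytes / (ways * line_bytes);
}

/* Set that an address maps to under simple modulo indexing.
 * Assumption: no index hashing, which real hardware may apply. */
size_t set_index(uintptr_t addr, size_t line_bytes, size_t sets)
{
    assert(line_bytes > 0 && sets > 0);
    return (size_t)((addr / line_bytes) % sets);
}
```

For a hypothetical 32 KB, 8-way cache with 64-byte lines, `num_sets(32768, 8, 64)` gives 64 sets, and addresses 4096 bytes apart fall into the same set. Touch more than 8 such addresses and, under this model, one of them has to be evicted no matter how empty the rest of the cache is.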

If you want this for some very specific test (which is a bad idea for other reasons), I would recommend that you try to figure out how to flush the cache instead. That really is doable.

+2

I just completed this task last week, populating ECC-protected L1 and L2 caches.

Basically, if you have a 64 KB cache, for example (the total, that is: number of ways times number of cache lines, and so on), you simply access that much data linearly. The MMU may need to be enabled for caching to work. Start on a 64 KB boundary if possible and read 64 KB of data, ideally in accesses of one cache line (or a multiple of it).

For the icache you need that many bytes of instructions (nops, add reg+1, or something similar). Remember that there is probably a prefetcher at work, so you may need to pull the final return back by several instructions so that the prefetch does not run past the end. This can take some practice, and if you have no visibility into the logic (a chip simulation), you may not be able to tell what is happening.
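The instruction image with the pulled-back return can be sketched like this. It only builds the bytes; placing them in executable memory and jumping to them is platform-specific and not shown. The opcodes are x86 (0x90 is the one-byte NOP, 0xC3 is a near RET), which is an assumption about the target, and `build_nop_sled` and its `margin` parameter are names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP 0x90u  /* one-byte NOP on x86 (assumed target) */
#define X86_RET 0xC3u  /* near RET on x86 */

/* Fill buf with NOPs and place a RET `margin` bytes before the last byte,
 * so that anything fetched ahead of the RET is still harmless NOPs inside
 * the buffer. Returns the offset of the RET. */
size_t build_nop_sled(uint8_t *buf, size_t bytes, size_t margin)
{
    assert(buf != NULL && bytes > 0 && margin < bytes);
    memset(buf, X86_NOP, bytes);
    size_t ret_at = bytes - 1 - margin;
    buf[ret_at] = X86_RET;
    return ret_at;
}
```

How large the margin needs to be depends on how far the front end fetches ahead, which is the part that takes experimentation.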

You can use the MMU, or other tricks your logic allows, to reduce the amount of memory required. For example, if your MMU has an entry size that covers, say, 4 KB, you can fill 4 KB of real memory with data, then set up 16 different MMU entries (with 16 different virtual addresses) and read through the 4 KB via each of the 16. This only works, of course, if your cache sits on the virtual-address side of the MMU.
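On a bare-metal system this is done by writing MMU entries directly. As a rough user-space analogue, assuming a POSIX system, the same 4 KB of backing memory can be mapped at 16 virtual addresses with `mmap`. This is only a sketch of the aliasing idea: on a physically tagged cache all 16 views share the same lines, so it fills nothing extra, which is the caveat stated above. `map_aliases` is a name made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_BYTES 4096u  /* assumption: 4 KB entries, as in the example above */
#define ALIASES    16u

/* Map one 4 KB backing region at ALIASES different virtual addresses.
 * Returns 0 on success, -1 on failure. */
int map_aliases(uint8_t *views[ALIASES])
{
    FILE *f = tmpfile();              /* anonymous temporary backing file */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, PAGE_BYTES) != 0) {
        fclose(f);
        return -1;
    }
    for (unsigned i = 0; i < ALIASES; i++) {
        void *p = mmap(NULL, PAGE_BYTES, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            while (i-- > 0)           /* undo the mappings made so far */
                munmap(views[i], PAGE_BYTES);
            fclose(f);
            return -1;
        }
        views[i] = p;
    }
    fclose(f);  /* the mappings remain valid after the file is closed */
    return 0;
}
```

A write through `views[0]` is visible through `views[15]` because both refer to the same backing memory, even though the two pointers differ.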

In general, this is a rather ugly business. If your MMU lets you disable instruction caching for a region, you can put the code that runs the test in non-cached space so that it does not disturb the icache, and only the instructions used to fill the cache sit in the cached address space.

Good luck ...

0

Source: https://habr.com/ru/post/1485796/

