I just completed this task last week: populating the ECC-protected L1 and L2 caches.
Basically, if you have, for example, a 64 KB cache in total (number of ways times number of cache lines, etc.), the data side is easy: just access that much data linearly through the cache (the MMU may be required to enable caching). Start on a 64 KB boundary and read 64 KB of data, ideally in cache-line-sized reads (or multiples of a line).
For the icache, you need that many bytes of instructions (nops, or add reg + 1, or something else). Remember that there is probably a prefetch at the end, so you may need to pull the final return back by several instructions so that the prefetch believes you have reached the end. This may take some practice, and if you do not have visibility into the logic (a chip sim), you may not be able to figure it out.
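The data-side fill can be sketched roughly as below. This is only an illustration under stated assumptions: the 64 KB cache size and 32-byte line size are example values, touch_cache_lines is a made-up name, and on real hardware the buffer would have to sit in a cacheable region aligned to the cache size.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_BYTES (64u * 1024u) /* assumed total cache size */
#define LINE_BYTES  32u           /* assumed cache line size */

/*
 * Hypothetical helper: performs one volatile word read per cache line in
 * [base, base + size_bytes), so every line in that range gets fetched.
 * Returns the number of lines touched.
 */
size_t touch_cache_lines(const volatile uint32_t *base,
                         size_t size_bytes, size_t line_bytes)
{
    size_t words_per_line = line_bytes / sizeof(uint32_t);
    size_t total_words = size_bytes / sizeof(uint32_t);
    size_t lines = 0;

    if (words_per_line == 0) {
        return 0; /* line size smaller than one word: nothing sensible to do */
    }
    for (size_t w = 0; w < total_words; w += words_per_line) {
        (void)base[w]; /* volatile read, cannot be optimized away */
        lines++;
    }
    return lines;
}
```

Calling touch_cache_lines(buf, CACHE_BYTES, LINE_BYTES) on a suitably aligned, cacheable buffer walks the whole cache once. Whether a single pass is enough depends on the replacement policy of your particular cache.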
You can use the MMU, or other games your logic might allow, to reduce the amount of memory required. For example, if you have an MMU with an entry size that covers, say, 4 KB, you can fill 4 KB of real memory with data, then use 16 different MMU entries (with 16 different virtual addresses) and read through the 4 KB once for each of the 16. This only works, of course, if your cache is on the virtual-address side of the MMU.
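The arithmetic behind the aliasing trick might look like the sketch below. Everything here is hypothetical: struct alias_mapping and build_alias_map are illustrative names, and programming a real MMU uses whatever entry format and registers your part defines. The code only computes which virtual addresses would point at the single physical page.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one MMU entry; a real MMU has its own format. */
struct alias_mapping {
    uintptr_t virt; /* virtual address of this alias */
    uintptr_t phys; /* physical page every alias points at */
};

/*
 * Fills out[] with cache_bytes / page_bytes mappings, each with a distinct
 * virtual address but the same physical page. Returns the number of
 * mappings written, or 0 if the arguments do not fit.
 */
size_t build_alias_map(uintptr_t virt_base, uintptr_t phys_page,
                       size_t page_bytes, size_t cache_bytes,
                       struct alias_mapping *out, size_t out_len)
{
    if (page_bytes == 0) {
        return 0;
    }
    size_t needed = cache_bytes / page_bytes;
    if (needed > out_len) {
        return 0;
    }
    for (size_t i = 0; i < needed; i++) {
        out[i].virt = virt_base + i * page_bytes;
        out[i].phys = phys_page;
    }
    return needed;
}
```

With a 64 KB cache and 4 KB pages this yields 16 mappings. You would install each one in the MMU and then read through the 4 KB at each virtual address in turn.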
In general, this is kind of an ugly thing to do. If your MMU lets you prevent instruction caching for a region, you can put the code that runs the test in non-cached space so that it does not disturb the icache, and keep only the instructions used to fill the cache in the cached address space.
Good luck ...