You need to come up with a heuristic that forces a 100% (or very nearly 100%) cache miss rate (hopefully you have a cache invalidation instruction?) and another that forces a 100% cache hit rate. Hooray, that works for one level of cache. Now, how do you do the same for levels 2 and 3?
In all seriousness, there is probably no way to do this 100% reliably without special hardware and traces attached to the CPU and memory, but here is what I would do:
Write a "bunch" of stuff to one location in memory, enough that you can be sure it is consistently hitting the L1 cache, and record the time (recording it also touches the cache, so beware). Do this set of writes without branches, to avoid inconsistencies from branch prediction. That is your best time. Now, every so often, write a cache line's worth of data to a random, far-away location in RAM, beyond the end of your known location, and record the new time. Hopefully this takes longer. Keep doing this, recording the different times, and hopefully you will see a couple of timings that tend to group together. Each of these groups "could" correspond to L2, L3, and main-memory access times.
The problem is that a lot of other things get in the way. The OS could context-switch you out and trash your cache. An interrupt could come along and throw your timing off. Many things can skew the values. But hopefully you get enough signal in your data to see whether it is working.
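The procedure above can be sketched roughly as follows. This is only a minimal illustration, not a calibrated benchmark. It assumes a POSIX system with `clock_gettime` and a 64-byte cache line. The names `LINE_BYTES`, `now_ns` and `avg_write_ns` are made up for this example. It also varies the idea slightly: instead of timing single writes, it times a whole batch of writes spread over a span of a chosen size and reports the average, because a single write is far shorter than the timer overhead. Each write touches one byte per cache line rather than filling the whole line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64u /* assumed cache-line size; check your CPU */

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average nanoseconds per write for `count` one-byte writes, each landing
 * on a pseudo-random cache line within the first `span_bytes` of `buf`.
 * span_bytes / LINE_BYTES must be a power of two so the line index can be
 * picked with a mask instead of a branch or a division.
 */
static double avg_write_ns(volatile unsigned char *buf,
                           size_t span_bytes, size_t count) {
    size_t lines = span_bytes / LINE_BYTES;
    assert(lines != 0 && (lines & (lines - 1)) == 0);
    size_t mask = lines - 1;
    uint32_t x = 2463534242u; /* xorshift32 state, any nonzero seed */

    uint64_t t0 = now_ns();
    for (size_t i = 0; i < count; i++) {
        /* Branch-free pseudo-random step; the only branch is the loop. */
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        buf[((size_t)x & mask) * LINE_BYTES] = (unsigned char)i;
    }
    uint64_t t1 = now_ns();

    return (double)(t1 - t0) / (double)count;
}
```

With a span of 64 bytes every write hits the same line, which should give the "best time". Repeating the call with spans that grow past each cache size (the actual sizes depend on the CPU) should, with luck, show the averages stepping up in groups. Keep in mind that stores can be buffered and overlapped by the hardware, so the numbers reflect sustained cost rather than pure access latency, and all the noise sources mentioned above still apply.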
This would probably be easier to do on a simpler, embedded-type system, where the OS (if there is one) won't get in your way.