It is nearly impossible to find specifications for Intel caches. When I taught a class on caches last year, I asked friends inside Intel (in the compiler group) and they couldn't find the specifications.
But wait!!! Jed, bless his soul, tells us that on Linux systems you can squeeze a lot of information out of the kernel:
grep . /sys/devices/system/cpu/cpu0/cache/index*/*
This will give you associativity, set size and a ton of other information (but not latency). For example, I found out that although AMD advertises a 128K L1 cache, my AMD computer has a split I and D cache of 64K each.
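If you would rather have the same data in a script than in grep output, here is a minimal Python sketch. It assumes the usual Linux sysfs attribute names (`level`, `type`, `size`, `ways_of_associativity`, `number_of_sets`, `coherency_line_size`); some kernels, VMs and architectures omit a few of them. The helper names `read_cache_levels` and `size_from_geometry` are mine, not part of any library, and the size check ignores `physical_line_partition`, which is normally 1.

```python
from pathlib import Path


def read_cache_levels(cpu_cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return one dict per index*/ directory, mapping attribute name to text."""
    levels = []
    for index_dir in sorted(Path(cpu_cache_dir).glob("index*")):
        info = {}
        for attr in index_dir.iterdir():
            if not attr.is_file():
                continue
            try:
                info[attr.name] = attr.read_text().strip()
            except OSError:
                # Some attributes may be unreadable; skip them.
                pass
        levels.append(info)
    return levels


def size_from_geometry(info):
    """Cache size in bytes, computed as ways * sets * line size."""
    return (int(info["ways_of_associativity"])
            * int(info["number_of_sets"])
            * int(info["coherency_line_size"]))


if __name__ == "__main__":
    for info in read_cache_levels():
        print("L{} {}: size={} ways={} sets={} line={}".format(
            info.get("level", "?"), info.get("type", "?"),
            info.get("size", "?"), info.get("ways_of_associativity", "?"),
            info.get("number_of_sets", "?"),
            info.get("coherency_line_size", "?")))
```

Comparing `size_from_geometry` with the reported `size` is a quick sanity check: a split 64K data cache that is 2-way with 64-byte lines should report 512 sets.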
Two suggestions that are now mostly obsolete thanks to Jed:
AMD publishes much more information about its caches, so you can get at least some information about a modern cache. For example, last year's AMD L1 caches delivered two words per cycle (peak).
The open source tool valgrind contains all kinds of cache models, and it is invaluable for profiling and understanding cache behavior. It comes with a very nice visualization tool, kcachegrind, which is part of the KDE SDK.
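As a rough illustration of using valgrind's cache model, the sketch below builds a cachegrind command line that overrides the simulated L1 data cache with the K8/K10 L1D figures given in this answer (64 KB, 2-way, 64-byte lines). The `--tool`, `--cache-sim` and `--D1=<size>,<associativity>,<line size>` options are cachegrind's own; the helper `cachegrind_command` and the program name `./my_program` are hypothetical placeholders.

```python
import shlex


def cachegrind_command(program, d1=(64 * 1024, 2, 64)):
    """Build an argv list that runs `program` under cachegrind.

    d1 is (size in bytes, associativity, line size in bytes) for the
    simulated L1 data cache. The other cache levels are left for
    cachegrind to pick on its own.
    """
    size, ways, line = d1
    return [
        "valgrind",
        "--tool=cachegrind",
        "--cache-sim=yes",  # newer valgrind releases disable this by default
        f"--D1={size},{ways},{line}",
        program,
    ]


if __name__ == "__main__":
    # ./my_program is a placeholder for whatever binary you want to profile.
    print(shlex.join(cachegrind_command("./my_program")))
```

The output file cachegrind writes can then be opened in kcachegrind. Keep in mind that this is a simulation of a simplified cache, not a measurement of the real hardware.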
For example: as of the third quarter of 2008, AMD K8 / K10 processors use 64-byte cache lines, with a split L1I / L1D cache of 64 KB each. L1D is 2-way associative and exclusive with L2, with a 3-cycle latency. The L2 cache is 16-way associative, and its latency is about 12 cycles.
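To make those numbers concrete, here is a small worked calculation of what they imply for the K8/K10 L1D: the number of sets, and how many address bits select the byte within a line and the set. It assumes power-of-two sizes and plain low-address-bit indexing; `cache_geometry` is a name I made up for this sketch.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, associativity and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    # K8/K10 L1D from the text: 64 KB, 2-way, 64-byte lines.
    sets, offset_bits, index_bits = cache_geometry(64 * 1024, 2, 64)
    print(f"{sets} sets, {offset_bits} offset bits, {index_bits} index bits")
```

For the K8/K10 L1D this works out to 512 sets, with 6 offset bits and 9 index bits.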
AMD Bulldozer-family processors use a split L1, with a 16 KiB, 4-way associative L1D per cluster (2 per core).
Intel processors have kept L1 the same for a long time (from Pentium M to Haswell to Skylake and, presumably, many generations after that): split I and D caches of 32 kB each, with L1D being 8-way associative. Cache lines are 64 bytes, matching the burst-transfer size of DDR DRAM. The load-use latency is ~4 cycles.
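One practical consequence of that geometry, sketched in Python: with 32 kB, 8 ways and 64-byte lines there are 64 sets, so addresses that are a multiple of 4096 bytes apart land in the same set, and a ninth such line cannot fit alongside the other eight. This assumes the set is chosen by the low address bits just above the line offset; the constants and `set_index` are illustrative names, not anything from a real API.

```python
LINE_BYTES = 64
WAYS = 8
SIZE_BYTES = 32 * 1024
SETS = SIZE_BYTES // (WAYS * LINE_BYTES)  # 64 sets


def set_index(addr):
    """Set selected by an address, assuming simple modulo indexing."""
    return (addr // LINE_BYTES) % SETS


if __name__ == "__main__":
    # Nine addresses spaced 4096 bytes apart: one more than the 8 ways.
    addrs = [i * SETS * LINE_BYTES for i in range(WAYS + 1)]
    indices = {set_index(a) for a in addrs}
    print(f"{len(addrs)} lines map to {len(indices)} set(s) of {WAYS} ways")
```

So a loop that strides through memory in 4096-byte steps can thrash a single set even though the working set is far smaller than 32 kB.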
Also see the x86 tag wiki for links to more performance and microarchitecture data.
Norman Ramsey, Apr 04 '09 at 1:05