About adaptive mode for the L1 data cache with Hyper-Threading

I have recently been studying Hyper-Threading, and I'm a little confused by one feature: L1 data cache context mode.

The architecture optimization guide says that the L1 cache can operate in two modes:

  • The first-level cache can work in two modes depending on a context-ID bit:

    • Shared mode: the L1 data cache is fully shared by two logical processors.

    • Adaptive mode: in adaptive mode, memory accesses that use the page directory are mapped identically across the logical processors sharing the L1 data cache.

However, I wonder how the cache is actually partitioned in the adaptive mode described here.

2 answers

On Intel architectures, an L1 Context ID (CPUID.01H:ECX[bit 10]) value of 1 indicates that the L1 data cache mode can be set to either adaptive mode or shared mode, while a value of 0 indicates that this feature is not supported. For details, see the definition of IA32_MISC_ENABLE MSR bit 24 (L1 data cache context mode).
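If you want to check this on your own machine, a small sketch like the one below reads the CPUID feature bit and dumps the MSR bit. It is only an illustration: it assumes Linux with the msr driver loaded (/dev/cpu/0/msr) and root privileges; the MSR address and bit positions come from the description above and the SDM.

```c
/*
 * Minimal sketch (Linux, GCC/Clang): check whether the CPU reports the
 * L1 Context ID feature (CPUID.01H:ECX[bit 10], "CNXT-ID") and dump the
 * L1 data cache context mode flag, IA32_MISC_ENABLE[bit 24] (MSR 0x1A0).
 * Reading the MSR assumes the Linux msr driver is loaded and root rights.
 */
#include <cpuid.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_MISC_ENABLE 0x1A0

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 1 not available\n");
        return 1;
    }

    int cnxt_id = (ecx >> 10) & 1;   /* CNXT-ID: L1 Context ID supported */
    printf("L1 Context ID supported: %s\n", cnxt_id ? "yes" : "no");
    if (!cnxt_id)
        return 0;

    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return 1;
    }

    uint64_t misc_enable;
    if (pread(fd, &misc_enable, sizeof misc_enable, IA32_MISC_ENABLE)
            != sizeof misc_enable) {
        perror("read IA32_MISC_ENABLE");
        close(fd);
        return 1;
    }
    close(fd);

    /* Bit 24 is the L1 data cache context mode flag; see the SDM text
     * quoted below for how the two settings behave. */
    printf("IA32_MISC_ENABLE[24] = %" PRIu64 "\n", (misc_enable >> 24) & 1);
    return 0;
}
```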

According to the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 3A (Chapter 11, Cache Control), which I quote below:

  • Shared mode

In shared mode, the L1 data cache is competitively shared between logical processors. This is true even if the logical processors use identical CR3 registers and paging modes. In shared mode, linear addresses in the L1 data cache can be aliased, meaning that one linear address in the cache can point to different physical locations. The mechanism for resolving aliasing can lead to thrashing. For this reason, IA32_MISC_ENABLE[bit 24] = 0 is the preferred configuration for processors based on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Technology.

  • Adaptive mode

Adaptive mode facilitates L1 data cache sharing between logical processors. When running in adaptive mode, the L1 data cache is shared across the logical processors in the same core if:

• The CR3 control registers used by the logical processors sharing the cache are identical.

• The same paging mode is used by the logical processors sharing the cache.

In this situation, the entire L1 data cache is available to each logical processor (instead of being competitively shared). If the CR3 values are different for the logical processors sharing the L1 data cache, or if the logical processors use different paging modes, the processors compete for cache resources. This reduces the effective cache size for each logical processor. Aliasing of the cache is not allowed (which prevents data thrashing).
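Restated as code, the sharing rule quoted above is just a per-core comparison. The struct and field names below are invented for illustration only; the actual decision is made in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-logical-processor state, used only to restate the
 * rule from the manual quoted above. */
struct lp_state {
    uint64_t cr3;         /* page-directory base register (CR3) */
    int      paging_mode; /* e.g. 32-bit, PAE, or IA-32e paging */
};

/* In adaptive mode the whole L1 data cache is available to each logical
 * processor only when both CR3 and the paging mode match; otherwise the
 * two logical processors compete for the cache. */
static bool l1d_fully_shared(const struct lp_state *lp0,
                             const struct lp_state *lp1)
{
    return lp0->cr3 == lp1->cr3 && lp0->paging_mode == lp1->paging_mode;
}
```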

My point is just that there is no definite, documented approach for how the L1 data cache gets split.


The document simply states that in adaptive mode, if CR3 or the paging mode differs between the logical processors, the cache is not fully shared and they "compete" for it. It does not say how that partitioning actually works.

The easiest way to implement this would be to statically reserve half of the data cache ways for each logical processor, which essentially assigns half of the data cache to each of them.
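As a rough illustration of what such a static split could look like (the cache geometry below is an assumption for the sketch, not the documented NetBurst L1D), each logical processor would only be allowed to allocate into its own half of the ways, while set indexing stays unchanged:

```c
#include <stdint.h>

/* Assumed example geometry: 16 KiB, 8-way, 64-byte lines.  The numbers
 * only make the arithmetic concrete. */
#define LINE_SIZE 64u
#define NUM_WAYS   8u
#define NUM_SETS  32u   /* 16 KiB / (64 B * 8 ways) */

/* The set index is derived from the address as usual. */
static unsigned set_index(uint64_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Static partition: logical processor 0 may fill ways 0..3, logical
 * processor 1 may fill ways 4..7, so each one effectively owns half of
 * the cache regardless of what the other is doing. */
static unsigned first_allowed_way(unsigned lp)   /* lp is 0 or 1 */
{
    return lp * (NUM_WAYS / 2);
}

static unsigned allowed_way_count(void)
{
    return NUM_WAYS / 2;
}
```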

Alternatively, they could add an extra bit to the virtual tag of each cache line to record which logical processor the line belongs to. That would allow a dynamic cache partition, which fits the wording "compete" for the cache better than a static split does.
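The dynamic variant could be sketched like this: every line's tag carries the ID of the logical processor that filled it, and a lookup only hits on its owner's lines. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical line metadata for a virtually-tagged L1D: the tag is
 * extended with one bit naming the logical processor that owns the line. */
struct l1d_line {
    bool     valid;
    unsigned lp_id;      /* owning logical processor, 0 or 1 */
    uint64_t vtag;       /* virtual tag */
};

/* A lookup hits only when both the tag and the owner match, so the same
 * virtual address used by the two threads maps to two distinct lines.
 * Since any way can hold lines of either owner, the split between the
 * threads shifts dynamically with their demand, which is the "competing"
 * behavior. */
static bool l1d_hit(const struct l1d_line *line, uint64_t vtag, unsigned lp)
{
    return line->valid && line->lp_id == lp && line->vtag == vtag;
}
```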

If you really need to know, you could write some microbenchmarks to check which of these schemes is actually used.
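One possible shape for such a micro-test, under the assumption that the sibling hyperthreads are CPUs 0 and 1 and that the L1D is 16 KiB (both need checking on the actual machine, e.g. via /sys/devices/system/cpu/cpuX/topology/thread_siblings_list): pin one pointer-chasing thread to each sibling and time the per-load latency. How the latency changes between a single-threaded run and a two-thread run as the working set grows past half of the L1D gives a hint about how the cache is being divided. Build with something like cc -O2 -pthread.

```c
/* Sketch of a micro-test: pin two pointer-chasing threads to assumed
 * sibling hyperthreads (CPU 0 and CPU 1) and report ns per load.
 * All sizes and CPU IDs are assumptions to adjust for the real machine. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WORKING_SET (8 * 1024)               /* assumed: ~half of a 16 KiB L1D */
#define NODES       (WORKING_SET / sizeof(void *))
#define ITERS       (10 * 1000 * 1000)

struct arg { int cpu; double ns_per_load; };

static void *chase(void *p)
{
    struct arg *a = p;

    /* Pin this thread to one logical processor. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a->cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Build a cyclic pointer chain (sequential here; a random permutation
     * would defeat hardware prefetching better). */
    void **buf = malloc(NODES * sizeof(void *));
    for (size_t i = 0; i < NODES; i++)
        buf[i] = &buf[(i + 1) % NODES];

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    void **cur = buf;
    for (long i = 0; i < ITERS; i++)
        cur = (void **)*cur;          /* serially dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Keep the chain result live so the loop is not optimized away. */
    if (cur == NULL)
        puts("unreachable");

    a->ns_per_load = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                      (t1.tv_nsec - t0.tv_nsec)) / ITERS;
    free(buf);
    return NULL;
}

int main(void)
{
    /* Assumed sibling pair; adjust for the actual topology. */
    struct arg a0 = { .cpu = 0 }, a1 = { .cpu = 1 };
    pthread_t t0, t1;

    pthread_create(&t0, NULL, chase, &a0);
    pthread_create(&t1, NULL, chase, &a1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);

    printf("cpu0: %.2f ns/load, cpu1: %.2f ns/load\n",
           a0.ns_per_load, a1.ns_per_load);
    return 0;
}
```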


Source: https://habr.com/ru/post/1432099/

