HT is symmetric in terms of basic resources; only the system mode may be asymmetric.
So, if HT is enabled, most of the physical core's resources are shared between the two threads. Some additional hardware is included to hold the state of both threads. Both threads have symmetric access to the physical core.
There is a difference between an HT-disabled core and an HT-enabled core; but within an HT-enabled core there is no difference between the first logical processor and the second; both halves are equivalent.
At any point in time one HT thread can be using more resources than the other, but this balancing is dynamic: the CPU rebalances the threads as much as possible when both want the same resource. About the only thing you can do explicitly is execute rep nop or pause in one thread to let the CPU give more resources to the other thread.
I want to find out this information and use sched_setaffinity() to bind a process either to a logical CPU that shares a core with another (an HT sibling) or to one that does not, to profile its performance.
Well, you can measure the performance without knowing this. Just run the profile with the only busy thread in the system pinned to CPU0, then repeat with it pinned to CPU1. I think the results will be almost the same (the OS can generate noise if it routes some interrupts to CPU0, so try to reduce the number of interrupts during testing, and try CPU2 and CPU3 instead, if present).
PS
Agner Fog (the x86 guru) recommends using the even-numbered logical processors in case you do not want to use HT but it is enabled in the BIOS:
If hyperthreading is detected, lock the process to use only the even-numbered logical processors. This will make one of the two threads in each processor core idle, so that there is no contention for resources.
PPS: On the newer HT incarnation (Nehalem and Sandy Bridge, not P4), based on Agner's microarchitecture research:
The bottlenecks that require attention on Sandy Bridge are: ... 5. Sharing of resources between threads. Many of the critical resources are shared between the two threads of a core when hyperthreading is on. It may be wise to turn off hyperthreading when multiple threads depend on the same execution resources.
...
An intermediate solution was introduced with NetBurst, and again with Nehalem and Sandy Bridge, under the name hyperthreading. A hyperthreaded processor has two logical processors sharing the same execution core. The advantage of this is limited if the two threads compete for the same resources, but hyperthreading can be quite advantageous if performance is limited by something else, such as memory access.
...
Both Intel and AMD are making hybrid solutions where some or all of the execution units are shared between two processor cores (hyperthreading in Intel terminology).
PPPS: The Intel Optimization Manual lists how resources are shared in the second generation of HT (page 93; the list is for Nehalem, but the Sandy Bridge section notes no changes to it):
Deeper buffering and enhanced resource sharing/partition policies:
- Replicated resources for HT operation: register state, renamed return stack buffer, large-page ITLB. // my comment: there are two sets of this hardware
- Partitioned resources for HT operation: load buffers, store buffers, re-order buffers, small-page ITLB are statically allocated between the two logical processors. // my comment: there is one set of this hardware, statically split in half between the two HT logical cores
- Competitively-shared resources during HT operation: the reservation station, cache hierarchy, fill buffers, both DTLB0 and STLB. // my comment: a single set, not split in half; the CPU dynamically reallocates it between the threads
- Alternating during HT operation: front-end operation generally alternates between the two logical processors to ensure fairness. // my comment: there is one front end (instruction decoder), so the threads are decoded alternately: 1, 2, 1, 2
- HT unaware resources: execution units. // my comment: these are the real hardware units that perform the computations and memory accesses; there is only one set. If one thread can use many execution units and rarely waits on memory, it will consume all of them and the second thread's performance will be low (though HT sometimes switches to the second thread; how often?). If both threads are less optimized and/or wait on memory, the execution units are shared between the two threads.
There are also pictures on page 112 (Figure 2-13) showing that both logical cores are symmetric.
The performance potential of HT Technology is due to:
- • The fact that operating systems and user programs can schedule processes or threads to run simultaneously on logical processors in each physical processor
- • The ability to use on-chip execution resources at a higher level than when only a single thread is consuming them; higher resource utilization can lead to higher system throughput.
Although instructions originating from two programs or two threads execute simultaneously, and not necessarily in program order, in the execution core and memory hierarchy, the front end and back end contain several selection points to choose between instructions from the two logical processors. All selection points alternate between the two logical processors unless one logical processor cannot make use of a pipeline stage. In that case, the other logical processor gets full use of every cycle of that pipeline stage. Reasons why a logical processor may not use a pipeline stage include cache misses, branch mispredictions, and instruction dependencies.