(Intel) hyper-threaded cores act as (up to) two processors.
The observation is that a processor has a set of resources that would ideally be kept busy continuously, but in practice they often sit idle while the processor waits for some external event, usually a memory read or write.
By adding a small amount of extra state for another hardware thread (for example, another copy of the register set plus some additional bookkeeping), a "single" processor can turn its attention to executing that other thread when the first one is blocked. (You can generalize this to N hardware threads, and other architectures have done so; Intel settled on 2.)
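You can see the two hardware threads from software: on a hyper-threaded machine the OS simply reports twice as many logical processors as physical cores. A minimal C++ sketch (note the standard allows `hardware_concurrency()` to return 0 if the count is unknown):

```cpp
#include <iostream>
#include <thread>

int main() {
    // On a hyper-threaded Intel CPU this typically reports twice the
    // number of physical cores, because each core exposes two logical
    // processors (hardware threads) to the OS.
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "logical processors: " << n << "\n";
}
```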
If both hardware threads spend most of their time waiting for such events, the CPU can plausibly get useful work done on behalf of both. 40 nanoseconds waiting for memory is a long time to a processor. So if your program does a lot of memory accesses, I'd expect both hardware threads to look fully effective; that is, you should get nearly 2x.
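As an illustration, here is a hypothetical memory-bound microbenchmark (the name `chase` and all the sizes are my own, not from any particular source): each thread chases pointers through a buffer far larger than the last-level cache, so it stalls on memory on nearly every step. To isolate the hyper-threading effect you would pin both threads to the two logical processors of one core (e.g. with `taskset` on Linux); that pinning is omitted here. With it in place, the two-thread run should finish in close to the one-thread time, i.e. nearly 2x throughput:

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Walk a randomized cycle of indices; every step is a likely cache miss.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps) {
    std::size_t i = 0;
    for (std::size_t s = 0; s < steps; ++s) i = next[i];
    return i; // returned so the compiler can't optimize the loop away
}

int main() {
    const std::size_t N = std::size_t{1} << 24; // ~128 MiB: well beyond cache
    std::vector<std::size_t> next(N);
    std::iota(next.begin(), next.end(), std::size_t{0});

    // Sattolo's algorithm: turn the identity into a single N-element cycle,
    // so the walk never falls into a short loop that would fit in cache.
    std::mt19937_64 rng{42};
    for (std::size_t i = N - 1; i > 0; --i)
        std::swap(next[i], next[rng() % i]);

    auto run = [&](int threads) {
        auto t0 = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (int t = 0; t < threads; ++t)
            pool.emplace_back([&] {
                volatile std::size_t r = chase(next, 10'000'000);
                (void)r;
            });
        for (auto& th : pool) th.join();
        return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    };

    std::cout << "1 thread:  " << run(1) << " s\n";
    std::cout << "2 threads: " << run(2) << " s\n"; // near the 1-thread time if HT hides the stalls
}
```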
If both hardware threads perform work that is very local (for example, intensive computation entirely in registers), then internal waits become minimal and the single processor cannot switch fast enough to service both hardware threads as fast as they generate work. In that case performance suffers. I don't remember where I heard this, and I heard it a long time ago: under those circumstances the net effect is more like 1.3x than the idealized 2x. (Expecting the SO audience to correct me on this.)
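The opposite case, sketched under the same assumptions (pinning both threads to one physical core is again omitted; `spin` and the constants are my own): several independent multiply-add chains keep the core's execution ports busy from a single hardware thread, leaving few idle cycles for the second one to fill, so the combined speedup should come out well under 2x:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

// Register-only work: four independent multiply-add chains give one
// hardware thread enough instruction-level parallelism to come close
// to saturating the core on its own.
std::uint64_t spin(std::uint64_t iters) {
    std::uint64_t a = 1, b = 2, c = 3, d = 4;
    for (std::uint64_t i = 0; i < iters; ++i) {
        a = a * 6364136223846793005ULL + 1442695040888963407ULL;
        b = b * 6364136223846793005ULL + 1442695040888963407ULL;
        c = c * 6364136223846793005ULL + 1442695040888963407ULL;
        d = d * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return a ^ b ^ c ^ d;
}

int main() {
    auto run = [](int threads) {
        auto t0 = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (int t = 0; t < threads; ++t)
            pool.emplace_back([] {
                volatile std::uint64_t r = spin(500'000'000);
                (void)r;
            });
        for (auto& th : pool) th.join();
        return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    };

    std::cout << "1 thread:  " << run(1) << " s\n";
    std::cout << "2 threads: " << run(2) << " s\n"; // well under 2x throughput when pinned to one core
}
```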
Your application may switch back and forth between these behaviors depending on which part of it is running at the moment, so you get some blend of the two. I'm happy with any speedup I can get.