In all likelihood, it will depend on the situation, and the difference may not even be noticeable.
Factors such as out-of-order execution are likely to hide any inherent "slowness" of either version, unless there is an actual bottleneck.
However, if we had to choose which one is faster, then you are right that the second case will most likely be faster.
If we look at Agner Fog's instruction tables for current x86 processors:
Core 2:
    add/sub r, r/i    Latency = 1, 1/Throughput = 0.33
    add/sub r, m      Latency = unknown, 1/Throughput = 1
Nehalem:
    add/sub r, r/i    Latency = 1, 1/Throughput = 0.33
    add/sub r, m      Latency = unknown, 1/Throughput = 1
Sandy Bridge:
    add/sub r, r/i    Latency = 1, 1/Throughput = 0.33
    add/sub r, m      Latency = unknown, 1/Throughput = 0.5
K10:
    add/sub r, r/i    Latency = 1, 1/Throughput = 0.33
    add/sub r, m      Latency = unknown, 1/Throughput = 0.5
In all cases, the memory-operand version has lower throughput. Its latency is listed as unknown, but it will almost certainly be more than 1 cycle. So it is worse on every count.
The memory-operand version uses all the same execution ports as the immediate version, plus it also needs a load port. That can only make things worse. In fact, that is exactly why throughput is lower with the memory operand: the load ports can only sustain 1 or 2 reads per cycle, while the ALUs can sustain a full 3 adds per cycle.
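If you want to see this effect directly, a rough micro-benchmark can be built with inline assembly. The sketch below is my own illustration, not part of the original measurements: it assumes GCC or Clang on x86-64, the iteration count and names (mem_val, a, b, c) are arbitrary, and real numbers will vary by CPU and by how much loop overhead gets in the way.

    /* Sketch: time three independent adds per loop iteration, first with an
       immediate operand and then with a memory operand.  Assumes GCC/Clang
       on x86-64; iteration count and names (mem_val, a, b, c) are arbitrary. */
    #include <stdio.h>
    #include <time.h>

    static unsigned int mem_val = 0x10000;   /* the in-memory operand */

    static double secs(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        const long iters = 200 * 1000 * 1000;
        unsigned int a = 0, b = 0, c = 0;    /* three independent accumulators */

        double t0 = secs();
        for (long i = 0; i < iters; i++)
            /* immediate operand: only ALU ports are needed */
            __asm__ volatile ("add $0x10000, %0\n\t"
                              "add $0x10000, %1\n\t"
                              "add $0x10000, %2"
                              : "+r"(a), "+r"(b), "+r"(c));
        double t1 = secs();
        for (long i = 0; i < iters; i++)
            /* memory operand: every add also needs a load port */
            __asm__ volatile ("add %3, %0\n\t"
                              "add %3, %1\n\t"
                              "add %3, %2"
                              : "+r"(a), "+r"(b), "+r"(c)
                              : "m"(mem_val));
        double t2 = secs();

        printf("immediate operand: %.3f s\n", t1 - t0);
        printf("memory operand:    %.3f s\n", t2 - t1);
        return (int)(a ^ b ^ c);             /* keep the results live */
    }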
In addition, all of the above assumes that the data is sitting in the L1 cache. If it is not, the memory-operand version will be MUCH slower.
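To get a feel for how much worse it can get once the operand falls out of the caches, here is a second sketch along the same lines (again my own, assuming GCC/Clang on x86-64, with arbitrary sizes and names like big, hot, acc): the first loop reuses one word that stays in L1, while the second pulls its operand from a 256 MiB array, touching a new cache line each time.

    /* Sketch: the same memory-operand add, once with an operand that stays in
       L1 and once with an operand pulled from a 256 MiB array.  Assumes
       GCC/Clang on x86-64; sizes and names (big, hot, acc) are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        const size_t n = 64u * 1024 * 1024;      /* 64M ints = 256 MiB, far beyond any cache */
        unsigned int *big = malloc(n * sizeof *big);
        unsigned int hot = 1, acc = 0;
        struct timespec a, b, c;

        if (!big) return 1;
        for (size_t i = 0; i < n; i++) big[i] = (unsigned int)i;   /* touch all pages */

        clock_gettime(CLOCK_MONOTONIC, &a);
        for (size_t i = 0; i < n; i += 16)       /* same word every time: always an L1 hit */
            __asm__ volatile ("add %1, %0" : "+r"(acc) : "m"(hot));
        clock_gettime(CLOCK_MONOTONIC, &b);
        for (size_t i = 0; i < n; i += 16)       /* a new cache line every time: mostly misses
                                                    (the sequential stride lets the hardware
                                                    prefetcher hide part of the cost) */
            __asm__ volatile ("add %1, %0" : "+r"(acc) : "m"(big[i]));
        clock_gettime(CLOCK_MONOTONIC, &c);

        printf("L1-resident operand:   %.3f s\n",
               (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9);
        printf("cache-missing operand: %.3f s\n",
               (c.tv_sec - b.tv_sec) + (c.tv_nsec - b.tv_nsec) * 1e-9);
        free(big);
        return (int)acc;                         /* keep acc live */
    }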
Taking this a step further, we can examine the sizes of the encoded instructions:
    add eax,val1     ->  03 05 14 00 00 00
    add eax,10000h   ->  05 00 00 01 00
The encoding of the first one may vary slightly depending on the address of val1. The bytes shown here are what I got in my specific test case.
Thus, the memory-access version needs an extra byte to encode, which means slightly larger code and, in the worst case, potentially more instruction-cache misses.
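One way to check such sizes from a program, rather than reading a disassembly listing, is to bracket each instruction with labels and subtract their addresses. The sketch below is only an illustration under assumptions (GCC/Clang on x86-64 Linux/ELF; the names probe, val1, imm_start, etc. are made up), and the exact byte counts depend on how the assembler encodes the address of val1.

    /* Sketch: measure the encoded length of each form by bracketing it with
       labels and subtracting their addresses.  Assumes GCC/Clang on x86-64
       Linux/ELF; names (probe, val1, imm_start, ...) are illustrative. */
    #include <stdio.h>

    unsigned int val1 = 0x14;            /* global so the symbol is always emitted */

    void probe(void)                     /* never called; it only has to be compiled */
    {
        __asm__ volatile (
            "imm_start:\n\t"
            "add $0x10000, %%eax\n\t"    /* add eax, 10000h   (immediate operand) */
            "imm_end:\n\t"
            "mem_start:\n\t"
            "add val1(%%rip), %%eax\n\t" /* add eax, [val1]   (memory operand)    */
            "mem_end:\n\t"
            : : : "eax");
    }

    extern const char imm_start[], imm_end[], mem_start[], mem_end[];

    int main(void)
    {
        printf("immediate form: %td bytes\n", imm_end - imm_start);
        printf("memory form:    %td bytes\n", mem_end - mem_start);
        return 0;
    }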
So, in conclusion, if there is any performance difference between the two versions, the immediate version will most likely be faster, because:
- It has lower latency.
- It has higher throughput.
- It has a shorter encoding.
- It does not need to touch the data cache, where it could potentially miss.