It depends. Masking out a flag will usually compile to an AND instruction, which executes quickly (about 1 cycle) once the data is in a register. Loading 64 bits of data from memory will usually be slower than loading 32 bits, but if you use more than 32 flags you will have to load more than 32 bits anyway, and doing the masking in one instruction will be faster than doing it in two or three. Whether that matters for overall speed will usually depend on the surrounding instructions. For example, if the data is already in the cache, you may not need to go out to main memory for it at all.
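To make the comparison concrete, here is a minimal sketch of the two layouts being discussed. The function names and the two-word array layout are hypothetical, chosen only for illustration. What the compiler actually emits depends on the target and optimization level, so treat the instruction counts in the comments as rough expectations, not measurements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout A: up to 64 flags packed into one 64-bit word.
   On a 64-bit target this is typically one AND plus one compare. */
static inline bool has_all64(uint64_t flags, uint64_t mask) {
    return (flags & mask) == mask;
}

/* Hypothetical layout B: the same flags split across two 32-bit words.
   This needs two ANDs and two compares, plus logic to combine them. */
static inline bool has_all32x2(const uint32_t flags[2], const uint32_t mask[2]) {
    return (flags[0] & mask[0]) == mask[0] &&
           (flags[1] & mask[1]) == mask[1];
}
```

Whether layout A wins in practice is exactly the "it depends" part: if the flags are already in a register or in L1 cache, the difference may disappear into the surrounding code.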
In other words, it's hard to generalize: you pretty much need to look at a specific sequence of code (not just one instruction, but a whole sequence) to say much, and the result for that sequence may not tell you much about another sequence that at first looks almost identical.