Any advantage of XOR AL, AL + MOVZX EAX, AL over XOR EAX, EAX?

I have an unknown C++ program that was compiled as a Release build, so it is optimized. What I'm struggling with is:

    xor   al, al
    add   esp, 8
    cmp   byte ptr [ebp+userinput], 31h
    movzx eax, al

This is my understanding:

    xor   al, al                          ; set eax to 0x??????00 (clear the lowest byte)
    add   esp, 8                          ; for some unclear reason, move the stack pointer up
    cmp   byte ptr [ebp+userinput], 31h   ; set the zero flag if the user input was "1"
    movzx eax, al                         ; set eax to AL and extend with zeros, so eax = 0x000000??

I don't care about lines 2 and 3. They may be in this order for pipelining reasons, and IMHO they have nothing to do with EAX.

However, I don't understand why AL is cleared first, only for the rest of EAX to be cleared later. IMHO the result will always be EAX = 0, so this could just as well be

 xor eax, eax 

instead of this. What is the advantage or “optimization” of this part of the code?

Some background information:

I will get the source code later. It is a small C++ console demo, maybe 20 lines of code, so there's nothing I would call "complex" code. IDA shows one loop in this program, but not around this part. Stud_PE's signature scan did not find anything, but it is most likely a Visual Studio 2013 or 2015 compiler.

1 answer

xor al,al is already slower than xor eax,eax on most CPUs. For example, on Haswell/Skylake it needs an ALU uop and does not break the dependency on the old value of eax/rax. It is equally bad on AMD CPUs, or on Atom/Silvermont. (Well, maybe not equally bad, because AMD does not eliminate xor eax,eax at issue/rename, but xor al,al still has a false dependency that could serialize the new dependency chain with whatever used eax last.)

On CPUs that rename al separately from the rest of the register (Intel pre-IvyBridge), xor al,al may still be recognized as a zeroing idiom, but unless you actually want to preserve the upper bytes of the register, the best way to zero al is xor eax,eax.

Doing a movzx on top of that only makes it worse.


I'm guessing your compiler somehow got confused: it decided it needed a 1-byte zero, but then realized it needed to widen it to 32 bits. xor sets the flags, so it could not xor-zero after the cmp, and it failed to notice that it could simply have xor-zeroed eax before the cmp.

Either that, or it is something like what Jester suggested, where the movzx is a branch target. Even in that case, xor eax,eax would still have been better, because on this code path the zero-extension into eax unconditionally follows.

I'm curious which compiler produced this, and from what source.


Source: https://habr.com/ru/post/1273223/

