Can there be any penalties when mixing 32 and 64-bit registers in sequential instructions?
No, writing to a 32-bit register always zero-extends into the full 64-bit register, so x86-64 avoids any partial-register penalties when mixing 32-bit and 64-bit registers.
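The difference between a 32-bit write and a narrower write can be modelled in a few lines of C. This is only a sketch of the architectural result, not of how any CPU implements it; the helper names `write32`, `write16` and `demo_zero_extension` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit write such as `mov eax, imm32`:
   the upper 32 bits of RAX are zeroed, so the new value does not
   depend on the old register contents at all. */
static uint64_t write32(uint64_t old_rax, uint32_t value) {
    (void)old_rax;              /* old contents are irrelevant */
    return (uint64_t)value;     /* zero-extended to 64 bits */
}

/* Hypothetical model of a 16-bit write such as `mov ax, imm16`:
   bits 16..63 of the old value are preserved, so the new value
   has to be merged with the old register contents. */
static uint64_t write16(uint64_t old_rax, uint16_t value) {
    return (old_rax & ~(uint64_t)0xFFFF) | value;
}

/* Small usage example for the two helpers above. */
void demo_zero_extension(void) {
    uint64_t rax = 0xFFFFFFFFFFFFFFFFull;
    assert(write32(rax, 1) == 0x0000000000000001ull);
    assert(write16(rax, 2) == 0xFFFFFFFFFFFF0002ull);
}
```

Only `write16` needs the old value as an input, which is the merge that partial-register handling has to deal with; `write32` has no such input.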
Thus, I believe that 32-bit is still native.
Yes, the default operand size for most instructions is 32-bit (except for PUSH / POP). 64-bit operand size requires a REX prefix with the W bit set to 1, so prefer 32-bit for code-size reasons. This is why compilers use mov r32, imm32 for static data addresses (since the default code model requires code and static data addresses to be in the low 2GiB of virtual address space).
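To make the code-size point concrete, here is a minimal sketch in C that compares the encodings of one instruction at both operand sizes and tests for the REX.W bit. The byte sequences are the standard encodings of `add eax, ecx` and `add rax, rcx`; the names `is_rex_w`, `add_eax_ecx`, `add_rax_rcx` and `demo_rex_w` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In 64-bit mode a REX prefix is one byte of the form 0100WRXB
   (0x40..0x4F). Bit 3 (W) selects 64-bit operand size. */
static bool is_rex_w(uint8_t byte) {
    return (byte & 0xF0) == 0x40 && (byte & 0x08) != 0;
}

/* add eax, ecx : 01 C8     (default 32-bit operand size, no prefix) */
static const uint8_t add_eax_ecx[] = { 0x01, 0xC8 };

/* add rax, rcx : 48 01 C8  (same opcode and ModRM, plus REX.W = 0x48) */
static const uint8_t add_rax_rcx[] = { 0x48, 0x01, 0xC8 };

/* Small usage example: the 64-bit form is one byte longer. */
void demo_rex_w(void) {
    assert(!is_rex_w(add_eax_ecx[0]));
    assert(is_rex_w(add_rax_rcx[0]));
    assert(sizeof add_rax_rcx == sizeof add_eax_ecx + 1);
}
```

The two encodings differ only in the leading 0x48 byte, so each instruction that can use 32-bit operand size instead saves one byte, unless it needs a REX prefix anyway to reach r8..r15.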
This was a design decision by AMD, though. They could have gone the other way and required a prefix to get 32-bit operand size. Since long mode is a separate mode, x86-64 machine code could be as different from x86-32 machine code as it wants to be. AMD chose to minimize the differences so that they could share as many transistors in the decoders as possible. Your conclusion is correct, but your reasoning does not hold up.
partial register updates (for example, ax instead of eax) can cause an eflags stall and degrade performance.
Partial-flag stalls are separate from partial-register stalls. They are handled similarly internally (separately renamed parts of EFLAGS must be merged in the same way that a modified AX must be merged with the unmodified upper bytes of EAX). But one does not cause the other.
    # partial-reg stall
    setcc al
Zeroing EAX with xor eax,eax before the flag-setting instruction and the setcc avoids the partial-register penalty entirely. (Core2 / Nehalem stall for fewer cycles than earlier processors, but they do still stall for 2 or 3 cycles while inserting a merging uop. Sandybridge does not stall at all while inserting a merging uop.)
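Why the xor-zeroing helps can be seen from the architectural values alone. The following C sketch models only the register contents, not timing or renaming; `setcc_al`, `xor_then_setcc_al` and `demo_setcc` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `setcc al`: only the low byte of EAX is
   written, bits 8..31 keep whatever the register held before. */
static uint32_t setcc_al(uint32_t old_eax, int cond) {
    return (old_eax & 0xFFFFFF00u) | (cond ? 1u : 0u);
}

/* Hypothetical model of `xor eax,eax` followed by `setcc al`:
   the old register value is discarded before the byte write. */
static uint32_t xor_then_setcc_al(uint32_t old_eax, int cond) {
    (void)old_eax;          /* discarded by the xor */
    uint32_t eax = 0;       /* xor eax,eax */
    return setcc_al(eax, cond);
}

/* Small usage example for the two helpers above. */
void demo_setcc(void) {
    /* Without zeroing, stale upper bytes survive in EAX. */
    assert(setcc_al(0xDEADBEEFu, 1) == 0xDEADBE01u);
    /* With zeroing, the result is a clean 0 or 1. */
    assert(xor_then_setcc_al(0xDEADBEEFu, 1) == 1u);
    assert(xor_then_setcc_al(0xDEADBEEFu, 0) == 0u);
}
```

In the first helper the result is a function of the old EAX, which is what forces a merge or a dependency when the full register is read later. In the second, the old value never reaches the result.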
(Another summary of partial-register penalties on different processors: Why doesn't GCC use partial registers?, which says basically the same thing.)
AMD does not suffer from partial-register stalls when reading the full register later; instead, partial-register writes and reads have a false dependency on the full register. (AMD processors do not rename sub-registers separately in the first place. Intel P4 and Silvermont / Knight's Landing are the same way.)
Intel Haswell / Skylake (and possibly Ivybridge) do not rename al separately from rax at all, so they never need to merge low8 / low16 registers. But setcc al has a false dependency on the old value. They do still rename and merge ah. (Details of HSW / SKL partial-register performance.)
See this Q&A for a more detailed discussion of partial-flag issues on Intel pre-Sandybridge vs. Sandybridge.
See also Agner Fog's microarch pdf and the other links in the x86 tag wiki for more details on all of this.