This is a completely safe and useful optimization, very similar to using an 8-bit immediate instead of a 32-bit immediate when you write add eax, 1.
NASM only optimizes when the shorter form of the instruction has an identical architectural effect. That holds here because mov eax,1 implicitly zeroes the upper 32 bits of RAX.
But note that YASM does not do this optimization, so it is a good idea to make the optimization yourself in the asm source if you care about code size (even indirectly, for performance reasons).
For instructions where 32-bit and 64-bit operand-size would not be equivalent if you had very large (or negative) numbers, you need to use 32-bit operand-size explicitly, even if you are assembling with NASM instead of YASM, if you want the size / performance advantages of 32-bit operand-size. See: Benefits of using 32-bit registers / instructions in x86-64
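A minimal sketch of the difference (not from the original answer; the byte sequences in the comments are the standard encodings). mov does not read the old register value, so the two operand sizes give the same result; add does read it, so they are not interchangeable:

```nasm
bits 64

; mov ignores the old value of the register: both forms leave RAX = 1
mov     rax, 1      ; NASM may shrink this to the 5-byte form below
mov     eax, 1      ; B8 01 00 00 00   (5 bytes)

; add reads the old value: the two forms can give different results
add     rax, 1      ; 48 83 C0 01      (4 bytes) full 64-bit add
add     eax, 1      ; 83 C0 01         (3 bytes) 32-bit add, result zero-extended into RAX
```

For example, with RAX = 0x00000000FFFFFFFF, add rax, 1 produces 0x0000000100000000 while add eax, 1 produces 0. So the shorter form is only safe when you know the value fits in 32 bits, and you have to pick it yourself.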
For 32-bit constants that do not have their high bit set, zero- or sign-extending them to 64 bits gives an identical result. So it is a pure optimization to assemble mov rax, 1 to a 5-byte mov r32, imm32 (with implicit zero extension to 64 bits) instead of a 7-byte mov r/m64, sign_extended_imm32.
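To see why the high bit matters, compare a small constant with -1. This is only an illustrative sketch; it uses explicit operand sizes and strict so that the result does not depend on which encoding the assembler would pick on its own:

```nasm
bits 64

; high bit of the imm32 clear: zero- and sign-extension agree
mov     eax, 1                  ; RAX = 0x0000000000000001
mov     rax, strict dword 1     ; RAX = 0x0000000000000001

; high bit of the imm32 set: the two forms differ
mov     eax, -1                 ; RAX = 0x00000000FFFFFFFF (zero-extended)
mov     rax, strict dword -1    ; RAX = 0xFFFFFFFFFFFFFFFF (sign-extended)
```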
On all modern x86 processors, the only performance difference between this and the 7-byte encoding is code size, so only indirect effects like alignment and L1I$ pressure are a factor. Internally, it's just a mov-immediate, so this optimization does not change the microarchitectural effect of your code either (except, of course, for code size / alignment / how it packs in the uop cache).
The 10-byte mov r64, imm64 encoding is even worse for code size. If the constant actually has any of its high bits set, then it has extra inefficiency in the uop cache on Intel Sandybridge-family processors (using 2 entries in the uop cache and, possibly, an extra cycle to read from the uop cache). But if the constant is in the range -2^31 .. +2^31 (signed 32-bit), it is stored internally just as efficiently, using only a single uop-cache entry, even if it was encoded in the x86 machine code using a 64-bit immediate. (See Agner Fog's microarch doc, Table 9.1, "Size of different instructions in μop cache", in the Sandybridge section.)
From "How many ways to set a register to zero?": you can force any of the three encodings using NASM:
    mov    eax, 0                ; 5 bytes to encode (B8 imm32)
    mov    rax, strict dword 0   ; 7 bytes: REX mov r/m64, sign-extended-imm32.
                                 ; NASM optimizes mov rax,0 to the 5B version,
                                 ; but dword or strict dword stops it for some reason
    mov    rax, strict qword 0   ; 10 bytes to encode (REX B8 imm64). movabs mnemonic for AT&T.
                                 ; normally assemblers choose smaller encodings if the
                                 ; operand fits, but strict qword forces the imm64.
Note that NASM used the 10-byte encoding (which AT&T syntax calls movabs, and so does objdump in Intel-syntax mode) for an address that is a link-time constant but unknown at assemble time.
YASM chooses mov r64, imm32, i.e. it assumes a code model where label addresses are 32 bits, unless you use mov rsi, strict qword msg
YASM's behavior is usually good (although using mov r32, imm32 for static absolute addresses, like C compilers do, would be even better). The default non-PIC code model puts all static code / data in the low 2GiB of virtual address space, so zero- or sign-extended 32-bit constants can hold addresses.
If you need 64-bit label addresses, you should usually use lea r64, [rel address] to do a RIP-relative LEA. (On Linux at least, position-dependent code can go in the low 32 bits of address space, so unless you are using the large / huge code models, any time you need to care about 64-bit label addresses you are also making PIC code, where you should use a RIP-relative LEA to avoid needing text relocations for absolute address constants.)
i.e. gcc and other compilers would have used mov esi, msg or lea rsi, [rel msg], never mov rsi, msg.
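Put together, the three ways of getting a label address into a register look roughly like this. This is a sketch rather than code from the question: the label msg, its contents, and the build commands are assumptions, and it targets a Linux x86-64 non-PIE static executable (mov esi, msg will not link as position-independent code).

```nasm
; Assumed build: nasm -felf64 addr.asm && ld -o addr addr.o
section .rodata
msg:    db "hello", 10          ; hypothetical message
msglen  equ $ - msg

section .text
global _start
_start:
    mov     esi, msg                ; 5 bytes: 32-bit absolute address, zero-extended (non-PIE only)
    lea     rsi, [rel msg]          ; 7 bytes: RIP-relative, position-independent
    mov     rsi, strict qword msg   ; 10 bytes: 64-bit absolute address (movabs)

    ; All three leave the same address in RSI here; use it for write(1, msg, msglen)
    mov     eax, 1                  ; __NR_write
    mov     edi, 1                  ; fd = stdout
    mov     edx, msglen
    syscall

    mov     eax, 60                 ; __NR_exit
    xor     edi, edi                ; status = 0
    syscall
```

Only one of the three address-loading instructions is needed in real code; they are listed together just to compare sizes. The first is the smallest when addresses fit in 32 bits, the second is the usual choice for PIC / PIE code.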