Why does NASM on Linux change registers in x86_64 assembly?

I am new to x86_64 assembly programming. I wrote a simple "Hello World" program in x86_64 assembly. Below is my code, which runs perfectly fine.

    global _start

    section .data
        msg:  db "Hello to the world of SLAE64", 0x0a
        mlen  equ $-msg

    section .text
    _start:
        mov rax, 1
        mov rdi, 1
        mov rsi, msg
        mov rdx, mlen
        syscall

        mov rax, 60
        mov rdi, 4
        syscall

Now when I disassemble it in gdb, it gives the output below:

    (gdb) disas
    Dump of assembler code for function _start:
    => 0x00000000004000b0 <+0>:     mov    eax,0x1
       0x00000000004000b5 <+5>:     mov    edi,0x1
       0x00000000004000ba <+10>:    movabs rsi,0x6000d8
       0x00000000004000c4 <+20>:    mov    edx,0x1d
       0x00000000004000c9 <+25>:    syscall
       0x00000000004000cb <+27>:    mov    eax,0x3c
       0x00000000004000d0 <+32>:    mov    edi,0x4
       0x00000000004000d5 <+37>:    syscall
    End of assembler dump.

My question is: why does NASM behave this way? I know that it can change an instruction's encoding based on the opcode, but I was not expecting it to change the registers (rax became eax, rdi became edi) as well.

Does this behavior also affect the functionality of the executable?

I am using Ubuntu 16.04 (64 bit) installed in VMware on an i5 processor.

Thanks in advance.

2 answers

In 64-bit mode, mov eax, 1 clears the upper part of the rax register (writing a 32-bit register always zero-extends into the full 64-bit register), so mov eax, 1 is semantically equivalent to mov rax, 1 .

The former, however, spares a REX.W prefix ( 48h ; a REX byte is what encodes 64-bit operand size and the registers introduced with x86-64), while the opcode is the same for both instructions ( 0b8h followed by a DWORD or a QWORD ).
So the assembler goes ahead and picks the shortest form.
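To make the equivalence concrete, here is a minimal Python sketch that models what a write to rax, eax or ax does to the full 64-bit register. The helper name write_reg and the starting value are made up for illustration; only the zero-extension rule itself comes from the architecture.

```python
MASK64 = (1 << 64) - 1

def write_reg(old_rax: int, value: int, width: int) -> int:
    """Return the new 64-bit RAX after writing `value` to its low `width` bits."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit writes zero-extend: the upper 32 bits are cleared
        return value & 0xFFFFFFFF
    # 8- and 16-bit writes merge into the old value instead
    low_mask = (1 << width) - 1
    return (old_rax & ~low_mask & MASK64) | (value & low_mask)

old = 0xDEADBEEF_CAFEBABE              # arbitrary stale register contents
after_mov_rax = write_reg(old, 1, 64)  # mov rax, 1
after_mov_eax = write_reg(old, 1, 32)  # mov eax, 1
after_mov_ax = write_reg(old, 1, 16)   # mov ax, 1

print(hex(after_mov_rax), hex(after_mov_eax), hex(after_mov_ax))
```

The first two results are identical, which is why the substitution is safe; the 16-bit write keeps the stale upper bits, which is why no such substitution exists for mov ax, 1.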

This is typical NASM behavior; see Section 3.3 of the NASM manual, where the example [eax*2] is assembled as [eax+eax] to spare the disp32 field after the SIB byte 1 ( [eax*2] can only be encoded as [eax*2+disp32] , where the assembler sets disp32 to 0).
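As a rough illustration of that size difference, the sketch below hand-packs the ModRM and SIB bytes for both forms in 32-bit mode. The sib helper and the choice of mov eax, [...] as the surrounding instruction are assumptions for the example, not something taken from the manual.

```python
def sib(scale: int, index: int, base: int) -> int:
    """Pack a SIB byte: 2-bit scale field, 3-bit index, 3-bit base."""
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (scale_bits << 6) | (index << 3) | base

EAX = 0           # register number of eax
NO_BASE = 0b101   # with ModRM.mod = 00 this means "no base, disp32 follows"
MODRM_SIB = 0x04  # mod = 00, reg = eax, r/m = 100 (a SIB byte follows)
OPCODE = 0x8B     # mov r32, r/m32

# mov eax, [eax*2]: no base register, so a 4-byte zero displacement is required
scaled = bytes([OPCODE, MODRM_SIB, sib(2, EAX, NO_BASE)]) + (0).to_bytes(4, "little")

# mov eax, [eax+eax]: base = eax, index = eax, scale = 1, no displacement
summed = bytes([OPCODE, MODRM_SIB, sib(1, EAX, EAX)])

print(scaled.hex(" "), len(scaled))
print(summed.hex(" "), len(summed))
```

Both forms compute the same address, but the second one is four bytes shorter.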

I was not able to get NASM to emit a real mov rax, 1 instruction (i.e. 48 B8 01 00 00 00 00 00 00 00 ), even by prefixing the instruction with o64 .
If you need a real mov rax, 1 (this is not your case), you have to resort to assembling it manually with db and the like.

EDIT : Peter Cordes' answer shows that there is actually a way to tell NASM not to optimize the instruction: the strict modifier.
mov rax, STRICT 1 produces the 10-byte version of the instruction ( mov r64, imm64 ), and mov rax, STRICT DWORD 1 produces the 7-byte version ( mov r64, imm32 , where imm32 is sign-extended before use).


Side note: it is better to use RIP-relative addressing; this avoids 64-bit immediate constants (thus reducing code size) and is mandatory on MacOS (in case you care).
Change mov rsi, msg to lea rsi, [REL msg] (RIP-relative is an addressing mode, so it needs an "addressing", i.e. the square brackets; to avoid reading from that address we use lea , which only computes the effective address and does not access memory).
You can use the DEFAULT REL directive to avoid typing REL in every memory access.
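For a feel of what the assembler and linker have to compute for the RIP-relative form, here is a small Python sketch that encodes lea rsi, [rel msg] by hand. The function name is hypothetical, and the two addresses are simply borrowed from the gdb dump in the question; after actually changing the instruction, the real layout would shift.

```python
import struct

def lea_rsi_rip_rel(insn_addr: int, target: int) -> bytes:
    """Encode `lea rsi, [rel target]` for an instruction located at insn_addr."""
    length = 7  # REX.W + opcode + ModRM + disp32
    # The displacement is relative to the address of the *next* instruction
    disp = target - (insn_addr + length)
    # 48 = REX.W, 8D = LEA, 35 = ModRM (mod=00, reg=rsi, r/m=101 -> RIP+disp32)
    return bytes([0x48, 0x8D, 0x35]) + struct.pack("<i", disp)

# Assumed addresses: the instruction slot and msg address seen in the gdb dump
code = lea_rsi_rip_rel(0x4000BA, 0x6000D8)
print(code.hex(" "), len(code))
```

The result is 7 bytes instead of the 10-byte movabs , and it contains no absolute address, only a distance.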

I was under the impression that position-independent code is mandatory for the 64-bit Mach-O format, but that may not be the case.


1 The Scale-Index-Base (SIB) byte, used to encode the new addressing modes that were introduced with 32-bit mode.


This is a completely safe and useful optimization, very much like using an 8-bit immediate instead of a 32-bit one when you write add eax, 1 .

NASM only optimizes when the shorter form of the instruction has an identical architectural effect, which is the case here because mov eax,1 implicitly zeroes the upper 32 bits of RAX .

But note that YASM does not do this, so it is a good idea to make the optimization yourself in the asm source if you care about code size (even indirectly, for performance reasons).

For instructions where 32-bit and 64-bit operand size would not be equivalent if you had very large (or negative) numbers, you need to use 32-bit operand size explicitly, even when assembling with NASM instead of YASM, if you want the size / performance advantages of 32-bit operand size. See: The advantages of using 32bit registers/instructions in x86-64


For 32-bit constants that do not have their high bit set, zero-extending or sign-extending them to 64 bits gives an identical result. So it is a pure optimization to assemble mov rax, 1 to a 5-byte mov r32, imm32 (with implicit zero-extension to 64 bits) instead of a 7-byte mov r/m64, sign_extended_imm32 .
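A quick way to convince yourself of the high-bit condition is to model both extensions in Python. The helper names are made up for this sketch.

```python
def zero_extend32(imm: int) -> int:
    """64-bit result of a 32-bit write (upper half cleared)."""
    return imm & 0xFFFFFFFF

def sign_extend32(imm: int) -> int:
    """64-bit result of a sign-extended imm32 (upper half copies bit 31)."""
    imm &= 0xFFFFFFFF
    if imm & 0x80000000:
        imm |= 0xFFFFFFFF_00000000
    return imm

def extensions_agree(imm: int) -> bool:
    return zero_extend32(imm) == sign_extend32(imm)

for imm in (1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    print(f"{imm:#010x}: same result = {extensions_agree(imm)}")
```

The two extensions agree exactly when bit 31 is clear, so for a constant like 1 either encoding leaves the same value in RAX.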

On all modern x86 processors, the only performance difference between this and the 7-byte encoding is code size, so only indirect effects like alignment and L1I$ pressure are a factor. Internally it is just a mov-immediate, so this optimization does not change the microarchitectural effect of your code either (except, of course, for code size / alignment / how it packs into the uop cache).

The 10-byte mov r64, imm64 encoding is even worse for code size. If the constant actually has any of its high bits set, it also has extra inefficiency in the uop cache on Intel Sandybridge-family CPUs (using 2 entries in the uop cache, and possibly an extra cycle to read from the uop cache). But if the constant is in the range -2^31 .. +2^31-1 (signed 32-bit), it is stored internally just as efficiently, using only a single uop-cache entry, even though it was encoded in the x86 machine code with a 64-bit immediate. (See Agner Fog's microarch doc, Table 9.1, "Size of different instructions in μop cache", in the Sandybridge section.)

From "How many ways to set a register to zero?": you can force any of the three encodings with NASM:

    mov eax, 0                 ; 5 bytes to encode (B8 imm32)
    mov rax, strict dword 0    ; 7 bytes: REX mov r/m64, sign-extended-imm32.
                               ; NASM optimizes mov rax,0 to the 5B version,
                               ; but dword or strict dword stops it for some reason
    mov rax, strict qword 0    ; 10 bytes to encode (REX B8 imm64). movabs mnemonic for AT&T.
                               ; normally assemblers choose smaller encodings if the operand
                               ; fits, but strict qword forces the imm64.
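To see the three encodings as raw bytes, here is a Python sketch that builds each of them for RAX and then picks the shortest one that still produces the requested 64-bit value. The function names are invented, and shortest_mov_rax only illustrates the selection rule described above; it is not a description of NASM's actual implementation.

```python
import struct

MASK64 = (1 << 64) - 1

def mov_eax_imm32(v: int) -> bytes:
    """5 bytes: B8 imm32 (result is zero-extended into RAX)."""
    return b"\xB8" + struct.pack("<I", v)

def mov_rax_simm32(v: int) -> bytes:
    """7 bytes: REX.W C7 /0 imm32 (imm32 is sign-extended to 64 bits)."""
    return b"\x48\xC7\xC0" + struct.pack("<i", v)

def mov_rax_imm64(v: int) -> bytes:
    """10 bytes: REX.W B8 imm64 (what AT&T syntax calls movabs)."""
    return b"\x48\xB8" + struct.pack("<Q", v & MASK64)

def shortest_mov_rax(value: int) -> bytes:
    """Pick the shortest encoding that leaves `value` in RAX."""
    v = value & MASK64
    if v <= 0xFFFFFFFF:                 # fits a zero-extended imm32
        return mov_eax_imm32(v)
    if v >= 0xFFFFFFFF_80000000:        # fits a sign-extended imm32
        return mov_rax_simm32(v - (1 << 64))
    return mov_rax_imm64(v)             # needs the full 64-bit immediate

for value in (1, -1, 0x1_2345_6789):
    code = shortest_mov_rax(value)
    print(f"{value:#x}: {code.hex(' ')} ({len(code)} bytes)")
```

With strict dword or strict qword you are essentially bypassing a selection step like this and naming the encoding yourself.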

Note that NASM used the 10-byte encoding (which AT&T syntax calls movabs , and so does objdump in Intel-syntax mode) for the address, which is a link-time constant but not known at assemble time.

YASM chooses mov r64, imm32 , i.e. it assumes a code model where label addresses are 32 bits, unless you use mov rsi, strict qword msg .

YASM's behavior is usually good (although using mov r32, imm32 for static absolute addresses, like C compilers do, would be even better). The default non-PIC code model puts all static code / data in the low 2GiB of virtual address space, so addresses can be used as zero-extended or sign-extended 32-bit constants.

If you need 64-bit label addresses, you should normally use lea r64, [rel address] to do a RIP-relative LEA. (On Linux at least, position-dependent code can go in the low 32 bits of the address space, so unless you are using the large / huge code models, any time you need to care about 64-bit label addresses you are also making PIC code, where you should use a RIP-relative LEA to avoid needing text relocations for absolute address constants.)

i.e. gcc and other compilers would use mov esi, msg or lea rsi, [rel msg] , never mov rsi, msg .


Source: https://habr.com/ru/post/1275434/

