How to make NASM encode [1 + rax * 2] as disp32 + index * 2 instead of disp8 + base + index?

To compute x = x*10 + 1 efficiently, it is probably optimal to use

    lea eax, [rax + rax*4]   ; x *= 5
    lea eax, [1 + rax*2]     ; x = x*2 + 1
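As a sanity check on the arithmetic only (not on latency), here is a minimal Python sketch of the two-LEA sequence. The `lea` helper and the function names are hypothetical, and it assumes a 32-bit destination, so results wrap modulo 2^32.

```python
MASK32 = 0xFFFFFFFF  # lea with a 32-bit destination keeps only the low 32 bits

def lea(base: int, index: int, scale: int, disp: int) -> int:
    """Model of `lea r32, [base + index*scale + disp]` (hypothetical helper)."""
    return (base + index * scale + disp) & MASK32

def times10_plus1(x: int) -> int:
    x = lea(x, x, 4, 0)  # lea eax, [rax + rax*4]  -> x * 5
    x = lea(0, x, 2, 1)  # lea eax, [1 + rax*2]    -> x * 2 + 1
    return x

print(times10_plus1(7))
```

The second call passes 0 as the base to model the no-base form; the split form `[1 + rax + rax*1]` computes the same value, which is why the assembler is free to pick either.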

A 3-component LEA has higher latency on modern Intel CPUs (e.g. 3 cycles vs. 1 on the Sandybridge family), so disp32 + index*2 is faster than disp8 + base + index*1 on the SnB family, i.e. most of the mainstream x86 CPUs we care about optimizing for. (This mostly applies only to LEA, not to loads/stores, because LEA runs on ALU execution units rather than on the AGUs in most modern x86 CPUs.) On AMD CPUs, LEA is slower with 3 components or with scale > 1 (http://agner.org/optimize/).

But NASM and YASM optimize for code size by using [1 + rax + rax*1] for the second LEA, which needs only a disp8 instead of a disp32. (Addressing modes always have either a base register or a disp32.)

i.e. they always split reg*2 into base+index, because that is never worse for code size.
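To make the size trade-off concrete, here is a rough Python sketch that decodes the ModRM and SIB bytes of both encodings. `decode_lea_sib` is a hypothetical helper, not part of any tool, and it only handles an unprefixed `lea` with a SIB byte.

```python
def decode_lea_sib(code: bytes) -> dict:
    """Decode `lea r32, [SIB address]` with no prefixes (illustrative only)."""
    assert code[0] == 0x8D, "expected the LEA opcode"
    modrm, sib = code[1], code[2]
    mod, rm = modrm >> 6, modrm & 7
    assert mod != 0b11 and rm == 0b100, "expected a memory operand with a SIB byte"
    scale = 1 << (sib >> 6)          # top 2 bits: 1, 2, 4 or 8
    index = (sib >> 3) & 7
    base = sib & 7
    # mod = 00 with base = 101 means "no base register, disp32 follows"
    no_base = mod == 0b00 and base == 0b101
    if mod == 0b01:
        disp_size = 1
    elif mod == 0b10 or no_base:
        disp_size = 4
    else:
        disp_size = 0
    disp = int.from_bytes(code[3:3 + disp_size], "little", signed=True)
    return {
        "scale": scale,
        "index": index,
        "base": None if no_base else base,
        "disp_size": disp_size,
        "disp": disp,
        "length": 3 + disp_size,
    }

split = decode_lea_sib(bytes.fromhex("8D440001"))          # [rax + rax*1 + 1]
unsplit = decode_lea_sib(bytes.fromhex("8D044501000000"))  # [rax*2 + 1]
print(split)
print(unsplit)
```

The split form fits in 4 bytes because having a base register allows a disp8; the index-only form has no base, so it has to carry a full disp32 and comes out at 7 bytes.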

I can force a disp32 with lea eax, [dword 1 + rax*2], but that does not stop NASM or YASM from splitting the addressing mode. The NASM manual does not seem to document a way to use the strict keyword on a scale factor, and [1 + strict rax*2] does not assemble. Is there a way to use strict or some other syntax to force the encoding of the addressing mode?


nasm -O0 to disable optimization does not work either. Apparently it only controls the multi-pass optimization of branch displacements, not all of the optimizations NASM does. Of course, you would not want to do this for a whole source file in the first place, even if it did work. I still get

 8d 84 00 01 00 00 00 lea eax,[rax+rax*1+0x1] 

The only workaround I can think of is to encode it manually with db. This is quite inconvenient. For the record, the manual encoding is:

    db 0x8d, 0x04, 0x45   ; opcode, modrm, SIB for lea eax, [disp32 + rax*2]
    dd 1                  ; disp32

The scale factor is encoded in the high 2 bits of the SIB byte. I assembled lea eax, [dword 1 + rax*4] to get the machine code for the right registers, because NASM's optimization only applies to *2. The SIB was 0x85, and decrementing the 2-bit field at the top of the byte lowered the scale factor from 4 to 2.
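The SIB arithmetic can be reproduced in a few lines of Python. This is only a sketch of the bit layout (scale in bits 7..6, index in bits 5..3, base in bits 2..0); `sib_byte`, `RAX` and `NO_BASE` are names made up for the illustration.

```python
def sib_byte(scale: int, index: int, base: int) -> int:
    """Pack a SIB byte: scale in bits 7..6, index in bits 5..3, base in bits 2..0."""
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (ss << 6) | ((index & 7) << 3) | (base & 7)

RAX = 0          # register number of rax/eax
NO_BASE = 0b101  # with ModRM.mod = 00 this means "no base, disp32 follows"

sib_x4 = sib_byte(4, RAX, NO_BASE)  # what NASM emitted for [dword 1 + rax*4]
sib_x2 = sib_byte(2, RAX, NO_BASE)  # the byte wanted for [disp32 + rax*2]

# opcode 8D, ModRM 04 (mod=00, reg=eax, r/m=100 -> SIB), SIB, disp32 = 1
code = bytes([0x8D, 0x04, sib_x2]) + (1).to_bytes(4, "little")
print(hex(sib_x4), hex(sib_x2), code.hex())
```

Changing the index register only touches bits 5..3 of the SIB byte, and changing the destination only touches the reg field of ModRM, which is what a macro-based workaround would have to patch.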


But the question is: how do you write it in a readable form that makes it easy to change registers and lets NASM encode the addressing mode for you? (I suppose a giant macro could do this with text processing and manual db encoding, but that is not really the answer I'm looking for. I don't actually need this for anything right now; I mostly want to know whether NASM or YASM has syntax for it.)

Other optimizations I know of, such as mov rax, 1 assembling to the 5-byte mov eax, 1, are pure wins on all CPUs unless you want longer instructions to get padding without NOPs, and they can be disabled with mov rax, strict dword 1 to get the 7-byte sign-extended encoding, or strict qword for the 10-byte imm64.
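For reference, the three mov encodings and their lengths can be spelled out with a small Python sketch. The byte values follow the standard x86-64 encodings (B8+rd imm32, REX.W C7 /0 imm32, REX.W B8+rd imm64); the variable names are just for illustration.

```python
import struct

# mov eax, 1: opcode B8+rd, imm32; the upper half of rax is zeroed implicitly
mov_eax_imm32 = bytes([0xB8]) + struct.pack("<I", 1)

# mov rax, strict dword 1: REX.W, C7 /0, imm32 sign-extended to 64 bits
mov_rax_simm32 = bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", 1)

# mov rax, strict qword 1: REX.W, B8+rd, full imm64
mov_rax_imm64 = bytes([0x48, 0xB8]) + struct.pack("<Q", 1)

for name, code in [("mov eax, imm32", mov_eax_imm32),
                   ("mov rax, sign-extended imm32", mov_rax_simm32),
                   ("mov rax, imm64", mov_rax_imm64)]:
    print(f"{len(code):2d} bytes  {code.hex(' ')}  {name}")
```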


gas does not do this or most other optimizations (it only optimizes the sizes of immediates and branch displacements): lea 1(,%rax,2), %eax assembles to
8d 04 45 01 00 00 00 lea eax,[rax*2+0x1], and the same goes for the .intel_syntax noprefix version.

Answers for MASM or other assemblers would also be interesting.

1 answer

NOSPLIT:

Similarly, NASM will split [eax*2] into [eax+eax], because that lets the offset field be omitted and saves space; in fact, it will also split [eax*2+offset] into [eax+eax+offset].
You can combat this behavior with the NOSPLIT keyword: [nosplit eax*2] will force [eax*2+0] to be generated.
[nosplit eax*1] also has the same effect. Alternatively, the split EA form [0, eax*2] can be used. However, NOSPLIT in [nosplit eax+eax] will be ignored, because the user's intention there is taken to be [eax+eax].

    lea eax, [NOSPLIT 1+rax*2]
    lea eax, [1+rax*2]

    00000000  8D044501000000    lea eax,[rax*2+0x1]
    00000007  8D440001          lea eax,[rax+rax+0x1]

Source: https://habr.com/ru/post/1275426/

