What does alignment to a 16-byte boundary mean in x86?

The official Intel Optimization Guide contains a chapter on converting MMX instructions to SSE, where it makes the following statement:

Computation instructions which use a memory operand that may not be aligned to a 16-byte boundary must be replaced with an unaligned 128-bit load (MOVDQU) followed by the same computation operation that uses instead register operands.

(Chapter 5.8 Converting from 64-bit to 128-bit SIMD Integers, p. 5-43)

I can't understand what they mean by "may not be aligned to a 16-byte boundary". Could you clarify it and give some examples?

+6
3 answers

Certain SIMD instructions, which perform the same operation on multiple data elements at once, require the memory address of that data to be aligned to a specific byte boundary. In practice this means that the memory address at which your data is located must be divisible by the number of bytes the instruction requires.

So, in your case, the alignment is 16 bytes (128 bits), which means that the memory address of your data must be a multiple of 16. For example, 0x00010 is 16-byte aligned, but 0x00011 is not.
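The divisibility rule above can be written as a one-line check. This is only a minimal sketch: the names `is_aligned_16` and `check_examples` are made up for illustration, and the two addresses are the ones used in the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the address is a multiple of 16 bytes. */
static bool is_aligned_16(uintptr_t address)
{
    return address % 16 == 0;
}

/* Hypothetical self-check using the two addresses from the text. */
static void check_examples(void)
{
    assert(is_aligned_16(0x00010));   /* 16 = 1 * 16, aligned */
    assert(!is_aligned_16(0x00011));  /* 17 leaves a remainder of 1 */
}
```

For a real pointer `p`, the same check would be `is_aligned_16((uintptr_t)p)`.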

How you get your data aligned depends on the programming language (and sometimes the compiler) that you use. Most languages that have the concept of a memory address also give you a way to specify alignment.
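As one illustration, standard C11 offers `alignas` for objects and `aligned_alloc` for heap memory. The sketch below assumes a C11 toolchain whose C library provides `aligned_alloc` (some compilers, such as MSVC, use their own functions instead); the names `static_buffer`, `alloc_16` and `demo_alignment` are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Static storage whose start address is requested to be a multiple of 16. */
static alignas(16) unsigned char static_buffer[64];

/* Heap storage aligned to 16 bytes. C11 requires the size to be a
   multiple of the alignment, so the size is given in 16-byte blocks.
   The caller releases the memory with free(). */
static void *alloc_16(size_t blocks)
{
    return aligned_alloc(16, blocks * 16);
}

/* Hypothetical demo: both addresses should be divisible by 16. */
static void demo_alignment(void)
{
    void *heap = alloc_16(4);
    assert(heap != NULL);
    assert((uintptr_t)static_buffer % 16 == 0);
    assert((uintptr_t)heap % 16 == 0);
    free(heap);
}
```

Other languages and compilers have their own spelling for the same idea, so treat this as one possible approach and not the only one.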

+10

Data aligned on a 16-bit boundary has a memory address that is an even number, that is, a multiple of two. Each byte has 8 bits, so aligning on a 16-bit boundary means aligning on every group of two bytes.

Likewise, memory aligned on a 32-bit boundary has an address that is a multiple of four, because four bytes are grouped together to form a 32-bit word.
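The same rule generalizes to any power-of-two boundary (2 bytes for 16 bits, 4 bytes for 32 bits, 16 bytes for 128 bits). A minimal sketch, with the hypothetical helper name `is_aligned` and the usual bit-mask trick in place of a division:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when address is a multiple of alignment.
   The alignment is given in bytes and must be a power of two,
   so the low bits of an aligned address are all zero. */
static bool is_aligned(uintptr_t address, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address & (alignment - 1)) == 0;
}
```

With this helper, address 6 is aligned to 2 bytes but not to 4, and address 8 is aligned to both.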

0

I'm guessing here, but could the phrase about a memory operand not being aligned to a 16-byte boundary mean that this memory location was previously aligned to a smaller value (4 or 8 bytes) for some other purpose, so that to execute SSE instructions on it you now need to load it into a register explicitly?
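Whatever the reason an address is not a multiple of 16, the replacement described in the quoted guide can be sketched with SSE2 intrinsics. This assumes an x86 compiler that provides `<emmintrin.h>`; the function name `add4_unaligned` is hypothetical. `_mm_loadu_si128` is the intrinsic for MOVDQU and accepts any address, while its aligned counterpart `_mm_load_si128` (MOVDQA) requires a 16-byte-aligned one.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Adds four 32-bit integers from a to four from b and stores the
   result in dst. None of the pointers needs to be 16-byte aligned. */
static void add4_unaligned(int32_t *dst, const int32_t *a, const int32_t *b)
{
    /* Unaligned 128-bit loads into registers (MOVDQU). */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* The computation itself now uses register operands only (PADDD). */
    __m128i sum = _mm_add_epi32(va, vb);

    /* Unaligned 128-bit store back to memory. */
    _mm_storeu_si128((__m128i *)dst, sum);
}
```

Passing `array + 1` for an `int32_t` array moves the address by 4 bytes, so at most one of `array` and `array + 1` can be 16-byte aligned; the unaligned load handles both cases.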

0

Source: https://habr.com/ru/post/913608/

