Assembly: REP MOVS

Look at the following assembly code:

    MOV ESI, DWORD PTR [EBP + C]
    MOV ECX, EDI
    MOV EAX, EAX
    SHR ECX, 2
    LEA EDI, DWORD PTR[EBX + 18]
    REP MOVS DWORD PTR ES:[EDI], DWORD PTR [ESI]
    MOV ECX, EAX
    AND ECX, 3
    REP MOVS BYTE PTR ES:[EDI], BYTE PTR[ESI]

The book I took this code excerpt from explains the first REP MOVS as copying 4-byte chunks, and the second REP MOVS as copying the remaining 2-byte chunk, if one exists.

How do the REP MOVS instructions work? According to MSDN, an instruction can be prefixed with REP to repeat the operation the number of times specified in the ECX register. Wouldn't that just repeat the same operation over and over?

+6
2 answers

For questions about specific instructions, always consult the instruction set reference.

In this case, you need to look up rep and movs (which is not mov). In short, rep repeats the following string operation ecx times. movs copies data from ds:esi to es:edi and increments or decrements both pointers depending on the setting of the direction flag. Thus, repeating it copies a range of memory to another location.

PS: The operation size is usually encoded as an instruction suffix, so people write movsb and movsd to indicate a byte or dword operation. Some assemblers, however, let you specify the size with byte ptr or dword ptr, as in your example. Also, the operands are implicit in the instruction, and you cannot change them.
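
To make the repetition concrete, here is a minimal C sketch of what rep movs does, under the description above. It is only a model, not how the processor implements it: the struct movs_state, its flat mem array and the function rep_movs are hypothetical names introduced for illustration, and segment registers are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state that rep movs reads and updates. */
struct movs_state {
    uint8_t *mem;  /* flat memory that esi and edi index into (assumption) */
    size_t esi;    /* source offset */
    size_t edi;    /* destination offset */
    size_t ecx;    /* repeat count, in elements */
    int df;        /* direction flag: 0 = forward, 1 = backward */
};

/* Model of rep movsb / movsw / movsd: elem_size is 1, 2 or 4 bytes. */
void rep_movs(struct movs_state *s, size_t elem_size)
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4);

    while (s->ecx != 0) {
        /* One movs: copy a single element from [esi] to [edi]. */
        for (size_t i = 0; i < elem_size; i++)
            s->mem[s->edi + i] = s->mem[s->esi + i];

        /* Move both pointers by the element size, in the direction
           selected by the direction flag. */
        if (s->df == 0) {
            s->esi += elem_size;
            s->edi += elem_size;
        } else {
            s->esi -= elem_size;
            s->edi -= elem_size;
        }

        s->ecx--;  /* rep decrements the counter after each element */
    }
}
```

Because the pointers move after every element, each repetition copies the next element rather than the same one again, which is why the prefix does not simply redo one copy ecx times.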

+13

Brief explanation

At the assembly-code level, two forms of this instruction are allowed: the explicit-operands form and the no-operands form. The explicit-operands form lets you specify the source and destination memory addresses explicitly with symbols. This form is provided to allow documentation; note, however, that the documentation it provides can be misleading. The symbols must specify the correct operand size, but they do not have to specify the correct source and destination addresses. The source address is always given by DS:(RSI / ESI / SI) and the destination address by ES:(RDI / EDI / DI), and these registers must be loaded correctly before the move string instruction is executed. This is how I understand Intel's official position on this.

Long explanation

REP MOVS DWORD PTR ES:[EDI], DWORD PTR [ESI] is synonymous with REP MOVSD ; and REP MOVS BYTE PTR ES:[EDI], BYTE PTR[ESI] is synonymous with REP MOVSB .

The following MOVS instructions exist, one per data size:

  • MOVSB (byte, 8-bit)
  • MOVSW (word, 16-bit)
  • MOVSD (dword, 32-bit)
  • MOVSQ (qword, 64-bit), only available in 64-bit mode

The MOVS instruction copies data from DS:(SI / ESI / RSI) to ES:(DI / EDI / RDI); the size of the SI / DI register depends on the current mode: 16-bit, 32-bit or 64-bit. It also increments or decrements the SI and DI registers depending on the direction flag (DF); use CLD to clear the flag so that the registers are incremented.

The MOVS instruction cannot use registers other than SI / DI, so there is no need to specify them.

If the MOVS instruction has the REP prefix, it is repeated CX (ECX / RCX) times, copying one element (byte, word, dword or qword) each time and decrementing CX, so at the end CX is zero. Note that the count is in elements, not bytes.
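
Since the counter is in elements, the code in the question shifts the length right by 2 for the dword pass (SHR ECX, 2) and masks it with 3 for the byte pass (AND ECX, 3). The following is a rough C sketch of that two-pass scheme, assuming the total length in bytes is n and the direction flag is clear; the function name copy_in_two_passes is hypothetical and not taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes the way the two REP MOVS passes do. */
void copy_in_two_passes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    size_t dwords = n >> 2;  /* SHR ECX, 2: number of whole 4-byte elements */
    size_t tail = n & 3;     /* AND ECX, 3: 0 to 3 leftover bytes */

    /* First pass (REP MOVSD): one 4-byte element per iteration. */
    for (size_t i = 0; i < dwords; i++) {
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }

    /* Second pass (REP MOVSB): one byte per iteration, continuing
       from where the first pass left the pointers. */
    for (size_t i = 0; i < tail; i++)
        *dst++ = *src++;
}
```

For n = 11, for example, the first pass copies 2 dwords (8 bytes) and the second pass copies the remaining 3 bytes. The second pass can copy anywhere from 0 to 3 bytes, not only a 2-byte chunk.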

Starting with the first Pentium processor, released in 1993, Intel began making simple instructions execute faster and complex instructions (such as REP MOVS) slower.

So, REP MOVS became very slow, and there was no longer any reason to use it.

In 2013, Intel decided to revisit REP MOVS. If a processor (made after 2013) has the CPUID ERMSB bit (Enhanced REP MOVSB), the rep movsb and rep stosb instructions are executed differently than on older processors and are supposed to be fast. In practice, they are fast only for large blocks, 256 bytes or more, and only if certain conditions are met:

  • both the source and destination addresses must be aligned to a 16-byte boundary (this boundary is recommended for Ivy Bridge processors; on later processors it may be larger, up to 64 bytes for Cannon Lake);
  • the source region must not overlap the destination region;
  • the length must be a multiple of 64 bytes to get the higher performance;
  • the direction must be forward (CLD).

See the Intel Optimization Manual, section 3.7.6, Enhanced REP MOVSB and STOSB operation (ERMSB): http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
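
The conditions listed above can be sketched as a C predicate. This is only an illustration of the checklist, not a measured heuristic: the function name ermsb_fast_path_likely is hypothetical, the 16-byte alignment and 256-byte threshold are the figures quoted in this answer, and the direction flag cannot be inspected from portable C, so it is assumed to be clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: true when a copy of len bytes meets the
   conditions quoted above. The direction flag is assumed to be clear. */
bool ermsb_fast_path_likely(const void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    /* Large block: 256 bytes or more. */
    bool big_enough = len >= 256;

    /* Both addresses aligned to a 16-byte boundary. */
    bool aligned = (d % 16 == 0) && (s % 16 == 0);

    /* The two regions are disjoint when their start addresses
       are at least len bytes apart. */
    bool disjoint = (d >= s) ? (d - s >= len) : (s - d >= len);

    /* Length is a multiple of 64 bytes. */
    bool multiple_of_64 = (len % 64 == 0);

    return big_enough && aligned && disjoint && multiple_of_64;
}
```

A copy routine could use such a check to choose between rep movsb and a plain loop, but whether that pays off depends on the processor and would have to be measured.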

REP MOVS instructions are very slow on small blocks, because the startup cost is about 35 cycles. If you use a simple MOV through EAX in a loop instead, there is no startup cost, and you can copy a lot of data in those 35 cycles.

Note that ERMSB gives better results for REP MOVSB than for REP MOVSD (MOVSQ). On such processors all REP MOVS instructions are much faster than before, but REP MOVSB is the fastest of them.

So, the code you showed is optimal neither for processors without ERMSB (where a simple copy loop using MOV through EAX will be faster) nor for processors with ERMSB (where it is MOVSB that runs fast rather than MOVSD, although the difference is not that big).

The code you provided can give the best results only on very old processors, such as the 80386, released in 1985.

+2

Source: https://habr.com/ru/post/980609/

