Are there any advantages to the following assembly instructions?

In our systems programming course, we are learning assembly language. In most of the sample programs our professor has shown in class, he uses:

XOR CX, CX 

instead

 MOV CX, 0 

or

 OR AX, AX
 JNE SOME_LABEL

instead

 CMP AX, 0
 JNE SOME_LABEL

or

 AND AL, 0FH ; To convert input ASCII value to numeral
             ; The value in AL has already been checked to lie between '0' and '9'

instead

 SUB AL, '0' 

My question is this: is there any performance benefit to using AND / OR or XOR instead of the alternative methods, which are easier to read and understand?

Since these programs are usually shown to us during theory lectures, most of the class is unable to actually evaluate them. Why spend 40 minutes of a lecture explaining these trivial statements?

+4
5 answers
 XOR CX, CX ;0x31 0xC9 

Only two bytes are used: the opcode 0x31 and the ModR/M byte, which encodes the source and destination registers (in this case, the two are the same).

 MOV CX, 0 ;0xB9 0x00 0x00 

More bytes are required: the opcode 0xB9 (0xB8 plus the register number, which is 1 for CX) followed by a two-byte immediate of zero; this short form has no ModR/M byte. There is no difference in terms of timing (both take only one clock cycle), but mov requires three bytes and xor only two.

 OR AX, AX ;0x0B 0xC0 

again uses only an opcode byte and a ModR/M byte, while

 CMP AX, 0 ;0x3D 0x00 0x00 <-- but usually 0x81 ModRM 0x00 0x00 

uses three or four bytes. In this case it uses three bytes (0x3D followed by a word immediate of zero), since x86 has special opcodes for some operations on the accumulator register. For other registers the generic form takes four bytes (opcode, ModR/M, word immediate), although assemblers may pick the sign-extended byte-immediate form 0x83, which takes three. Once again, the two are the same when it comes to processor clock cycles.

There is no difference in processor performance between

 AND AL, 0x0F ;0x24 0x0F <-- again special opcode for Accumulator 

and

 SUB AL, '0' ;0x2C 0x30 <-- again special opcode for Accumulator 

(both are two bytes long), but when you subtract ASCII zero, you cannot be sure that no value greater than 9 is left in the accumulator. Also, AND sets OF and CF to zero, while SUB sets them according to the result. AND may be safer, but in my personal opinion its use depends on the context.

+6

Besides the code-size savings mentioned in the other answers, I thought I would mention a few more things, which you can read more about in the Intel optimization guide and Agner Fog's x86 optimization guide:

XOR REG,REG and SUB REG,REG (with REG being the same for both operands) are recognized by modern x86 processors as dependency-breaking idioms; that is, they also serve the purpose of breaking false dependencies on previous register / flag values. Note that this does not necessarily apply if you clear an 8-bit or 16-bit register, but it does if you clear a 32-bit register.


 OR AX, AX
 JNE SOME_LABEL

I believe that the preferred instruction would be TEST AX,AX . TEST can be macro-fused with any conditional branch (essentially combined with the branch instruction before decoding) on modern x86 processors. CMP can only be fused with unsigned conditional branches, at least up to the Nehalem architecture. Again, I'm not sure whether this holds for 16-bit operands.

+3

An important difference is whether they affect the CPU flags. When you use the logical operations xor , or , etc., the flags are affected. So:

 XOR CX, CX 

will not only set CX to zero but also, for example, set the CPU's zero flag. The mov instruction does not affect the flags. So:

 MOV CX, 0 

will not, for example, set the zero flag.

+1

In addition to the instruction scheduling mentioned earlier, which instruction is faster may also depend on the sequence of instructions being executed.

For an example of a non-obvious instruction that has a big impact, see page 8 of this article by Torbjörn Granlund of GMP fame. In the example in the upper right corner of the page, a very fast division loop begins with a nop instruction. According to note 4 on the same page, omitting the nop instruction makes the loop execute 1 clock cycle slower. Granlund suggests experimenting with placing other nops inside the loop to achieve further speed-ups.

My initial reaction to this was that more instructions = more time. However, there is much more to instruction scheduling and execution than can be gleaned from the manuals.

+1

The XOR operation is faster than MOV because it is a bitwise operation, and all bitwise operations are faster for the CPU.

-1

Source: https://habr.com/ru/post/1496635/

