Modulo arithmetic not optimized by gcc?

Question

Consider this simple function that adds a constant:

unsigned char f(unsigned char x) { return x + 5; }

This generates the following assembly (with -O3 on gcc 4.7.2):

 leal 5(%rdi), %eax
 ret

Now, since unsigned wraparound is well-defined behavior in C, we would expect that adding a modulo operation is essentially a no-op:

 unsigned char f(unsigned char x) {
     // assume char is 8 bits, which is typical
     return (x + 5) % 256;
 }

But the generated assembly has additional instructions:

 leal 5(%rdi), %eax
 movzbl %al, %eax
 ret

Can someone enlighten me why this is so? I am not very good at assembly.

(Note: This is just a toy problem that I made to understand how GCC optimizes the code.)

optimization c assembly gcc modulo

Rufflewind Jul 01 '13 at 15:32

1 answer

Bryan olivier · Accepted Answer · 2013-07-01T17:34:13+0000

For a definitive answer to why the two versions compile to different code, you would probably need an engineer with deep knowledge of the internals of this gcc compiler. You can experiment with the few examples below:

 unsigned char f1(unsigned char x) { return x + 5; }
 unsigned char f2(unsigned char x) { return (x + 5) % 256; }
 unsigned char f3(unsigned char x) { return (x + 5) % 256U; }
 unsigned char f4(unsigned char x) { return (x + 5) & 0xFFU; }
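As a sanity check on the premise that these variants mean the same thing, a brute-force comparison over every 8-bit input can be sketched as follows. The helper name variants_agree is hypothetical (it is not part of the original answer), and the sketch assumes an 8-bit char, as the question does.

```c
#include <limits.h>

#if CHAR_BIT != 8
#error "This sketch assumes 8-bit chars, as in the question."
#endif

/* The four variants under discussion. */
static unsigned char f1(unsigned char x) { return x + 5; }
static unsigned char f2(unsigned char x) { return (x + 5) % 256; }
static unsigned char f3(unsigned char x) { return (x + 5) % 256U; }
static unsigned char f4(unsigned char x) { return (x + 5) & 0xFFU; }

/* Hypothetical helper: returns 1 if all four variants agree on
   every possible input, and 0 otherwise. */
int variants_agree(void)
{
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        unsigned char x = (unsigned char)i;
        unsigned char r = f1(x);
        if (f2(x) != r || f3(x) != r || f4(x) != r)
            return 0;
    }
    return 1;
}
```

If variants_agree() returns 1, any difference in the generated assembly comes from the compiler's choices rather than from the C semantics.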

With my gcc version 4.1.2 for 64-bit, I get the same code for all of these functions, for both 64-bit and 32-bit code. In fact, all of them include the movzbl. Omitting it in f1 could be a bug in your gcc (one most likely compensated for by the caller). It actually depends on the calling convention: should an 8-bit value in a 64-bit register be zero/sign-extended or not? I couldn't find a definitive answer to this in the System V Application Binary Interface, AMD64 Architecture Processor Supplement, Draft Version 0.96 of June 14, 2005. The gcc 4.1.2 compiler seems to follow the "better safe than sorry" philosophy, as the movzbl also occurs on the caller's side. In my experience, it is usually a requirement to zero/sign-extend such values when there are no operations that work on parts of a register, which is rather uncommon.

Interestingly, my home gcc compiler, version 4.3.2, differs slightly in that f2 is implemented with an and operation. All the others just add 5, which strongly suggests that the responsibility for zero/sign extension lies with the caller. But this is 32-bit code.
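To make concrete what the extra movzbl contributes, here is a rough C model of the two instructions, treating a uint32_t as a stand-in for %eax. The function names are hypothetical and this is only an illustration of the arithmetic, not of how gcc represents it internally.

```c
#include <stdint.h>

/* Models "leal 5(%rdi), %eax": a plain 32-bit add. If the low byte
   overflows, the carry ends up in bit 8 and above. */
uint32_t add5_raw(uint32_t reg)
{
    return reg + 5u;
}

/* Models "movzbl %al, %eax": keep the low 8 bits and clear
   the upper 24 bits of the 32-bit register. */
uint32_t zero_extend_byte(uint32_t reg)
{
    return (uint32_t)(uint8_t)reg;
}
```

For an input of 0xFF, add5_raw yields 0x104 while zero_extend_byte(add5_raw(0xFF)) yields 0x04. A caller that reads only the low byte sees 0x04 either way, which is why the question of who must clear the upper bits is a matter of calling convention.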

If I find a definitive answer on zero/sign extension of values in oversized registers in any of the architecture specifications, I will let you know. I need to know this professionally too.

In defense of your gcc compiler: you are looking at optimizing small beer. Normal code does not contain such a modulo, and it would already be nice if the compiler, somewhere along the line, reduced such a special modulo to an and. In the case of %256 (vs %256U), some value range analysis is required to determine that an and is sufficient, because the modulo is performed in "signed" arithmetic. Clearly my compiler concludes at some point that an and is sufficient, but apparently too late to notice that the and is then subsumed by the conversion of the result, which it did determine in the other cases. This is what compiler engineers call a "phase-ordering problem."
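Why the signed case needs range analysis can be seen in a small sketch. The helper names are hypothetical, and the sketch assumes C99 or later (where the remainder takes the sign of the dividend) and a two's complement int.

```c
/* Remainder in signed int arithmetic, as in (x + 5) % 256 after
   x has been promoted to int. */
int mod256_signed(int v)
{
    return v % 256;
}

/* Mask of the low 8 bits; the result is always in 0..255
   (assuming two's complement representation). */
int low8(int v)
{
    return v & 0xFF;
}

/* Returns 1 if the two agree on every value x + 5 can take
   for an 8-bit unsigned x, i.e. on 5..260. */
int agree_on_promoted_range(void)
{
    for (int v = 5; v <= 260; ++v)
        if (mod256_signed(v) != low8(v))
            return 0;
    return 1;
}
```

For a negative operand such as -3 the two differ: the signed remainder is -3, while the mask gives 253. The compiler may therefore replace the modulo with an and only after proving that x + 5 is never negative.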

Update on zero/sign extension of values in registers.

I gave up on this quest and will have to follow up with some colleagues, since I found no definitive statement on whether parameters and function results are expected to be zero/sign-extended.

I did find the following related passages in the ABI specification mentioned above.

Booleans stored in a memory object are stored as single-byte objects whose value is always 0 (false) or 1 (true). When they are stored in integer registers or passed as arguments on the stack, all 8 bytes of the register are significant; any nonzero value is considered true.

So boolean types must be zero-extended.

For calls that may call functions that use varargs or stdargs (prototype-less calls or calls to functions containing an ellipsis (...) in the declaration), %al (Note 14) is used as a hidden argument to specify the number of SSE registers used. The contents of %al do not need to match the number of registers exactly, but must be an upper bound on the number of SSE registers used and must be in the range 0-8 inclusive.
Note 14: Note that the rest of %rax is undefined; only the contents of %al are defined.

So, for this special use of %al, the value does not need to be extended.

Given that booleans must be zero-extended, one might conclude that the spirit of the ABI is that other sub-word types must be extended as well. Taking a more formal position, one can argue that the absence of any statement means that no zero/sign extension is required. All in all, this is not satisfactory.

Update 2 on zero/sign extension in registers.

I discussed this question with a colleague. The newest version of the ABI, version 0.99 from 2012, changed precisely the rule on passing booleans as parameters: they are now zero-extended only to 8 bits. This suggests that the rule was changed to match the passing of other sub-word types, none of which are zero/sign-extended. The AMD64 architecture also supports sub-word registers for half of the 64-bit registers and can perform operations on these sub-word registers. That is probably the motivation for not zero/sign-extending parameters when passing them.
