Enforcing the "U" suffix after hexadecimal literals

My colleague and I are debating the U suffix after hexadecimal literals. Please note: this is not a question about what this suffix means or what it does. I found several such topics here, but none of them answered my question.

Some background information:

We are trying to come up with a set of rules that we both agree on, to use as our style from now on. We have a copy of the 2004 MISRA C rules and decided to use it as a starting point. We are not interested in full MISRA C compliance; we are cherry-picking the rules that, in our opinion, will most improve efficiency and reliability.

Rule 10.6 of the above guidelines states:

A "U" suffix shall be applied to all constants of unsigned type.

I personally think this is a good rule. It takes little effort, looks better than explicit casts, and shows the intent of the constant more clearly. To me it makes sense to use it for all unsigned constants, not just decimal ones, since a rule is not enforced by allowing exceptions, especially for such a commonly used representation of constants.

However, my colleague believes that hexadecimal constants do not need the suffix, mostly because we use them almost exclusively to set microcontroller registers, and signedness does not matter when assigning hexadecimal constants to registers.

My question

My question is not who is right or wrong. It is a question of whether there are cases when the absence or presence of a suffix changes the result of the operation. Are there any such cases, or is it a matter of consistency?

Edit: to clarify, this is specifically about setting microcontroller registers by assigning hexadecimal values to them. Is there a case where the suffix would make a difference? I feel that there is not. As an example, Freescale Processor Expert generates all register assignments as unsigned.

2 answers

Adding the U suffix to all hexadecimal constants makes them unsigned, as you already mentioned. This can have undesirable side effects when these constants are used in operations together with signed values, especially comparisons.

Here is a pathological example:

    #define MY_INT_MAX  0x7FFFFFFFU   // blindly applying the rule

    if (-1 < MY_INT_MAX) {
        printf("OK\n");
    } else {
        printf("OOPS!\n");
    }

The C rules for signed / unsigned conversions are precisely specified but somewhat counterintuitive: -1 is converted to unsigned before the comparison, so the code above will actually print OOPS.

The MISRA-C rule is precise: it states that a "U" suffix shall be applied to all constants of unsigned type. The word unsigned has far-reaching consequences, and indeed most constants should not be considered unsigned.

In addition, the C Standard makes a subtle distinction between decimal and hexadecimal constants:

  • A hexadecimal constant is considered unsigned if its value can be represented by the unsigned integer type but not by the signed integer type of the same size, for type int and larger.

This means that on 32-bit systems, 2147483648 has type long or long long, whereas 0x80000000 is an unsigned int. Adding the U suffix may make this more explicit in this case, but the real precaution against potential problems is to have the compiler reject signed / unsigned comparisons altogether: gcc -Wall -Wextra -Werror or clang -Weverything -Werror are life savers.

Here is how bad it gets:

    if (-1 < 0x8000) {
        printf("OK\n");
    } else {
        printf("OOPS!\n");
    }

The above code should print OK on 32-bit systems and OOPS on 16-bit systems, where 0x8000 does not fit in int and is therefore unsigned. To make things worse, it is still quite common for embedded projects to use obsolete compilers that do not even implement the Standard semantics for this issue.

For your specific question, values defined for microprocessor registers and used only to set them via assignment (assuming these registers are memory-mapped) do not need the U suffix at all. The register lvalue should have an unsigned type, and the hexadecimal constant will be signed or unsigned depending on its value, but the operation will proceed the same way. The opcode for storing a signed or an unsigned number is the same on your target architecture and on every architecture I have ever seen.


For all integer constants

Adding u/U ensures that the integer constant is some unsigned type.


Without u/U

  • For a decimal constant, the integer constant will be some signed type.

  • For a hex / octal constant, the integer constant will be a signed or an unsigned type, depending on its value and the ranges of the integer types.


Note: all integer constants have non-negative values. A leading minus sign is a unary operator applied to the constant, not part of it.

    //      +-------- unary operator
    //      |+-+----- integer-constant
    int x = -123;

Does the absence or presence of a suffix change the result of an operation?

When is this important?

In various expressions, the signedness and width of the arithmetic need to be controlled, preferably without surprises.

    // Examples: assume 32-bit `unsigned`, `long`, 64-bit `long long`

    // Bad: signed int overflow (UB)
    unsigned a = 4000 * 1000 * 1000;
    // OK
    unsigned b = 4000u * 1000 * 1000;

    // Undefined behavior
    unsigned c = 1 << 31;
    // OK
    unsigned d = 1u << 31;

    // Decimal constants here; 0xFFFFFFFF would already be `unsigned`
    printf("Size %zu\n", sizeof(4294967295));  // 8, type is `long long`
    printf("Size %zu\n", sizeof(4294967295u)); // 4, type is `unsigned`

    // 2 ** 63
    long long e = -9223372036854775808;     // C99: bad, "9223372036854775808" not representable
    long long f = -9223372036854775807 - 1; // OK
    long long g = -9223372036854775808u;    // implementation-defined behavior **

    some_unsigned_type h_max = -1;  // OK, max value for the target type
    some_unsigned_type i_max = -1u; // OK, but not the max value for wide unsigned types

    // When negating a negative `int`
    unsigned j = 0 - INT_MIN;  // typically int overflow, UB
    unsigned k = 0u - INT_MIN; // never UB

** or an implementation-defined signal is raised.


Source: https://habr.com/ru/post/1270759/
