32-bit variable overflow

I am currently implementing the equation (2^A)[X + Y*(2^B)] in one of my applications.

The problem is overflowing a 32-bit value, and I cannot use the 64-bit data type.

Suppose that with B = 3 and Y = 805306367 it overflows with a 32-bit value, but when X = -2147483648 , the result returns to the 32-bit range.
So I want to save the result (Y*2^B) . Can anyone suggest some solution for this ... A and B have values ​​from -15 to 15 and X , Y can have values ​​from 2147483647..-2147483648 .
The output signal can be from 0...4294967295 .

+6
source share
4 answers

If the number is too large for a 32-bit variable, you either use more bits (either by storing in a larger variable, or using several variables), or you refuse precision and store it in a float. Since Y can be MAX_INT, by definition you cannot multiply it by a number greater than 1, and it still fits into a 32-bit int.

+6
source

In this case, I would use a loop instead of multiplication. Something like that:

 int newX = X; int poweredB = ( 1 << B ); // 2^B for( int i = 0; i < poweredB ; ++i ) { newX += Y; // or change X directly, if you will not need it later. } int result = ( 1 << A ) * newX; 

But note : for some situations this will work - only if you have a guarantee , this result will not be overwhelmed. In your case, when Y is big positive and X is a big negative number (β€œbig” is argh, this is too subjective), this will definitely work. But if X is big positive and Y is big positive, there will again be overflow. And not only in this case, but also with many others.

+1
source

Based on the values ​​for A and B in the assignment, I assume that the expected solution will include the following:

  • unsigned is best done, so keep the signs for X and Y and act on their absolute value

  • Store X and Y in two variables, one of which contains 16 bits, the other contains the least significant bits, something like int hiY = Y and 0xFFFF0000;

    int loY = Y and 0x0000FFFF;

  • Move the upper parts to the right so that all variables have high bits 0

  • Y * (2 ^ B) is actually a shift of Y from the left by B bits. This is equivalent to shifting the high and low parts with B bits, and since you moved the upper part, both operations will fit into their 32-bit integer

  • Process X is similar in the upper and lower parts

  • Tracking traits X and Y computes X + Y * (2 ^ B) for high and low parts

  • Again shift both high and low results on bits A

  • Attach the top and bottom to the final result

+1
source

If you cannot use 64 bits because your local C does not support them and not some other main reason, you can consider the GNU multicast arithmetic library at http://gmplib.org/

+1
source

Source: https://habr.com/ru/post/890338/


All Articles