32-bit variable overflow

Question

32-bit variable overflow

I am currently implementing the equation (2^A)[X + Y*(2^B)] in one of my applications.

The problem is overflowing a 32-bit value, and I cannot use the 64-bit data type.

Suppose that with B = 3 and Y = 805306367 it overflows with a 32-bit value, but when X = -2147483648 , the result returns to the 32-bit range.
So I want to save the result (Y*2^B) . Can anyone suggest some solution for this ... A and B have values from -15 to 15 and X , Y can have values from 2147483647..-2147483648 .
The output signal can be from 0...4294967295 .

+6

c

latot Jun 12 '11 at 12:44

source share

4 answers

Paul tomblin · Answer 1 · 2011-06-12T12:49:00+0000

If the number is too large for a 32-bit variable, you either use more bits (either by storing in a larger variable, or using several variables), or you refuse precision and store it in a float. Since Y can be MAX_INT, by definition you cannot multiply it by a number greater than 1, and it still fits into a 32-bit int.

Kiril Kirov · Answer 2 · 2011-06-12T12:53:02+0000

In this case, I would use a loop instead of multiplication. Something like that:

 int newX = X; int poweredB = ( 1 << B ); // 2^B for( int i = 0; i < poweredB ; ++i ) { newX += Y; // or change X directly, if you will not need it later. } int result = ( 1 << A ) * newX;

But note : for some situations this will work - only if you have a guarantee , this result will not be overwhelmed. In your case, when Y is big positive and X is a big negative number (“big” is argh, this is too subjective), this will definitely work. But if X is big positive and Y is big positive, there will again be overflow. And not only in this case, but also with many others.

Andrei · Answer 3 · 2011-06-12T13:25:32+0000

Based on the values for A and B in the assignment, I assume that the expected solution will include the following:

unsigned is best done, so keep the signs for X and Y and act on their absolute value
Store X and Y in two variables, one of which contains 16 bits, the other contains the least significant bits, something like int hiY = Y and 0xFFFF0000;
int loY = Y and 0x0000FFFF;
Move the upper parts to the right so that all variables have high bits 0
Y * (2 ^ B) is actually a shift of Y from the left by B bits. This is equivalent to shifting the high and low parts with B bits, and since you moved the upper part, both operations will fit into their 32-bit integer
Process X is similar in the upper and lower parts
Tracking traits X and Y computes X + Y * (2 ^ B) for high and low parts
Again shift both high and low results on bits A
Attach the top and bottom to the final result

Gilbert · Answer 4 · 2011-06-12T23:55:44+0000

If you cannot use 64 bits because your local C does not support them and not some other main reason, you can consider the GNU multicast arithmetic library at http://gmplib.org/

32-bit variable overflow

More articles: