Representing negative integers in Python
    >>> x = -4
    >>> print("{} {:b}".format(x, x))
    -4 -100
    >>> mask = 0xFFFFFFFF
    >>> print("{} {:b}".format(x & mask, x & mask))
    4294967292 11111111111111111111111111111100
    >>>
    >>> x = 0b11111111111111111111111111111100
    >>> print("{} {:b}".format(x, x))
    4294967292 11111111111111111111111111111100
    >>> print("{} {:b}".format(~(x ^ mask), ~(x ^ mask)))
    -4 -100

I find it hard to understand how Python represents negative integers, and therefore how bitwise operations work. I understand that Python tries to emulate two's complement, but with an arbitrary number of bits. That is why 32-bit masks are commonly used to force a fixed integer width before bitwise operations.
As you can see in my example, -4 & 0xFFFFFFFF gives a large positive number. Why does Python seem to read this as an unsigned integer instead of a negative number in two's complement? Later, the operation ~(x ^ mask), which should give the same two's complement bit pattern as the large positive number, gives -4 instead. What causes the conversion to a signed int?
Thanks!
TL;DR: CPython stores the sign of an integer in a dedicated structure field. When performing a bitwise operation, CPython replaces negative numbers with their two's complement, and sometimes (!) performs the reverse operation (that is, turns a two's complement pattern back into a negative number).
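If you need the fixed-width view explicitly, both conversions mentioned above can be written by hand. This is a minimal sketch assuming a 32-bit width by default; `to_twos` and `from_twos` are hypothetical helper names, not part of Python or CPython.

```python
def to_twos(n: int, bits: int = 32) -> int:
    """Hypothetical helper: unsigned `bits`-wide two's complement pattern of n."""
    # Masking keeps the low `bits` bits; Python's & already behaves as if
    # n had infinitely many sign bits, so negative values wrap around.
    return n & ((1 << bits) - 1)


def from_twos(u: int, bits: int = 32) -> int:
    """Hypothetical helper: read an unsigned `bits`-wide pattern as signed."""
    u &= (1 << bits) - 1
    # If the top (sign) bit is set, the pattern encodes u - 2**bits.
    return u - (1 << bits) if u >> (bits - 1) else u


print(to_twos(-4))            # 4294967292
print(from_twos(4294967292))  # -4
```

Python never applies `from_twos` on its own: a masked result stays a positive int until you reinterpret it yourself.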
Bit operations
The internal representation of an integer is the PyLongObject structure, which contains a PyVarObject structure. (When CPython creates a new PyLong object, it allocates memory for the structure plus trailing space for the digits.) What matters here is that PyLong is variable-sized: the ob_size field of the nested PyVarObject structure holds the size of the integer in digits (15- or 30-bit digits). If the integer is negative, this size is the negative of the number of digits.
(Links: https://github.com/python/cpython/blob/master/Include/object.h and https://github.com/python/cpython/blob/master/Include/longobject.h )
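To picture this sign-magnitude layout, here is a small pure-Python model. It is an illustration under assumptions, not CPython code: `long_layout` is a hypothetical name, it mirrors the ob_size convention described above, and other CPython versions may encode the sign differently.

```python
def long_layout(n: int, shift: int = 30) -> tuple[int, list[int]]:
    """Hypothetical model of the sign-magnitude layout described above.

    Returns (ob_size, digits): digits are the base-2**shift digits of |n|,
    least significant first; ob_size is their count, negated when n < 0.
    """
    mag = abs(n)
    digits = []
    while mag:
        digits.append(mag & ((1 << shift) - 1))  # low `shift` bits
        mag >>= shift
    size = len(digits)
    return (-size if n < 0 else size), digits


print(long_layout(-4))     # (-1, [4])
print(long_layout(2**30))  # (2, [0, 1])
```

Note that -4 is stored as the digit 4 plus a negative size: no bit pattern like ...11100 exists anywhere in memory.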
As you can see, the internal representation of an integer in CPython is far from the usual binary representation. Yet CPython must provide bitwise operations for various purposes. Let's look at the comments in the code:
    static PyObject *
    long_bitwise(PyLongObject *a,
                 char op,  /* '&', '|', '^' */
                 PyLongObject *b)
    {
        /* Bitwise operations for negative numbers operate as though
           on a two's complement representation.  So convert arguments
           from sign-magnitude to two's complement, and convert the
           result back to sign-magnitude at the end. */

        /* If a is negative, replace it by its two's complement. */
        ...
        /* Same for b. */
        ...
        /* Complement result if negative. */
        ...
    }

To process negative integers in bitwise operations, CPython uses two's complement (in fact, the complement is computed digit by digit, but I will not go into details). But pay attention to the "sign rule" (my name): the sign of the result is the bitwise operator applied to the signs of the operands. More precisely, the result is negative if nega <op> negb == 1 (negx = 1 for negative, 0 for non-negative). Simplified code:
    switch (op) {
    case '^': negz = nega ^ negb; break;
    case '&': negz = nega & negb; break;
    case '|': negz = nega | negb; break;
    default: ...
    }

Binary formatting
On the other hand, [format_long_internal](https://github.com/python/cpython/blob/master/Python/formatter_unicode.c#L839) does not compute a two's complement, even for binary output: [format_long_internal](https://github.com/python/cpython/blob/master/Python/formatter_unicode.c#L839) calls [long_format_binary](https://github.com/python/cpython/blob/master/Objects/longobject.c#L1934) and removes the two leading characters (the 0b prefix), but keeps the sign. See the code:
    /* Is a sign character present in the output?  If so, remember it
       and skip it */
    if (PyUnicode_READ_CHAR(tmp, inumeric_chars) == '-') {
        sign_char = '-';
        ++prefix;
        ++leading_chars_to_skip;
    }

The long_format_binary function does not compute any two's complement either: it just prints the number in base 2, preceded by a sign:
    if (negative) \
        *--p = '-'; \

Your question
I will follow your REPL sequence:
    >>> x = -4
    >>> print("{} {:b}".format(x, x))
    -4 -100

Not surprising, given that formatting involves no two's complement, only a sign.
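Here is a pure-Python imitation of what the binary formatter produces. `format_binary` is a hypothetical name and the loop is only an illustration, not CPython's implementation. To see a fixed-width two's complement pattern instead, you have to mask first, as the last line does.

```python
def format_binary(n: int) -> str:
    """Hypothetical imitation of format(n, 'b'): a sign, then |n| in base 2."""
    mag = abs(n)
    bits = ""
    while mag:
        bits = str(mag & 1) + bits  # prepend the lowest remaining bit
        mag >>= 1
    # No two's complement anywhere: only a leading '-' for negatives.
    return ("-" if n < 0 else "") + (bits or "0")


print(format_binary(-4))                # -100
print(format(-4 & 0xFFFFFFFF, "032b"))  # 11111111111111111111111111111100
```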
    >>> mask = 0xFFFFFFFF
    >>> print("{} {:b}".format(x & mask, x & mask))
    4294967292 11111111111111111111111111111100

The number -4 is negative. Consequently, it is replaced by its two's complement, digit by digit, before the bitwise and. You expected the result to be converted back to a negative number, but remember the "sign rule":
    >>> nega=1; negb=0
    >>> nega & negb
    0

Therefore: 1. the result does not get a negative sign; 2. the result is not complemented back. Your result complies with the "sign rule", even if this rule does not seem intuitive.
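The sign rule can be checked with a small pure-Python model. This is a sketch, not CPython's code: `model_bitwise` and `OPS` are hypothetical names, and instead of working digit by digit it picks one width wide enough for both magnitudes plus a sign bit. It applies the bitwise operators only to non-negative numbers, so it does not rely on Python's own handling of negatives.

```python
import operator

# Hypothetical table mapping the op characters used by long_bitwise.
OPS = {"&": operator.and_, "|": operator.or_, "^": operator.xor}


def model_bitwise(a: int, op: str, b: int) -> int:
    """Model: two's complement in, sign rule, sign-magnitude out."""
    # A width holding both magnitudes plus one sign bit (sketch assumption).
    width = max(abs(a).bit_length(), abs(b).bit_length()) + 1
    nega, negb = int(a < 0), int(b < 0)
    # Replace a negative operand by its two's complement 2**width - |x|.
    ta = (1 << width) - abs(a) if nega else a
    tb = (1 << width) - abs(b) if negb else b
    z = OPS[op](ta, tb)          # both operands are non-negative here
    negz = OPS[op](nega, negb)   # the "sign rule"
    # Complement the result back only when the sign rule says "negative".
    return z - (1 << width) if negz else z


print(model_bitwise(-4, "&", 0xFFFFFFFF))  # 4294967292 (nega & negb == 0)
print(model_bitwise(-4, "|", 0xFFFFFFFF))  # -1 (nega | negb == 1)
```

The model agrees with Python's built-in operators, which is consistent with the sign rule being the whole story for `&`, `|` and `^`.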
Now the last part:
    >>> x = 0b11111111111111111111111111111100
    >>> print("{} {:b}".format(x, x))
    4294967292 11111111111111111111111111111100
    >>> print("{} {:b}".format(~(x ^ mask), ~(x ^ mask)))
    -4 -100

This time x is the positive number 4294967292, not -4, so x ^ mask has two positive operands: no two's complement is taken, and 0b11111111111111111111111111111100 ^ 0b11111111111111111111111111111111 gives 0b11 (3). The sign rule agrees:

    >>> nega=0; negb=0
    >>> nega ^ negb
    0

The negative sign comes from ~, which does not go through long_bitwise at all: CPython computes ~x as -(x+1), so ~3 is -4. Read as an infinite two's complement number, that is the one's complement of ...00011, i.e. ...11100, whose low 32 bits are again 0b11111111111111111111111111111100, but this time with a negative sign, as you expected.
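A short sketch of the last step, reusing the question's `x` and `mask`. It relies on the documented identity that `~x` equals `-(x + 1)` for Python ints; here `x` is the positive value 4294967292, so the xor itself involves no complement.

```python
mask = 0xFFFFFFFF
x = 0b11111111111111111111111111111100  # positive: 4294967292

y = x ^ mask          # both operands positive, no complement: 0b11
print(y)              # 3
print(~y, -(y + 1))   # -4 -4: ~ behaves like -(y + 1)

# Infinite two's complement view: ~y flips every bit, including the
# infinitely many leading zeros of 3, so its low 32 bits match x.
print(format(~y & mask, "032b"))  # 11111111111111111111111111111100
```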
Conclusion: I think there was no ideal way to provide bitwise operations on arbitrarily long signed integers, and the documentation is not very verbose about the choice that was made.