__sizeof__ of a str is larger than __sizeof__ of a tuple containing that string

The following code produces this output.

    import sys

    print('ex1:')
    ex1 = 'Hello'
    print('\t', ex1.__sizeof__())

    print('\nex2:')
    ex2 = ('Hello', 53)
    print('\t', ex2.__sizeof__())

Output:

    ex1:
         54

    ex2:
         40

Why does __sizeof__() print a smaller result for the second example? Shouldn't the output be larger? I understand from this answer that I should use sys.getsizeof(), but the behavior still seems weird. I am using Python 3.5.2.

In addition, as @Herbert pointed out, 'Hello' takes up more memory than ('Hello',), which is a tuple. Why is that?

1 answer

This is because tuple objects (and, I'm fairly sure, all containers other than strings) compute their size not by including the actual sizes of their contents, but by multiplying the size of a pointer to a PyObject by the number of elements they contain. That is, they hold pointers to the (generic) PyObjects they contain, and those pointers are what contribute to the container's overall size.

This is outlined in the Data Model chapter of the Python Reference:

Some objects contain references to other objects; these are called containers. Examples of containers are tuples, lists and dictionaries. The references are part of a container's value.

(I'm emphasizing the word references.)
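A quick way to see that a tuple holds references rather than copies is sketched below. The variable names are only for illustration and do not come from the question:

```python
big = "Hello" * 2 ** 10      # a string of 5120 characters
holder = (big,)              # the tuple stores a reference to `big`

# The element is the very same object, not a copy of it.
same_object = holder[0] is big

# Swapping the large string for an empty one does not change the tuple's
# size, because exactly one reference is stored in either case.
small_holder = ("",)
same_size = holder.__sizeof__() == small_holder.__sizeof__()

print(same_object, same_size)
```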

In PyTuple_Type, the structure that holds the information about the tuple type, we can see that the tp_itemsize field has sizeof(PyObject *) as its value:

    PyTypeObject PyTuple_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0)
        "tuple",
        sizeof(PyTupleObject) - sizeof(PyObject *),
        sizeof(PyObject *),  // <-- sizeof pointer to PyObject's

On 64-bit builds of Python, sizeof(PyObject *) is 8 bytes (on 32-bit builds it is 4 bytes).
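The pointer size of the running interpreter can be checked from Python itself. This is a small sketch that assumes CPython, where the type attribute __itemsize__ exposes tp_itemsize:

```python
import struct

pointer_size = struct.calcsize("P")   # size of a C `void *` on this build
item_size = tuple.__itemsize__        # exposes tp_itemsize of the tuple type

# On CPython the two values should match: 8 on 64-bit builds, 4 on 32-bit.
print(pointer_size, item_size)
```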

This is the value that gets multiplied by the number of elements contained in the tuple instance. When we look at object_sizeof, the __sizeof__ method that tuple inherits from object (check that object.__sizeof__ is tuple.__sizeof__), we can clearly see this:

    static PyObject *
    object_sizeof(PyObject *self, PyObject *args)
    {
        Py_ssize_t res, isize;

        res = 0;
        isize = self->ob_type->tp_itemsize;
        if (isize > 0)
            res = Py_SIZE(self) * isize;  // <-- num_elements * tp_itemsize
        res += self->ob_type->tp_basicsize;

        return PyLong_FromSsize_t(res);
    }

See how isize (obtained from tp_itemsize) is multiplied by Py_SIZE(self), a macro that reads the ob_size value, which holds the number of elements inside the tuple.
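The same arithmetic can be mirrored in Python. This is only a sketch of the formula, assuming CPython: object_sizeof_py is a hypothetical helper, and len() stands in for Py_SIZE, which is accurate for tuples but not for every type:

```python
def object_sizeof_py(obj):
    """Mirror object_sizeof: tp_basicsize + ob_size * tp_itemsize."""
    cls = type(obj)
    res = 0
    if cls.__itemsize__ > 0:
        res = len(obj) * cls.__itemsize__   # len() stands in for Py_SIZE here
    return res + cls.__basicsize__

samples = [(), ("Hello",), ("Hello", 53), tuple(range(100))]
for t in samples:
    # Both columns should agree for every tuple.
    print(len(t), object_sizeof_py(t), t.__sizeof__())
```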

That's why, even if we put a somewhat larger string inside the tuple instance:

    t = ("Hello" * 2 ** 10,)

with the element inside it having a size of:

    t[0].__sizeof__()         # 5169

the size of the tuple instance:

    t.__sizeof__()            # 32

equals that of a tuple with a plain "Hello" inside:

    t2 = ("Hello",)
    t2[0].__sizeof__()        # 54
    t2.__sizeof__()           # 32 Tuple size stays the same.

For strings, each individual character increases the value returned by str.__sizeof__. This, together with the fact that a tuple stores only pointers, gives the false impression that "Hello" is larger than the tuple containing it.
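To get a figure that also counts the contents, the container and its elements have to be added up explicitly. The following is a rough sketch rather than a complete solution: total_size is a hypothetical helper that only descends into a few built-in container types.

```python
import sys

def total_size(obj, seen=None):
    """Rough recursive size: the object plus everything it refers to."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count each object only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (tuple, list, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

s = "Hello"
t = (s,)
# The last number is the tuple plus the string it refers to.
print(sys.getsizeof(s), sys.getsizeof(t), total_size(t))
```

Counted this way, the tuple is always at least as large as the string it contains.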

Just for completeness, unicode__sizeof__ is the function that computes this. It really just multiplies the length of the string by the character size (which depends on the kind of characters the string holds: 1, 2 or 4 bytes each).
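The per-character cost can be observed by growing a string by one character and comparing sizes. This sketch assumes CPython 3.3 or later; bytes_per_char is a hypothetical helper, and the three sample characters are my own choice (ASCII, a character from the Basic Multilingual Plane, and one outside it):

```python
def bytes_per_char(ch, n=100):
    """Growth of str.__sizeof__ per extra copy of `ch`."""
    return (ch * (n + 1)).__sizeof__() - (ch * n).__sizeof__()

for ch in ("a", "\u0394", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))
```

On such a build I would expect the three results to be 1, 2 and 4 bytes respectively.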

The only thing I don't get about tuples is why the basic size (given by tp_basicsize) is specified as sizeof(PyTupleObject) - sizeof(PyObject *). This drops 8 bytes from the overall size returned; I haven't found an explanation for this (yet).


Source: https://habr.com/ru/post/1260158/

