Python: class vs tuple huge memory overhead (?)

I store a lot of complex data in tuples/lists, but prefer to use small wrapper classes to make the data structure easier to understand. For example:

    class Person:
        def __init__(self, first, last):
            self.first = first
            self.last = last

    p = Person('foo', 'bar')
    print(p.last)

is preferable to

    p = ['foo', 'bar']
    print(p[1])

However, the memory overhead appears to be crippling:

    l = [Person('foo', 'bar') for i in range(10000000)]
    # ipython now takes 1.7 GB RAM

and

    del l
    l = [('foo', 'bar') for i in range(10000000)]
    # now just 118 MB RAM

Why? Is there an obvious alternative solution that I haven't thought about?

Thanks!

(I know the 'wrapper' class looks silly in this example, but it becomes more useful once the data is more complex and nested.)

+6
5 answers

As others have said in their answers, you need to generate distinct objects for the comparison to make sense.

So let's compare a few approaches:

tuple

    l = [(i, i) for i in range(10000000)]
    # memory taken by Python 3: 1.0 GB

class Person

    class Person:
        def __init__(self, first, last):
            self.first = first
            self.last = last

    l = [Person(i, i) for i in range(10000000)]
    # memory: 2.0 GB

namedtuple (tuple + __slots__)

    from collections import namedtuple

    Person = namedtuple('Person', 'first last')
    l = [Person(i, i) for i in range(10000000)]
    # memory: 1.1 GB

namedtuple is basically a class that extends tuple and uses __slots__ (set to an empty tuple) so instances carry no per-instance __dict__; it adds property getters for the named fields and some other helper methods (you can see the exact generated code by calling namedtuple with verbose=True).
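
For illustration, here is a rough, simplified sketch of what the generated class boils down to (the real generated code has more helpers such as _replace and _asdict):

    from operator import itemgetter

    class Person(tuple):
        __slots__ = ()  # no per-instance __dict__

        def __new__(cls, first, last):
            return tuple.__new__(cls, (first, last))

        # named, read-only access to the underlying tuple slots
        first = property(itemgetter(0))
        last = property(itemgetter(1))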

class Person + __slots__

    class Person:
        __slots__ = ['first', 'last']

        def __init__(self, first, last):
            self.first = first
            self.last = last

    l = [Person(i, i) for i in range(10000000)]
    # memory: 0.9 GB

This is a stripped-down version of the namedtuple above. A clear winner, even better than plain tuples.
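
If you want to reproduce such measurements yourself, here is a minimal sketch using the stdlib tracemalloc module (exact numbers vary with Python version and platform):

    import tracemalloc

    class Person:
        __slots__ = ['first', 'last']
        def __init__(self, first, last):
            self.first = first
            self.last = last

    tracemalloc.start()
    l = [Person(i, i) for i in range(10000000)]
    current, peak = tracemalloc.get_traced_memory()
    print(f"allocated: {current / 2**30:.2f} GB")  # list + all instances
    tracemalloc.stop()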

+8

The tuple literal in

 [('foo', 'bar') for i in range(10000000)] 

is a constant expression. CPython's peephole optimizer evaluates it once and reuses the resulting object throughout the code block. So [('foo', 'bar') for i in range(10000000)] creates a list of 10,000,000 references to the same object:

    >>> {*map(id, tuple_l)}
    {140673197930568}  # one unique memory address

Person('foo', 'bar') is not recognized as a constant expression, so it is evaluated on every iteration, which creates 10,000,000 distinct objects:

    >>> len({*map(id, class_l)})
    10000000

This is the main reason for the huge difference in memory.
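
You can confirm the constant folding directly in the bytecode; a quick sketch (the exact output differs between CPython versions):

    import dis

    # The tuple literal shows up as a single LOAD_CONST inside the
    # comprehension's code object: it is built once at compile time.
    dis.dis("[('foo', 'bar') for i in range(10)]")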

Pure-Python classes are not very memory efficient, but you can add a __slots__ attribute to reduce the size of each instance:

    class Person:
        __slots__ = ('first', 'last')
        ...

Adding __slots__ reduces memory by about 60%.
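
To see where the savings come from, here is a small sketch comparing per-instance sizes with sys.getsizeof (the class names are mine; numbers depend on build and version):

    import sys

    class PlainPerson:
        def __init__(self, first, last):
            self.first = first
            self.last = last

    class SlottedPerson:
        __slots__ = ('first', 'last')
        def __init__(self, first, last):
            self.first = first
            self.last = last

    p = PlainPerson('foo', 'bar')
    s = SlottedPerson('foo', 'bar')
    # a plain instance also drags along a separate attribute dict
    print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
    print(sys.getsizeof(s))  # slotted instance has no __dict__ at all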

+6

Using __slots__ significantly reduces memory usage (from 1.7 GB to 625 MB in my test), since each instance no longer needs a __dict__ to store its attributes.

    class Person:
        __slots__ = ['first', 'last']

        def __init__(self, first, last):
            self.first = first
            self.last = last

The downside is that you can no longer add attributes to an instance after creating it; the class allocates space only for the attributes listed in __slots__.
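
For example, trying to attach a new attribute to a slotted instance fails:

    p = Person('foo', 'bar')
    p.middle = 'baz'  # AttributeError: 'Person' object has no attribute 'middle'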

+5

There is another way to reduce the memory occupied by instances: disable support for cyclic garbage collection, in addition to disabling __dict__ and __weakref__. This is implemented in the recordclass library:

    $ pip install recordclass

    >>> import sys
    >>> from recordclass import dataobject, make_dataclass

Create a class:

    class Person(dataobject):
        first: str
        last: str

or

 >>> Person = make_dataclass('Person', 'first last') 

As a result:

    >>> print(sys.getsizeof(Person(100, 100)))
    32

For the __slots__-based class, by comparison, we have:

    class Person:
        __slots__ = ['first', 'last']

        def __init__(self, first, last):
            self.first = first
            self.last = last

    >>> print(sys.getsizeof(Person(100, 100)))
    56

As a result, greater memory savings are possible.
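
Instances of such a class still support ordinary attribute access; a brief usage sketch (assuming the recordclass API as shown above, and keeping in mind that sys.getsizeof counts only the instance itself, not the objects it references):

    >>> p = Person('foo', 'bar')
    >>> p.first
    'foo'
    >>> sys.getsizeof(p)  # instance only; the field values are counted separately
    32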

0

In your second example you create only one tuple object, because the tuple literal is a compile-time constant and gets reused:

    >>> l = [('foo', 'bar') for i in range(10000000)]
    >>> id(l[0])
    4330463176
    >>> id(l[1])
    4330463176

Classes have the overhead of storing their attributes in a per-instance dictionary, which is why namedtuples need only about half the memory.
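
You can see that per-instance dictionary directly (Person here is the dict-based class from the question):

    >>> p = Person('foo', 'bar')
    >>> p.__dict__  # every instance carries its own attribute dict
    {'first': 'foo', 'last': 'bar'}
    >>> hasattr(('foo', 'bar'), '__dict__')
    False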

-1

Source: https://habr.com/ru/post/1269859/
