Python: class vs tuple huge memory overhead (?)

I store a lot of complex data in tuples/lists, but prefer to use small wrapper classes to make the data structure easier to understand. For example:

    class Person:
        def __init__(self, first, last):
            self.first = first
            self.last = last

    p = Person('foo', 'bar')
    print(p.last)

is preferable to

    p = ['foo', 'bar']
    print(p[1])

However, the memory overhead appears to be crippling:

    l = [Person('foo', 'bar') for i in range(10000000)]
    # ipython now takes 1.7 GB RAM

and

    del l
    l = [('foo', 'bar') for i in range(10000000)]
    # now just 118 MB RAM

Why? Is there an obvious alternative solution that I haven't thought about?

Thanks!

(I know the 'wrapper' class looks silly in this example, but it becomes more useful once the data is more complex and nested.)

+6
5 answers

As others have said in their answers, you need to generate distinct objects for the comparison to make sense.

So let's compare a few approaches:

tuple

    l = [(i, i) for i in range(10000000)]
    # memory taken by Python 3: 1.0 GB

class Person

    class Person:
        def __init__(self, first, last):
            self.first = first
            self.last = last

    l = [Person(i, i) for i in range(10000000)]
    # memory: 2.0 GB

namedtuple (tuple + __slots__)

    from collections import namedtuple

    Person = namedtuple('Person', 'first last')
    l = [Person(i, i) for i in range(10000000)]
    # memory: 1.1 GB

namedtuple is basically a class that extends tuple and uses __slots__ (set to an empty tuple) so instances carry no per-instance __dict__; it adds property getters for the named fields and some other helper methods (you can see the exact generated code by calling namedtuple with verbose=True).
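
For illustration, here is a rough, simplified sketch of what the generated class boils down to (the real generated code has more helpers such as _replace and _asdict):

    from operator import itemgetter

    class Person(tuple):
        __slots__ = ()  # no per-instance __dict__

        def __new__(cls, first, last):
            return tuple.__new__(cls, (first, last))

        # named, read-only access to the underlying tuple slots
        first = property(itemgetter(0))
        last = property(itemgetter(1))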

class Person + __slots__

    class Person:
        __slots__ = ['first', 'last']

        def __init__(self, first, last):
            self.first = first
            self.last = last

    l = [Person(i, i) for i in range(10000000)]
    # memory: 0.9 GB

This is a stripped-down version of the namedtuple above. A clear winner, even better than plain tuples.
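
If you want to reproduce such measurements yourself, here is a minimal sketch using the stdlib tracemalloc module (exact numbers vary with Python version and platform):

    import tracemalloc

    class Person:
        __slots__ = ['first', 'last']
        def __init__(self, first, last):
            self.first = first
            self.last = last

    tracemalloc.start()
    l = [Person(i, i) for i in range(10000000)]
    current, peak = tracemalloc.get_traced_memory()
    print(f"allocated: {current / 2**30:.2f} GB")  # list + all instances
    tracemalloc.stop()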

+8

The tuple literal in

 [('foo', 'bar') for i in range(10000000)] 

is a constant expression. CPython's peephole optimizer evaluates it once and reuses the resulting object throughout the code block. So [('foo', 'bar') for i in range(10000000)] creates a list of 10,000,000 references to the same object:

    >>> {*map(id, tuple_l)}
    {140673197930568}  # one unique memory address

Person('foo', 'bar') is not recognized as a constant expression, so it is evaluated on every iteration, which creates 10,000,000 distinct objects:

    >>> len({*map(id, class_l)})
    10000000

This is the main reason for the huge difference in memory.
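
You can confirm the constant folding directly in the bytecode; a quick sketch (the exact output differs between CPython versions):

    import dis

    # The tuple literal shows up as a single LOAD_CONST inside the
    # comprehension's code object: it is built once at compile time.
    dis.dis("[('foo', 'bar') for i in range(10)]")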

Pure-Python classes are not very memory efficient, but you can add a __slots__ attribute to reduce the size of each instance:

    class Person:
        __slots__ = ('first', 'last')
        ...

Adding __slots__ reduces memory by about 60%.
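
To see where the savings come from, here is a small sketch comparing per-instance sizes with sys.getsizeof (the class names are mine; numbers depend on build and version):

    import sys

    class PlainPerson:
        def __init__(self, first, last):
            self.first = first
            self.last = last

    class SlottedPerson:
        __slots__ = ('first', 'last')
        def __init__(self, first, last):
            self.first = first
            self.last = last

    p = PlainPerson('foo', 'bar')
    s = SlottedPerson('foo', 'bar')
    # a plain instance also drags along a separate attribute dict
    print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
    print(sys.getsizeof(s))  # slotted instance has no __dict__ at all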

+6

Using __slots__ significantly reduces memory usage (from 1.7 GB to 625 MB in my test), since each instance no longer needs a __dict__ to store its attributes.

    class Person:
        __slots__ = ['first', 'last']

        def __init__(self, first, last):
            self.first = first
            self.last = last

The downside is that you can no longer add attributes to an instance after creating it; the class allocates space only for the attributes listed in __slots__.
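
For example, trying to attach a new attribute to a slotted instance fails:

    p = Person('foo', 'bar')
    p.middle = 'baz'  # AttributeError: 'Person' object has no attribute 'middle'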

+5

There is another way to reduce the memory occupied by instances: disable support for cyclic garbage collection, in addition to disabling __dict__ and __weakref__. This is implemented in the recordclass library:

    $ pip install recordclass

    >>> import sys
    >>> from recordclass import dataobject, make_dataclass

Create a class:

    class Person(dataobject):
        first: str
        last: str

or

 >>> Person = make_dataclass('Person', 'first last') 

As a result:

    >>> print(sys.getsizeof(Person(100, 100)))
    32

For the __slots__-based class, by comparison, we have:

    class Person:
        __slots__ = ['first', 'last']

        def __init__(self, first, last):
            self.first = first
            self.last = last

    >>> print(sys.getsizeof(Person(100, 100)))
    56

As a result, greater memory savings are possible.
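
Instances of such a class still support ordinary attribute access; a brief usage sketch (assuming the recordclass API as shown above, and keeping in mind that sys.getsizeof counts only the instance itself, not the objects it references):

    >>> p = Person('foo', 'bar')
    >>> p.first
    'foo'
    >>> sys.getsizeof(p)  # instance only; the field values are counted separately
    32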

0

In your second example you create only one tuple object, because the tuple literal is a compile-time constant and gets reused:

    >>> l = [('foo', 'bar') for i in range(10000000)]
    >>> id(l[0])
    4330463176
    >>> id(l[1])
    4330463176

Classes have the overhead of storing their attributes in a per-instance dictionary, which is why namedtuples need only about half the memory.
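
You can see that per-instance dictionary directly (Person here is the dict-based class from the question):

    >>> p = Person('foo', 'bar')
    >>> p.__dict__  # every instance carries its own attribute dict
    {'first': 'foo', 'last': 'bar'}
    >>> hasattr(('foo', 'bar'), '__dict__')
    False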

-1

Source: https://habr.com/ru/post/1269859/
