What causes the (large) size of Python lists?

I was messing around with sys.getsizeof and was a bit surprised when I got into lists and arrays:

    >>> from sys import getsizeof as sizeof
    >>> list_ = range(10**6)
    >>> sizeof(list_)
    8000072

Compared to an array:

    >>> from array import array
    >>> array_ = array('i', range(10**6))
    >>> sizeof(array_)
    56

The reported size of a list of integers is only about 1/3 of the combined size of its elements, so the list cannot actually be holding them:

    >>> sizeof(10**8)
    24
    >>> for i in xrange(0, 9):
    ...     round(sizeof(range(10**i)) / ((10**i) * 24.0), 4), "10**%s elements" % (i)
    ...
    (3.3333, '10**0 elements')
    (0.6333, '10**1 elements')
    (0.3633, '10**2 elements')
    (0.3363, '10**3 elements')
    (0.3336, '10**4 elements')
    (0.3334, '10**5 elements')
    (0.3333, '10**6 elements')
    (0.3333, '10**7 elements')
    (0.3333, '10**8 elements')

What causes this behavior: the list is large, yet not as large as the sum of its elements, while the array is so small?

2 answers

You have run into a bug where array objects did not report their size correctly.

Up to and including Python 2.7.3, the array .__sizeof__() method did not accurately reflect the size. In Python 2.7.4 and later, as well as in any Python 3 release made after August 2012, a bug fix was applied so that the item buffer is counted.

On Python 2.7.5, I see:

    >>> sys.getsizeof(array_)
    4000056L

which corresponds to the 56 bytes my 64-bit system requires for the base object, plus 4 bytes for each signed integer.
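As a rough sanity check (a sketch for Python 3, where the fix is in place), the reported size should be at least the empty-array overhead plus `itemsize` bytes per element:

```python
from array import array
from sys import getsizeof

arr = array('i', range(10**6))
overhead = getsizeof(array('i'))   # size of an empty array object
payload = arr.itemsize * len(arr)  # 4 bytes per signed int on most platforms

# getsizeof may report slightly more than overhead + payload, because
# arrays over-allocate while growing from an iterator, but never less
print(getsizeof(arr), overhead + payload)
```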

In Python 2.7.3, I see:

    >>> sys.getsizeof(array_)
    56L

Python list objects on my system use 8 bytes per reference, so a list of the same integers is, naturally, about twice as large.
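The same arithmetic can be checked for a list (again a sketch for Python 3; the pointer width is read from `struct` rather than hard-coded):

```python
import struct
from sys import getsizeof

ptr_size = struct.calcsize('P')  # 8 on a 64-bit build
lst = list(range(10**6))

# a list stores one pointer per element, plus a fixed header;
# over-allocation can make the real figure slightly larger
print(getsizeof(lst), getsizeof([]) + ptr_size * len(lst))
```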


The getsizeof function does not measure the size of the elements inside a container such as a list. You need to add up all the individual elements yourself.
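A quick way to see this shallowness, as a small sketch: two one-element lists report the same size even though their elements differ enormously:

```python
from sys import getsizeof

small = [0]      # one small int
big = [10**100]  # one huge int, far larger than 24 bytes on its own

# getsizeof counts only the list's pointer slots, not the objects
# those pointers refer to, so both lists report the same size
print(getsizeof(small), getsizeof(big))
```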

There is a recipe for this, reproduced here:

    from __future__ import print_function
    from sys import getsizeof, stderr
    from itertools import chain
    from collections import deque
    try:
        from reprlib import repr
    except ImportError:
        pass

    def total_size(o, handlers={}, verbose=False):
        """ Returns the approximate memory footprint an object and all of its contents.

        Automatically finds the contents of the following builtin containers and
        their subclasses:  tuple, list, deque, dict, set and frozenset.
        To search other containers, add handlers to iterate over their contents:

            handlers = {SomeContainerClass: iter,
                        OtherContainerClass: OtherContainerClass.get_elements}

        """
        dict_handler = lambda d: chain.from_iterable(d.items())
        all_handlers = {tuple: iter,
                        list: iter,
                        deque: iter,
                        dict: dict_handler,
                        set: iter,
                        frozenset: iter,
                       }
        all_handlers.update(handlers)  # user handlers take precedence
        seen = set()                   # track which object ids have already been seen
        default_size = getsizeof(0)    # estimate sizeof object without __sizeof__

        def sizeof(o):
            if id(o) in seen:          # do not double count the same object
                return 0
            seen.add(id(o))
            s = getsizeof(o, default_size)

            if verbose:
                print(s, type(o), repr(o), file=stderr)

            for typ, handler in all_handlers.items():
                if isinstance(o, typ):
                    s += sum(map(sizeof, handler(o)))
                    break
            return s

        return sizeof(o)

If you use this recipe and run it on the list, you can see the difference:

    >>> alist = [[2**99]*10, 'a string', {'one': 1}]
    >>> print('getsizeof: {}, total_size: {}'.format(getsizeof(alist), total_size(alist)))
    getsizeof: 96, total_size: 721

Source: https://habr.com/ru/post/1496549/
