Roughly how much memory will a list of 80,000 items use in Python?

I have a Python list that consists of 80,000 lists. Each of these inner lists has, more or less, this format:

["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20] 

Could you tell me approximately how much memory this list of 80,000 lists will consume?

And is it common / normal to work with lists this large in Python? Most of the operations I perform involve extracting data from this list in some structured way.

Specifically, I would like to know whether Python is fast enough to extract data from these large lists using list comprehensions. I want my script to be fast.
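A quick way to check this empirically is to time one list-comprehension pass over synthetic rows of the same shape (a sketch; the row below is only an approximation of the format described above):

```python
import timeit

# Synthetic data approximating the 80,000 rows described above.
row = ["012345", "MYNAME", "Mon", "A"] + [20] * 29
data = [row[:] for _ in range(80000)]

# One full list-comprehension pass: extract the first field of every row.
t = timeit.timeit(lambda: [r[0] for r in data], number=10) / 10
print('one pass over 80,000 rows: %.4f seconds' % t)
```

On typical hardware a single pass like this takes on the order of milliseconds, so comprehension-based extraction at this scale is generally not a bottleneck.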

+1
5 answers
    In [39]: lis = ["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

    In [40]: k = [lis[:] for _ in xrange(80000)]

    In [41]: k.__sizeof__()
    Out[41]: 325664

    In [42]: sys.getsizeof(k)  # after gc_head
    Out[42]: 325676

According to the code in sysmodule.c, sys.getsizeof() calls the object's __sizeof__ method to get its size:

    method = _PyObject_LookupSpecial(o, &PyId___sizeof__);
    if (method == NULL) {
        if (!PyErr_Occurred())
            PyErr_Format(PyExc_TypeError,
                         "Type %.100s doesn't define __sizeof__",
                         Py_TYPE(o)->tp_name);
    }
    else {
        res = PyObject_CallFunctionObjArgs(method, NULL);
        Py_DECREF(method);
    }

and then adds some gc overhead to it:

    /* add gc_head size */
    if (PyObject_IS_GC(o)) {
        PyObject *tmp = res;
        res = PyNumber_Add(tmp, gc_head_size);
        Py_DECREF(tmp);
    }
    return res;
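Both pieces can be observed from Python itself: for a GC-tracked object such as a list, sys.getsizeof() returns what __sizeof__() reports plus the gc_head overhead (a small sketch; the exact overhead varies by platform and Python version):

```python
import sys

lst = [1, 2, 3]

shallow = lst.__sizeof__()    # what the type itself reports
with_gc = sys.getsizeof(lst)  # same value plus the gc_head size
print(shallow, with_gc, with_gc - shallow)
```

For objects that are not tracked by the garbage collector (such as plain ints), the two values are identical.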

We can also use the recursive sizeof recipe, as suggested in the docs, to recursively calculate the size of each container together with its contents:

    In [17]: total_size(k)  # from the recursive sizeof recipe
    Out[17]: 13125767

    In [18]: sum(y.__sizeof__() for x in k for y in x)
    Out[18]: 34160000

The naive sum in In [18] is far larger because it counts shared objects (the small integers and strings that all 80,000 copies reference) once per reference, while total_size counts each distinct object only once.
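For reference, a minimal self-contained sketch of such a memoized deep-size walk (simplified from the docs recipe; it only handles the built-in containers used here) could look like this:

```python
import sys

def total_size(obj, seen=None):
    """Deep size in bytes; each distinct object is counted once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:   # already counted: shared object
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

# Synthetic rows approximating the format in the question.
row = ["012345", "MYNAME", "Mon", "A"] + [20] * 29
data = [row[:] for _ in range(80000)]
print(total_size(data))
```

Because of the id() memo, the shared strings and cached small integers contribute their size only once, no matter how many of the 80,000 rows reference them.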
+3

On my machine, using 32-bit Python 2.7.3, a list containing 80,000 copies of the exact list in your question takes about 10 MB. I measured this by comparing the memory footprints of two otherwise identical interpreter processes, one with the list and one without it.
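On Unix, this kind of whole-process comparison can be approximated inside a single interpreter with the stdlib resource module (a rough sketch; ru_maxrss is in kilobytes on Linux but bytes on macOS, and it reflects the peak footprint of everything in the process, not just the list):

```python
import resource

def peak_rss():
    # Peak resident set size of the current process.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
row = ["012345", "MYNAME", "Mon", "A"] + [20] * 29
data = [row[:] for _ in range(80000)]
after = peak_rss()
print('peak RSS grew by about %d units' % (after - before))
```

This overstates the list's own size somewhat (allocator overhead, fragmentation), which is one reason process-level figures differ from recursive getsizeof sums.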

I tried to measure the size with sys.getsizeof(), but that gives a clearly incorrect result, because it is shallow: it counts only the outer list object (its header plus 80,000 pointers), not the inner lists or their contents:

    >>> l = [["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20] for i in range(80000)]
    >>> sys.getsizeof(l)
    325680
+3

From the sys.getsizeof docstring:

    getsizeof(object[, default]) -> int

    Return the size of the object in bytes.

The code:

    >>> import sys
    >>> sys.getsizeof(["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20])
    160

It returns 160 bytes for your inner list. Multiplying by 80,000 gives approximately 12.8 MB. (Measured on a 32-bit machine, with both Python 2.7.2 and Python 3.2.)

+1

Taking the current (rev 13) code from the Python Object Size (revised) recipe, placing it in a module called sizeof, and applying it to your list gives the following (using 32-bit Python 2.7.3):

    from sizeof import asizeof  # from http://code.activestate.com/recipes/546530

    MB = 1024*1024
    COPIES = 80000

    lis = ["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
    lis_size = asizeof(lis)
    print 'asizeof(lis): {} bytes'.format(lis_size)

    list_of_lis_size = asizeof([lis[:] for _ in xrange(COPIES)])
    print 'asizeof(list of {:,d} copies of lis): {:,d} bytes ({:.2f} MB)'.format(
        COPIES, list_of_lis_size, list_of_lis_size/float(MB))
    asizeof(lis): 272 bytes
    asizeof(list of 80,000 copies of lis): 13,765,784 bytes (13.13 MB)
+1

Pay attention to the following interaction with the interpreter:

    >>> import sys
    >>> array = ['this', 'is', 'a', 'string', 'array']
    >>> sys.getsizeof(array)
    56
    >>> list(map(sys.getsizeof, array))
    [29, 27, 26, 31, 30]
    >>> sys.getsizeof(array) + sum(map(sys.getsizeof, array))
    199

The answer in this particular case is to use sys.getsizeof(array) + sum(map(sys.getsizeof, array)) to find the size of a list of strings. However, the following is a more complete implementation that takes into account containers, class instances, and classes that use __slots__.

    import sys

    def sizeof(obj):
        return _sizeof(obj, set())

    def _sizeof(obj, memo):
        # Add this object's size just once.
        location = id(obj)
        if location in memo:
            return 0
        memo.add(location)
        total = sys.getsizeof(obj)
        # Look for any class instance data.
        try:
            obj = vars(obj)
        except TypeError:
            pass
        # Handle containers holding objects.
        if isinstance(obj, (tuple, list, frozenset, set)):
            for item in obj:
                total += _sizeof(item, memo)
        # Handle the two-sided nature of dicts.
        elif isinstance(obj, dict):
            for key, value in obj.items():
                total += _sizeof(key, memo) + _sizeof(value, memo)
        # Handle class instances using __slots__.
        elif hasattr(obj, '__slots__'):
            for key, value in ((name, getattr(obj, name))
                               for name in obj.__slots__ if hasattr(obj, name)):
                total += _sizeof(key, memo) + _sizeof(value, memo)
        return total

Edit:

Revisiting this problem some time later, I developed the following alternative. Note that it does not work with infinite iterators; it is best suited for static data structures ready for analysis.

    import sys

    sizeof = lambda obj: sum(map(sys.getsizeof, explore(obj, set())))

    def explore(obj, memo):
        loc = id(obj)
        if loc not in memo:
            memo.add(loc)
            yield obj
            # Handle instances with slots.
            try:
                slots = obj.__slots__
            except AttributeError:
                pass
            else:
                for name in slots:
                    try:
                        attr = getattr(obj, name)
                    except AttributeError:
                        pass
                    else:
                        yield from explore(attr, memo)
            # Handle instances with a dict.
            try:
                attrs = obj.__dict__
            except AttributeError:
                pass
            else:
                yield from explore(attrs, memo)
            # Handle dicts or iterables.
            for name in 'keys', 'values', '__iter__':
                try:
                    attr = getattr(obj, name)
                except AttributeError:
                    pass
                else:
                    for item in attr():
                        yield from explore(item, memo)
0

Source: https://habr.com/ru/post/1203478/

