I am puzzled by this memory allocation behavior of sets:
>>> set(range(1000)).__sizeof__()
32968
>>> set(range(1000)).union(range(1000)).__sizeof__()  # expected, set doesn't change
32968
>>> set(range(1000)).union(list(range(1000))).__sizeof__()  # expected, set doesn't change
32968
>>> set(range(1000)).union(set(range(1000))).__sizeof__()  # not expected
65736
Why does passing a set as the argument double the amount of memory used by the resulting set? The result in both cases is identical to the original set:
>>> set(range(1000)) == set(range(1000)).union(range(1000)) == set(range(1000)).union(set(range(1000)))
True
Note that a plain iterator behaves the same way as range and list (no doubling):
>>> set(range(1000)).union(iter(list(range(1000)))).__sizeof__()
32968
And using the update method:
>>> a = set(range(1000))
>>> a.update(range(1000))
>>> a.__sizeof__()
32968
>>> a.update(set(range(1000)))
>>> a.__sizeof__()
65736
At first I thought this happened because union, when called, sees that the other set has 1000 elements and therefore allocates enough memory to hold the elements of both sets, but then uses only part of that memory. In the iterator case, by contrast, it simply iterates over the argument and adds the elements one by one, which consumes no extra memory because every element is already in the set.
But range is also a sequence with a known length, just like the list in the earlier example:
>>> len(range(1000))
1000
>>> range(1000)[100]
100
So why doesn't this happen with range and list, but only with set? Is there a design decision behind this, or is it a bug?
Tested on Python 2.7.3 and Python 3.2.3 on 64-bit Linux.
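For anyone who wants to reproduce the comparison in one go, here is a minimal sketch. The helper name `union_sizes` is just something made up for illustration, and the absolute numbers it prints depend on the interpreter version and platform, so they may not match the figures above exactly; only the relative difference between the set argument and the others is the point.

```python
def union_sizes(n=1000):
    """Return {argument kind: __sizeof__() of set(range(n)).union(argument)}.

    Hypothetical helper for illustration; not part of any library.
    """
    # Factories, so every measurement gets a fresh argument
    # (an iterator can only be consumed once).
    make_arg = {
        "range": lambda: range(n),
        "list": lambda: list(range(n)),
        "iterator": lambda: iter(list(range(n))),
        "set": lambda: set(range(n)),
    }
    sizes = {}
    for name, factory in make_arg.items():
        result = set(range(n)).union(factory())
        # The contents are the same in every case; only the allocation may differ.
        assert result == set(range(n))
        sizes[name] = result.__sizeof__()
    return sizes


if __name__ == "__main__":
    for name, size in union_sizes().items():
        print(name, size)
```

On the interpreters I tried, only the "set" entry comes out larger than the rest.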
python memory-management set
Bakuriu, Mar 04 '13 at 9:14