Python function slows down with big list

I tested the speed of several different ways of performing complex iterations over some of my data, and I found something strange. It seems that having a large list local to a function significantly slows that function down, even for code that never touches the list. For example, when I build two independent lists from two instances of the same generator function, building the second list is about 2.5 times slower. If the first list is deleted before the second is created, both loops run at the same speed.

def f():
    l1, l2 = [], []
    for c1, c2 in generatorFxn():
        l1.append((c1, c2))
    # destroying l1 here fixes the problem
    for c3, c4 in generatorFxn():
        l2.append((c3, c4))

The lists end up with about 3.1 million items each, but I saw the same effect with smaller lists. The first for loop takes about 4.5 seconds, the second about 10.5. If I insert l1 = [] or l1 = len(l1) at the comment position, both loops take about 4.5 seconds.
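
A minimal sketch of this kind of timing, with a stand-in generatorFxn that just yields pairs (the real generator is more complex), might look like:

import time

def generatorFxn():
    # stand-in for the real generator: yields pairs of values
    for i in range(3_100_000):
        yield i, i + 1

def f():
    l1, l2 = [], []

    start = time.perf_counter()
    for c1, c2 in generatorFxn():
        l1.append((c1, c2))
    print("first loop:  %.1f s" % (time.perf_counter() - start))

    # l1 = []  # destroying l1 here makes both loops equally fast

    start = time.perf_counter()
    for c3, c4 in generatorFxn():
        l2.append((c3, c4))
    print("second loop: %.1f s" % (time.perf_counter() - start))

f()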

Why should the rate of memory allocation inside a function have anything to do with the current size of that function's local variables?

EDIT: Disabling the garbage collector fixes everything, so it must be caused by the GC constantly running. Case closed!

+6
4 answers

When you create that many new objects (3 million tuples), the garbage collector gets bogged down. If you turn off garbage collection with gc.disable(), the problem goes away (and the program runs about 4x faster to boot).
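
A hedged sketch of what that looks like in practice (only the gc calls are the actual suggestion; generatorFxn is the question's generator):

import gc

def f():
    gc.disable()  # suspend the cyclic garbage collector during bulk allocation
    try:
        l1 = [(c1, c2) for c1, c2 in generatorFxn()]
        l2 = [(c3, c4) for c3, c4 in generatorFxn()]
    finally:
        gc.enable()  # re-enable collection once the lists are built
    return l1, l2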

+9

It's impossible to say without more detailed instrumentation.

As a very, very preliminary step, check your main memory usage. If your RAM is full and your OS is swapping to disk, your performance will be pretty terrible. In that case, you might be better off putting your intermediate products somewhere other than memory. If you only need sequential reads of your data, consider writing them to a plain file; if your data has a rigid structure, consider storing it in a relational database.
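
As a rough illustration of the last suggestion, the intermediate pairs could be spilled into an on-disk SQLite table instead of being held in a Python list (the table and column names here are invented for the example):

import sqlite3

def spill_pairs_to_db(pairs, path="intermediate.db"):
    # store (c1, c2) pairs on disk instead of holding them all in RAM
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS pairs (c1, c2)")
    conn.executemany("INSERT INTO pairs VALUES (?, ?)", pairs)
    conn.commit()
    conn.close()

# usage: spill_pairs_to_db(generatorFxn())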

+2

My guess is that when you create the first list, there is more memory available, which means the list is less likely to need reallocating as it grows.

After you've claimed a decent chunk of memory with the first list, your second list has a higher chance of being reallocated as it grows, since Python lists are dynamically sized.
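
A small sketch illustrating that CPython lists over-allocate and periodically reallocate their backing array as they grow (the exact sizes are implementation details and will vary):

import sys

l = []
last_size = sys.getsizeof(l)
for i in range(100):
    l.append(i)
    size = sys.getsizeof(l)
    if size != last_size:
        # the list's backing array was just reallocated to a larger block
        print("len=%d: size grew from %d to %d bytes" % (len(l), last_size, size))
        last_size = size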

+2

The memory used by a function's local data isn't garbage collected until the function returns. Unless you need to do slicing, using lists for large collections of data is not a great idea.

From your example, it's not entirely clear what the purpose of creating these lists is. You might want to consider using generators instead of lists, especially if you only iterate over them. If you need to slice the returned data, then convert the generators to lists at that point.
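
For example, a sketch of a generator-based version of the question's f(), which only materialises lists when slicing is actually required:

def f():
    # lazily produce the pairs instead of storing ~3.1 million tuples per list
    gen1 = ((c1, c2) for c1, c2 in generatorFxn())
    gen2 = ((c3, c4) for c3, c4 in generatorFxn())
    return gen1, gen2

# only materialise the data if slicing/random access is needed:
# l1, l2 = (list(g) for g in f())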

0


