Apparent memory leak with numpy tolist() in a long-running process

I run a server that acts as a data-processing node for clients within a team. Recently, we have been refactoring legacy code on the server to use numpy for some filtering / conversion jobs.

Because we need to serve this data to remote clients, we convert the numpy data into various forms, using numpy's tolist() as an intermediate step.

Each request is stateless and there are no global variables, so no references are kept between requests.

In one specific step, I see an apparent memory leak, which I have been trying to track down with memory_profiler. This step converts a large (> 4 million entries) ndarray of floats to a Python list. The first time I issue a request, the tolist() call allocates 120 MiB of memory, and 31 MiB is released when I free the numpy array. The second (and each subsequent) time I issue an identical request, the allocation and the release are both 31 MiB. Every individual request I issue shows the same pattern, but with different absolute values.
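For scale, here is a rough size estimate for this kind of conversion. It is only a sketch under assumptions not stated in the question: a float64 array and a 64-bit CPython build. The function name estimate_sizes and the entry count used are hypothetical.

```python
import sys


def estimate_sizes(n_entries):
    """Estimate bytes used by a float64 array and by the equivalent list of floats."""
    # Assumption: float64 data, 8 bytes per element, stored contiguously.
    array_bytes = n_entries * 8
    # One list slot is a pointer; measure it rather than hard-coding the size.
    pointer_bytes = sys.getsizeof([None]) - sys.getsizeof([])
    # Each list element is a separate boxed Python float object.
    float_bytes = sys.getsizeof(1.0)
    list_bytes = n_entries * (pointer_bytes + float_bytes)
    return array_bytes, list_bytes


if __name__ == "__main__":
    # Hypothetical entry count, chosen only to be "a bit over 4 million".
    array_bytes, list_bytes = estimate_sizes(4_140_000)
    print(f"ndarray: {array_bytes / 2**20:.1f} MiB")
    print(f"list:    {list_bytes / 2**20:.1f} MiB")
```

Under these assumptions the list needs several times the memory of the array, which is the same order of magnitude as the two figures reported above.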

I stripped down my code and forced in some del statements for illustration purposes. The output of memory_profiler.profile is below.

The first time the request is issued:

Line #    Mem usage    Increment   Line Contents
================================================
   865    296.6 MiB      0.0 MiB   p = ikeyData[1]['value']
   866    417.2 MiB    120.6 MiB   newArr = p.tolist()
   867    417.2 MiB      0.0 MiB   del p
   868    385.6 MiB    -31.6 MiB   del ikeyData[1]['value']
   869    385.6 MiB      0.0 MiB   ikeyData[1]['value'] = newArr

The second (and subsequent) instance of the same request:

Line #    Mem usage    Increment   Line Contents
================================================
   865    494.7 MiB      0.0 MiB   p = ikeyData[1]['value']
   866    526.3 MiB     31.6 MiB   newArr = p.tolist()
   867    526.3 MiB      0.0 MiB   del p
   868    494.7 MiB    -31.6 MiB   del ikeyData[1]['value']
   869    494.7 MiB      0.0 MiB   ikeyData[1]['value'] = newArr

As you can imagine, in a long-running process with highly variable requests, these allocations accumulate, forcing us to restart the server regularly.

Does anyone have any thoughts on what could be happening here?

1 answer

In your case, Python has probably released the memory.

That does not mean the memory allocator necessarily returns the memory to the operating system. memory_profiler asks the operating system how much memory the process is currently using, so memory that Python has freed but the allocator still holds keeps showing up in its numbers. So there is probably nothing wrong with your code.
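One way to check this from inside Python is the standard-library tracemalloc module, which counts the memory Python itself has allocated rather than what the operating system reports for the process. The following is a minimal sketch, not your code: the function name traced_usage_for_list is hypothetical, and a plain list of floats stands in for the result of tolist().

```python
import gc
import tracemalloc


def traced_usage_for_list(n_items):
    """Return (during, after): bytes traced while a big list is alive and after del."""
    tracemalloc.start()
    try:
        # Stand-in for ndarray.tolist(): a large list of boxed Python floats.
        data = [float(i) for i in range(n_items)]
        during, _peak = tracemalloc.get_traced_memory()
        del data
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return during, after


if __name__ == "__main__":
    during, after = traced_usage_for_list(1_000_000)
    print(f"while the list is alive: {during / 2**20:.1f} MiB")
    print(f"after del:               {after / 2**20:.1f} MiB")
```

If the traced figure drops back after the del while the process size reported by memory_profiler stays high, the memory has been released by Python and is being retained by the allocator, not leaked by your code.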


Source: https://habr.com/ru/post/1266137/

