I am running a server that acts as a data-processing node for clients within a team. We recently refactored legacy code on the server to use numpy for some filtering / conversion jobs.
Because we need to serve this data to remote clients, we convert the numpy data into various formats, using numpy.ndarray.tolist() as an intermediate step.
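For context, the conversion step looks roughly like the sketch below. This is a simplified, hypothetical version (the helper name, the JSON output, and the exact structure of ikeyData are illustrative, not our real code); the part that matters is the tolist() call, which matches the profiled lines further down.

import json
import numpy as np

def serialize_for_clients(ikeyData):
    # Hypothetical helper: walk the records and replace each ndarray 'value'
    # with a plain Python list so the payload can be serialized for clients.
    for entry in ikeyData.values():
        value = entry['value']
        if isinstance(value, np.ndarray):
            entry['value'] = value.tolist()   # the step profiled below
    return json.dumps(ikeyData)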
Each request is stateless and there are no global variables, so nothing should persist between requests.
In one specific step I see what looks like a memory leak, which I have been trying to track down with memory_profiler. The step converts a large (> 4 million entries) float ndarray into a Python list. The first time I issue a request, the tolist() call allocates 120 MiB of memory, and only 31 MiB is freed when I release the numpy array. The second (and every subsequent) time I issue an identical request, both the allocation and the release are 31 MiB. Each request I issue has the same structure, just different absolute values.
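For scale, here is a rough back-of-envelope estimate (assuming float64 and exactly 4 million entries; our real arrays are slightly larger) of how big the ndarray and the resulting Python list should be:

import sys

n = 4_000_000
ndarray_mib = n * 8 / 2**20                      # raw float64 array data: ~30.5 MiB
list_mib = n * (sys.getsizeof(1.0) + 8) / 2**20  # boxed floats (24 B each on 64-bit
                                                 # CPython) plus list slots: ~122 MiB
print(ndarray_mib, list_mib)

So the ~31 MiB figure is roughly the size of the raw array data, and the ~120 MiB seen on the first run is roughly the size of the boxed Python floats in the list.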
I have stripped the code down and added some explicit del statements for illustration. The output from memory_profiler's @profile decorator is below.
The first time the request is issued:
Line #    Mem usage    Increment   Line Contents
================================================
   865    296.6 MiB      0.0 MiB   p = ikeyData[1]['value']
   866    417.2 MiB    120.6 MiB   newArr = p.tolist()
   867    417.2 MiB      0.0 MiB   del p
   868    385.6 MiB    -31.6 MiB   del ikeyData[1]['value']
   869    385.6 MiB      0.0 MiB   ikeyData[1]['value'] = newArr
The second (and each subsequent) time the same request is issued:
Line #    Mem usage    Increment   Line Contents
================================================
   865    494.7 MiB      0.0 MiB   p = ikeyData[1]['value']
   866    526.3 MiB     31.6 MiB   newArr = p.tolist()
   867    526.3 MiB      0.0 MiB   del p
   868    494.7 MiB    -31.6 MiB   del ikeyData[1]['value']
   869    494.7 MiB      0.0 MiB   ikeyData[1]['value'] = newArr
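For reference, here is a minimal standalone script that I would expect to reproduce the same pattern (run with python -m memory_profiler; the random data and record layout are stand-ins for our real payload):

import numpy as np
from memory_profiler import profile

@profile
def convert(record):
    # Mirrors the profiled lines above: record['value'] is a large float ndarray.
    p = record['value']
    newArr = p.tolist()        # first call: large allocation; later calls: ~array size
    del p
    del record['value']        # drop the last reference to the ndarray
    record['value'] = newArr
    return record

if __name__ == '__main__':
    for _ in range(3):         # repeat to compare the first and subsequent requests
        convert({'value': np.random.random(4_000_000)})   # ~31 MiB of float64 data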
As you can imagine, in a long-running process handling highly variable requests, these allocations accumulate, eventually forcing us to restart the server.
Does anyone have any thoughts on what could be happening here?