I have a memory leak problem with a pandas DataFrame. Apparently this is a known problem: Memory leak using pandas dataframe
The trick used in that answer (calling gc.collect
to force garbage collection and free the memory) works, but it is slow.
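For context, the reason gc.collect helps at all: objects caught in reference cycles are only reclaimed by Python's cyclic collector, not by plain reference counting. A minimal sketch of the effect, with a hypothetical Node class standing in for the DataFrame (an assumption for illustration, not pandas internals):

```python
import gc

class Node:
    # hypothetical stand-in for an object that ends up in a reference cycle
    def __init__(self):
        self.ref = None

def make_cycle():
    # a and b point at each other, so refcounting alone never frees them
    a, b = Node(), Node()
    a.ref, b.ref = b, a

gc.collect()  # clear any pre-existing garbage first
for _ in range(100):
    make_cycle()

# gc.collect() returns the number of unreachable objects it found;
# the 200 cyclic Nodes above are only freed by this explicit call
freed = gc.collect()
```

This is why disabling collection makes memory grow, and why forcing it every loop costs so much: each call scans the tracked objects.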
My problem is that I need to run this loop at a frequency of 500 Hz:
- without the garbage collector: memory leak, but 0.3-0.4 ms / loop
- with gc.collect() in the loop: 11 ms / loop!
(Tested over 1000 cycles with time.time()
: possibly inaccurate, but it gives a good idea of the problem.)
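For what it's worth, the per-loop cost of gc.collect can be reproduced without pandas at all. A sketch using time.perf_counter (finer-grained than time.time) and a hypothetical time_loop helper (names are illustrative, not from the code below):

```python
import gc
import time

def time_loop(n, collect):
    # times n iterations of a dummy workload, optionally calling
    # gc.collect() each pass (the slow-but-leak-free variant)
    start = time.perf_counter()
    for _ in range(n):
        data = {'a': [1.0], 'b': [2.0], 'c': [3.0]}  # stand-in for the DataFrame
        if collect:
            gc.collect()
    return (time.perf_counter() - start) / n

fast = time_loop(1000, collect=False)
slow = time_loop(1000, collect=True)
```

The exact numbers depend on how many objects the interpreter is tracking, but the collect=True variant is consistently orders of magnitude slower per iteration.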
My question is: what alternatives are there to gc.collect
? It works fine, but it is far too slow. I also cannot call it once every 1000 cycles, because that particular cycle would then be extremely slow, and I need a stable frequency.
The code I use for testing is as follows:
import pandas as pd
import os
import gc
from multiprocessing import Process, Pipe
import time

a, b = Pipe()

def sender(a):
    print("sender:", os.getpid())
    while True:
        # build a fresh DataFrame on every iteration and send it down the pipe
        Data = pd.DataFrame([[1., 2., 3.]], columns=['a', 'b', 'c'])
        a.send(Data)

def main(b):
    try:
        print("receiver:", os.getpid())
        i = 0
        while True:
            Data = b.recv()
            cmd = Data['a'].values[0]
            i += 1
            # gc.collect() here stops the leak but costs ~11 ms per loop
    except (Exception, KeyboardInterrupt) as e:
        print("Exception:", e)
        raise

if __name__ == '__main__':
    try:
        p = Process(target=main, args=(b,))
        q = Process(target=sender, args=(a,))
        p.start()
        q.start()
    except (Exception, KeyboardInterrupt) as e:
        print("Exception in main:", e)
        p.terminate()
        q.terminate()
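To make the leak measurement less guesswork than watching a process monitor, the growth can be quantified with the stdlib tracemalloc module. A sketch where a deliberately leaking list stands in for the un-collected DataFrames (an assumption for illustration, not the test code above):

```python
import tracemalloc

tracemalloc.start()

leak = []
before, _ = tracemalloc.get_traced_memory()
for i in range(1000):
    # simulate a leak: retain one small object per iteration
    leak.append([1.0, 2.0, 3.0])
after, _ = tracemalloc.get_traced_memory()

growth = after - before  # bytes retained across the 1000 iterations
tracemalloc.stop()
```

Running the same snapshot logic inside the receiver loop would show whether the DataFrames are really being retained between iterations.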