I am using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like this:
    def evalPoint(point, theta):
        # do some computation on a single data point
        return (cost, grad)
which is called by this function:
    def eval(theta, client, lview, data):
        async_results = []
        for point in data:
            # submit each data point to the load-balanced view
            ar = lview.apply_async(evalPoint, point, theta)
            async_results.append(ar)
        # ... then wait for the results, unpack, and average them (see below)
If I run the code:
    from IPython.parallel import Client

    client = Client(profile="ssh")
    client[:].execute("import numpy as np")
    lview = client.load_balanced_view()

    for i in xrange(100):
        eval(theta, client, lview, data)
memory usage keeps growing until the run finishes (on a machine with 76 GB of memory). I even simplified evalPoint to do nothing, just to make sure it was not the culprit.
The first part of eval was copied from the IPython documentation on how to use a load-balanced view. The second part (unpacking and averaging) is fairly straightforward, so I don't think it is responsible for the leak. I also tried manually deleting objects in eval and calling gc.collect(), with no luck.
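For concreteness, the whole function looks roughly like this (a sketch: the variable names in the second part and the cleanup calls are my own, and evalPoint is assumed to return a (cost, grad) tuple):

    import gc
    import numpy as np

    def eval(theta, client, lview, data):
        # first part (from the docs): submit one task per data point
        async_results = []
        for point in data:
            async_results.append(lview.apply_async(evalPoint, point, theta))

        # block until every task has finished, then fetch the results
        client.wait(async_results)
        values = [ar.get() for ar in async_results]

        # second part: unpack the (cost, grad) tuples and average them
        costs, grads = zip(*values)
        avgCost = np.mean(np.array(costs))
        avgGrad = np.mean(grads, axis=0)

        # manual cleanup attempt -- it made no difference to memory growth
        del async_results, values, costs, grads
        gc.collect()

        return (avgCost, avgGrad)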
I was hoping someone with IPython.parallel experience could point out something obvious I am doing wrong, or could confirm whether this really is a memory leak.
Some additional facts:
- I am using Python 2.7.2 on Ubuntu 11.10
- I am using IPython version 0.12
- I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on server 1.
- The only thing I found that looked like a memory leak in IPython was related to %run, which I believe has been fixed in this version of IPython (and I don't use %run anyway)
Update
In addition, I tried switching the controller's task logging from in-memory to SQLiteDB, in case that was the problem, but I still see the same memory growth.
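For reference, a minimal sketch of how I understand that switch is made, via the controller's db_class setting (the exact profile path is an assumption on my part):

    # in ipcontroller_config.py of the profile in use (e.g. profile_ssh)
    c = get_config()

    # store task records in SQLite instead of the in-memory DictDB
    c.HubFactory.db_class = "IPython.parallel.controller.sqlitedb.SQLiteDB"

    # MongoDB is the other persistent backend:
    # c.HubFactory.db_class = "IPython.parallel.controller.mongodb.MongoDB"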
Answer (1)
The memory consumption is definitely in the controller (I could verify this by (a) running the client on another machine, and (b) watching top). I hadn't realized that a non-SQLiteDB controller would still consume memory, so I hadn't bothered purging.
If I use DictDB and purge, I still see memory consumption grow, but much more slowly. It hovered around 2 GB over 20 calls to eval().
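The purging I refer to is roughly this (a sketch; purge_results('all') asks the hub to drop its completed task records, and the two clear() calls empty the client-side caches):

    # sketch: purge hub- and client-side result caches between calls
    for i in xrange(20):
        eval(theta, client, lview, data)

        # ask the hub to drop its stored records for all completed tasks
        client.purge_results('all')

        # clear the client's own result and metadata caches as well
        client.results.clear()
        client.metadata.clear()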
If I use MongoDB and purge, it looks like mongod takes up about 4.5 GB of memory and ipcluster about 2.5 GB.
If I use SQLiteDB and try to purge, I get the following error:
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results self.db.drop_matching_records(dict(completed={'$ne':None})) File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records expr,args = self._render_expression(check) File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression expr = "%s %s"%null_operators[op] TypeError: not enough arguments for format string
(The failing line formats two %s placeholders with a single argument, so purging looks simply broken in this version's SQLiteDB backend.) So, I think that if I use DictDB, I will probably be fine (I'm going to try tonight). I'm not sure whether some memory consumption is still expected or not (I also purge on the client side, as you suggest).