Memory leak error in IPython.parallel module?

I am using IPython.parallel to process a large amount of data on a cluster. The remote function that I run looks like this:

    def evalPoint(point, theta):
        # do some complex calculation
        return (cost, grad)

which is called by this function:

    def eval(theta, client, lview, data):
        async_results = []
        for point in data:
            # evaluate current data point
            ar = lview.apply_async(evalPoint, point, theta)
            async_results.append(ar)
        # wait for all results to come back
        client.wait(async_results)
        # and retrieve their values
        values = [ar.get() for ar in async_results]
        # unzip data from original tuple
        totalCost, totalGrad = zip(*values)
        avgGrad = np.mean(totalGrad, axis=0)
        avgCost = np.mean(totalCost, axis=0)
        return (avgCost, avgGrad)

If I run the code:

    client = Client(profile="ssh")
    client[:].execute("import numpy as np")
    lview = client.load_balanced_view()
    for i in xrange(100):
        eval(theta, client, lview, data)

memory usage keeps growing until the run finishes (76 GB of memory). I simplified evalPoint to do nothing, to make sure it was not the culprit.

The first part of eval was copied from the IPython documentation on how to use a load-balanced view. The second part (unpacking and averaging) is straightforward, so I don't think it is responsible for the leak. I also tried manually deleting objects in eval and calling gc.collect(), with no luck.
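To illustrate why gc.collect() cannot help while a cache still holds strong references, here is a minimal, self-contained sketch; the plain dict stands in for the client's internal caches, and Result, cache, and probe are made-up names for illustration:

```python
import gc
import weakref


class Result(object):
    """Stand-in for a heavy result object."""
    pass


cache = {}                         # plays the role of a results cache
cache["msg-1"] = Result()
probe = weakref.ref(cache["msg-1"])  # lets us observe when the object dies

gc.collect()
print(probe() is None)   # False: the cache still holds a strong reference

del cache["msg-1"]       # only dropping the reference lets it be freed
gc.collect()
print(probe() is None)   # True: the object is gone
```

In CPython the object is freed as soon as its last reference is deleted; gc.collect() alone does nothing as long as a dict still points at it.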

I was hoping someone with IPython.parallel experience could point out something obvious I am doing wrong, or confirm that this really is a memory leak.

Some additional facts:

  • I am using Python 2.7.2 on Ubuntu 11.10
  • I am using IPython version 0.12
  • I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on server 1 only.
  • The only thing I found that looked like a memory leak in IPython was related to %run, which I believe is fixed in this version of IPython (and I don't use %run anyway)

Update

Also, I tried switching the controller's logging from in-memory storage to SQLiteDB, in case that was the problem, but I still have the same issue.
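For reference, the controller's storage backend is selected in ipcontroller_config.py. A sketch, assuming the IPython 0.12 config names (verify c.HubFactory.db_class and the dotted paths against your installed version); only one assignment should be active:

```python
# ipcontroller_config.py -- choose ONE backend
c = get_config()

# in-memory default; grows without bound unless purged:
# c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.DictDB'

# on-disk sqlite backend:
c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'

# or mongodb, if available:
# c.HubFactory.db_class = 'IPython.parallel.controller.mongodb.MongoDB'
```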

Update 2

The memory consumption is definitely in the controller (I verified this by (a) running the client on another machine and (b) watching top). I hadn't realized that a non-SQLiteDB backend would keep consuming memory, so I hadn't bothered with purging.

If I use DictDB and purge, I still see memory consumption growing, but much more slowly. It hovered around 2 GB over 20 calls to eval().

If I use MongoDB and purge, it looks like mongod takes up about 4.5 GB of memory and ipcluster about 2.5 GB.

If I use SQLite and try to purge, I get the following error:

    File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
      self.db.drop_matching_records(dict(completed={'$ne':None}))
    File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
      expr,args = self._render_expression(check)
    File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
      expr = "%s %s"%null_operators[op]
    TypeError: not enough arguments for format string
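The exception itself is just an old-style %-format string with two placeholders being applied to a single value. A minimal reproduction of the failure mode (the dict contents here are hypothetical stand-ins, not the actual sqlitedb code):

```python
# A two-placeholder format string applied to a single string raises
# "not enough arguments for format string":
null_operators = {'$ne': 'IS NOT NULL'}   # hypothetical stand-in

try:
    expr = "%s %s" % null_operators['$ne']   # needs a 2-tuple, gets one str
except TypeError as e:
    print(e)   # not enough arguments for format string
```

The fix in code like this is to pass a tuple, e.g. "%s %s" % (column, null_operators[op]), where column is whatever name the query is filtering on.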

So, I think that if I use DictDB and purge, I will probably be fine (I'm going to try a long run tonight). I'm not sure whether some memory consumption is still expected or not (I also purge the client, as you suggested).

1 answer

Is it the controller process that is growing, or the client, or both?

The controller remembers all requests and all results, so the default behavior of storing this information in a simple dict will lead to constantly growing memory. Using a db backend (sqlite, or preferably mongodb if available) should address this, or the client.purge_results() method can be used to instruct the controller to discard any/all of the result history (this will also delete them from the db if you are using one).

The client itself also caches all of its own results in its results dict, so this, too, will lead to growth over time. Unfortunately, this one is a bit harder to get a handle on, because references can propagate in all sorts of directions, and it is not affected by the controller's db backend.

This is a known issue in IPython, but for now you should be able to clear the references manually by deleting the entries in the client's results/metadata dicts, and if your view is sticking around, it has its own results dict:

    # ...
    # and retrieve their values
    values = [ar.get() for ar in async_results]

    # clear references to the local cache of results:
    for ar in async_results:
        for msg_id in ar.msg_ids:
            del lview.results[msg_id]
            del client.results[msg_id]
            del client.metadata[msg_id]

Or you can purge the entire client-side cache with simple dict.clear() calls:

    view.results.clear()
    client.results.clear()
    client.metadata.clear()

Side note:

Views have their own wait() method, so you don't need to pass the Client to your function at all. Everything should be accessible via the View, and if you really need the client (e.g. for purging the caches), you can get it as view.client.
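Putting the side note and the cache-clearing advice together, the question's eval() could be rewritten to take only the view. This is an untested sketch, assuming the same evalPoint, theta, and data as in the question (eval_points is a made-up name to avoid shadowing the built-in eval):

```python
import numpy as np


def eval_points(theta, lview, data):
    """Like eval() in the question, but takes only the load-balanced view."""
    # evaluate every data point through the view
    async_results = [lview.apply_async(evalPoint, point, theta)
                     for point in data]
    lview.wait(async_results)        # views have their own wait()
    values = [ar.get() for ar in async_results]

    # drop cached references so neither the view nor the client grows
    client = lview.client            # the client is reachable via the view
    for ar in async_results:
        for msg_id in ar.msg_ids:
            lview.results.pop(msg_id, None)
            client.results.pop(msg_id, None)
            client.metadata.pop(msg_id, None)

    totalCost, totalGrad = zip(*values)
    return (np.mean(totalCost, axis=0), np.mean(totalGrad, axis=0))
```

Using .pop(msg_id, None) instead of del makes the cleanup tolerant of entries that were already purged.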
