IPython parallel computing namespace issues

I have read and re-read the IPython documentation and tutorials, and I cannot work out what is wrong with this particular piece of code. It appears that the dimensionless_run function is not visible in the namespace delivered to each of the engines, but I am confused because the function is defined in __main__ and is clearly visible as part of the global namespace.

wrapper.py:

    import math, os

    def dimensionless_run(inputs):
        output_file = open(inputs['fn'],'w')
        ...
        return output_stats

    def parallel_run(inputs):
        import math, os  ## Removing this line causes a NameError: global name 'math'
                         ## is not defined.
        folder = inputs['folder']
        zfill_amt = int(math.floor(math.log10(inputs['num_iters'])))
        for i in range(inputs['num_iters']):
            run_num_str = str(i).zfill(zfill_amt)
            if not os.path.exists(folder + '/'):
                os.mkdir(folder)
            dimensionless_run(inputs)
        return

    if __name__ == "__main__":
        inputs = [input1,input2,...]
        client = Client()
        lbview = client.load_balanced_view()
        lbview.block = True
        for x in sorted(globals().items()):
            print x
        lbview.map(parallel_run,inputs)

Running this code after ipcluster start --n=6 prints the sorted global dictionary, which includes the math and os modules and the parallel_run and dimensionless_run functions. This is followed by IPython.parallel.error.CompositeError: one or more exceptions from call to method: parallel_run, which consists of a large number of [n:apply]: NameError: global name 'dimensionless_run' is not defined, where n runs from 0 to 5.

There are two things that I don’t understand, and they are clearly related.

  • Why doesn't the code identify dimensionless_run in the global namespace?
  • Why is import math, os required in the parallel_run definition?

Edit: This turned out not to be much of a namespace error. I had run ipcluster start --n=6 in a directory that did not contain the code. To fix this, all I had to do was run the start command in my code directory. I also fixed it by adding the lines:

    inputs = input_pairs
    os.system("ipcluster start -n 6")  # NEW
    client = Client()
    ...
    lbview.map(parallel_run,inputs)
    os.system("ipcluster stop")  # NEW

which start the required cluster in the right place.

1 answer

This is essentially a duplicate of the question about Python namespace issues with IPython.parallel, which has a more detailed answer, but the gist is:

When the client sends parallel_run to an engine, it sends only that function, not the whole namespace in which the function is defined (the __main__ module). So when parallel_run runs remotely, lookups of math, os, or dimensionless_run go first to locals() (names defined inside the function, i.e. your in-function import) and then to globals(), which is the __main__ module on the engine.
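The lookup order described above can be reproduced without a cluster. The sketch below is only a local simulation, not what IPython.parallel does internally: it rebuilds a stripped-down parallel_run on top of a fresh globals dictionary (the hypothetical engine_namespace) that stands in for the engine's empty __main__. It uses Python 3 syntax, and the body of dimensionless_run is a placeholder.

```python
import builtins
import types


def dimensionless_run(inputs):
    # Placeholder for the real simulation; just echoes the file name.
    return inputs['fn']


def parallel_run(inputs):
    import math  # local import: bound inside the function, so it works anywhere
    digits = int(math.floor(math.log10(inputs['num_iters'])))
    # dimensionless_run is a global lookup, resolved wherever the function runs
    return digits, dimensionless_run(inputs)


# Hypothetical stand-in for the engine's __main__: it knows nothing about this script.
engine_namespace = {'__builtins__': builtins, '__name__': '__main__'}

# Rebuild the function from its code object alone, on top of the "engine" globals.
remote_run = types.FunctionType(parallel_run.__code__, engine_namespace, 'parallel_run')

inputs = {'fn': 'out.txt', 'num_iters': 100}

local_result = parallel_run(inputs)  # works: the script's globals are available

try:
    remote_run(inputs)
    remote_error = None
except NameError as exc:
    remote_error = str(exc)  # the global lookup fails on the "engine"

print(local_result)
print(remote_error)
```

The in-function import succeeds in both cases because it binds math as a local name, while dimensionless_run is only found where the surrounding globals happen to contain it.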

There are various ways to make sure names are available on the engines, but perhaps the simplest is to explicitly define them there or send them over (the interactive namespace on the engines is __main__, just as it is locally in IPython):

    client[:].execute("import os, math")
    client[:]['dimensionless_run'] = dimensionless_run

before running your map, in which case everything should work as you expect.
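One detail worth spelling out is that a pushed function resolves its own globals on the engine too, which is why the imports have to exist there as well. The following is a cluster-free sketch of that idea, assuming Python 3; the engine dictionary and the ship helper are hypothetical stand-ins for an engine namespace and for sending a function without its globals, and the body of dimensionless_run is invented for illustration.

```python
import builtins
import math
import types


def ship(func, namespace):
    """Rebuild func on top of namespace, mimicking a function sent without its globals."""
    return types.FunctionType(func.__code__, namespace, func.__name__)


def dimensionless_run(inputs):
    # Uses math as a global name, so math must exist wherever this runs.
    return math.sqrt(inputs['x'])


def parallel_run(inputs):
    return dimensionless_run(inputs)


# Hypothetical stand-in for one engine's __main__ namespace.
engine = {'__builtins__': builtins, '__name__': '__main__'}

# Analogue of: client[:]['dimensionless_run'] = dimensionless_run
engine['dimensionless_run'] = ship(dimensionless_run, engine)
remote_run = ship(parallel_run, engine)

try:
    remote_run({'x': 9.0})
    error_before_import = None
except NameError as exc:
    error_before_import = str(exc)  # math is still missing on the "engine"

# Analogue of: client[:].execute("import os, math")
exec("import os, math", engine)

result = remote_run({'x': 9.0})
print(error_before_import)
print(result)
```

Pushing the function alone is not enough here; only after the import has run in the simulated engine namespace does the call succeed.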

This is a problem unique to code defined interactively or in a script. It does not come up if the file is a module instead of a script, for example:

    from mymod import parallel_run
    lbview.map(parallel_run, inputs)

In this case, globals() is the module's global namespace, which is generally the same everywhere (provided the module can be imported on the engines).
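Standard pickle shows why a function that lives in an importable module behaves differently. In the sketch below json.dumps merely stands in for a function such as mymod.parallel_run; whether IPython.parallel serializes a given function in exactly this way is an assumption, and the example only demonstrates plain pickle behaviour in Python 3.

```python
import json
import pickle

# Functions defined in an importable module are pickled by reference:
# the payload names the module and the function, but carries no code.
payload = pickle.dumps(json.dumps)
print(payload)

# Unpickling imports the module and looks the function up again.
restored = pickle.loads(payload)

# It is the very same object, and its globals are the json module's namespace.
print(restored is json.dumps)
print(restored.__globals__['__name__'])
```

Because the receiving side re-imports the module, every name the function relies on is defined there as part of that import, as long as the module is installed or on the path in the receiving process.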


Source: https://habr.com/ru/post/1432762/