Parallelize nested loops in IPython

I have a nested for loop in my Python code that looks something like this:

    results = []
    for azimuth in azimuths:
        for zenith in zeniths:
            # Do various bits of stuff
            # Eventually get a result
            results.append(result)

I would like to parallelize this loop on my 4-core computer in order to speed it up. Looking at the IPython parallel programming documentation (http://ipython.org/ipython-doc/dev/parallel/parallel_multiengine.html#quick-and-easy-parallelism), there seems to be an easy way to use map to parallelize iterative operations.

However, to do this I need to have the code inside the loop as a function (which is easy to do), and then map that function across the arguments. The problem is that I can't get an array to map the function across: itertools.product() produces an iterator that I can't seem to use with the map function.
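Roughly, this is the shape of what I have been trying (just a sketch - f stands in for my real loop body and dview for the DirectView from my client):

    import itertools
    from IPython.parallel import Client

    c = Client()
    dview = c[:]

    def f(azimuth, zenith):
        # the real loop body would go here
        return result

    # This is where I get stuck: product() gives an iterator of tuples,
    # but map() wants one sequence per function argument.
    pairs = itertools.product(azimuths, zeniths)
    amr = dview.map(f, pairs)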

Am I barking up the wrong tree by trying to use map here? Is there a better way to do this? Or is there a way to use itertools.product and then do parallel execution with a function mapped across the resulting pairs?

+6
5 answers

To parallelize each call, you just need to get a list for each argument. You can use itertools.product + zip to get the following:

 allzeniths, allazimuths = zip(*itertools.product(zeniths, azimuths)) 

Then you can use map:

 amr = dview.map(f, allzeniths, allazimuths) 
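For completeness, here is a sketch of how dview might be created and the results collected - the Client and DirectView setup is assumed from the standard IPython.parallel API rather than shown in the answer:

    from IPython.parallel import Client

    c = Client()   # connect to a running ipcluster
    dview = c[:]   # a DirectView over all engines

    amr = dview.map(f, allzeniths, allazimuths)
    results = amr.get()  # block until every engine has returned its results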

To go through the steps in more detail, here is an example:

    zeniths = range(1, 4)
    azimuths = range(6, 8)
    product = list(itertools.product(zeniths, azimuths))
    # [(1, 6), (1, 7), (2, 6), (2, 7), (3, 6), (3, 7)]

So we have a "list of pairs", but what we really need is a single list for each argument, i.e. a "pair of lists". That is exactly what the slightly weird zip(*product) syntax gives us:

    allzeniths, allazimuths = zip(*itertools.product(zeniths, azimuths))
    print allzeniths
    # (1, 1, 2, 2, 3, 3)
    print allazimuths
    # (6, 7, 6, 7, 6, 7)

Now we simply map our function onto these two lists to parallelize the nested for loops:

    def f(z, a):
        return z * a

    view.map(f, allzeniths, allazimuths)

And there is nothing special about there being only two lists - this approach should extend to an arbitrary number of nested loops.
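For example, here is a sketch of the same trick with three nested loops (the third list, elevations, is made up purely for illustration):

    elevations = range(2)  # hypothetical third loop variable
    allzeniths, allazimuths, allelevations = zip(
        *itertools.product(zeniths, azimuths, elevations))

    def g(z, a, e):
        return z * a * e

    view.map(g, allzeniths, allazimuths, allelevations)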

+10

I assume that you are using IPython 0.11 or later. First of all, we define a simple function.

    def foo(azimuth, zenith):
        # Do various bits of stuff
        # Eventually get a result
        return result

Then use the IPython parallel package to parallelize your problem. First start a controller with 5 attached engines (#CPUs + 1) by starting a cluster in a terminal window (if you installed IPython 0.11 or later, this program should already be present):

 ipcluster start -n 5 

In your script, connect to the controller and pass all your tasks. The controller takes care of everything.

    from IPython.parallel import Client

    c = Client()                 # here is where the client establishes the connection
    lv = c.load_balanced_view()  # this object represents the engines (workers)

    tasks = []
    for azimuth in azimuths:
        for zenith in zeniths:
            tasks.append(lv.apply(foo, azimuth, zenith))

    result = [task.get() for task in tasks]  # blocks until all results are back
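As a variation (a sketch, not part of the answer above), the same load-balanced view can also map over flattened argument lists in one call, reusing the zip(*itertools.product(...)) trick from the first answer:

    import itertools

    allazimuths, allzeniths = zip(*itertools.product(azimuths, zeniths))
    amr = lv.map(foo, allazimuths, allzeniths)
    result = amr.get()  # blocks until all results are back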
+9

I am not very familiar with IPython, but a simple solution would seem to be to parallelize only the outer loop.

    def f(azimuth):
        results = []
        for zenith in zeniths:
            # compute result
            results.append(result)
        return results

    allresults = map(f, azimuths)
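Note that allresults is then a list of lists (one per azimuth) rather than the flat results list from the question; if a flat list is wanted, a small follow-up sketch:

    import itertools

    flat_results = list(itertools.chain.from_iterable(allresults))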
+1

If you really want to run your code in parallel, use concurrent.futures:

    import itertools
    import concurrent.futures

    def _work_horse(azimuth, zenith):
        # DO HEAVY WORK HERE
        return result

    futures = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # product(azimuths, zeniths) yields (azimuth, zenith) pairs in the
        # same order as the function's arguments
        for arg_set in itertools.product(azimuths, zeniths):
            futures.append(executor.submit(_work_horse, *arg_set))
        executor.shutdown(wait=True)

    # Will time out after one hour.
    results = [future.result(3600) for future in futures]
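Alternatively (a sketch of my own, not from the answer above), ProcessPoolExecutor.map can consume the itertools.product iterator directly if the worker takes a single tuple argument:

    def _work_horse_pair(pair):
        azimuth, zenith = pair
        # DO HEAVY WORK HERE
        return result

    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(_work_horse_pair,
                                    itertools.product(azimuths, zeniths)))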
+1

If you want to keep the structure of your loop, you can try Ray (docs), a framework for writing parallel and distributed Python. The only requirement is that you isolate the work that can be parallelized into its own function.

You can import Ray as follows:

    import ray

    # Start Ray. This creates some processes that can do work in parallel.
    ray.init()

Then your script will look like this:

    # Add this line to signify that the function can be run in parallel (as a
    # "task"). Ray will load-balance different 'work' tasks automatically.
    @ray.remote
    def work(azimuth, zenith):
        # Do various bits of stuff
        # Eventually get a result
        return result

    results = []
    for azimuth in azimuths:
        for zenith in zeniths:
            # Store a future, which represents the future result of 'work'.
            results.append(work.remote(azimuth, zenith))

    # Block until the results are ready with 'ray.get'.
    results = ray.get(results)
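On the 4-core machine from the question, the number of worker processes can also be capped explicitly; num_cpus is a real ray.init argument, and 4 here just matches the question's core count:

    import ray

    ray.init(num_cpus=4)  # limit Ray to four worker processes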
0

Source: https://habr.com/ru/post/908899/

