Parallel loop in a few lines

I need to apply the same function to each row in a numpy array and save the result in a numpy array again.

# states will contain results of function applied to a row in array
states = np.empty_like(array)

for i, ar in enumerate(array):
    states[i] = function(ar, *args)

# do some other stuff on states

function performs some non-trivial filtering of my data and returns an array both when the conditions are True and when they are False. function can be either pure Python or Cython-compiled. The filtering within a row is complex and may depend on previous values in that row, which means I cannot operate on the whole array element-wise.

Is there a way to do something like this in dask, for example?

2 answers

Dask Solution

You can do this with dask.array by chunking the array by row, calling map_blocks, and then computing the result:

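import dask.array as da

ar = ...
# One chunk per row: each block passed to function has shape (1, ar.shape[1])
x = da.from_array(ar, chunks=(1, ar.shape[1]))
y = x.map_blocks(function, *args)  # lazy; nothing runs until compute()
states = y.compute()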

By default this uses threads; to use processes instead:

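# Older dask releases spelled this compute(get=dask.multiprocessing.get);
# current releases select the scheduler by name.
states = x.map_blocks(function, *args).compute(scheduler="processes")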

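That said, dask is probably overkill for an embarrassingly parallel computation like this; you could get by with a thread pool: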

from multiprocessing.pool import ThreadPool
pool = ThreadPool()

ar = ...
states = np.empty_like(ar)

def f(i):
    states[i] = function(ar[i], *args)

pool.map(f, range(len(ar)))

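And you can switch to processes with the following change:

from multiprocessing import Pool
pool = Pool()
# Note: each worker process gets its own copy of states, so f must
# return its result instead of assigning into states.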
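
To make the pool approach concrete, here is a minimal sketch that returns each filtered row instead of writing into a shared array, so the same helper works with either a thread pool or a process pool. The names filter_row, apply_rows and threshold are made up for illustration; filter_row is only a stand-in for the real function, chosen because each output value depends on earlier values in the same row.

```python
from functools import partial
from multiprocessing.pool import ThreadPool

import numpy as np


def filter_row(row, threshold):
    """Hypothetical stand-in for `function`: keep a value only if it is
    larger than the last value kept in this row, otherwise write 0."""
    out = np.zeros_like(row)
    last_kept = threshold
    for j, value in enumerate(row):
        if value > last_kept:
            out[j] = value
            last_kept = value
    return out


def apply_rows(pool, func, array):
    """Apply func to every row through pool.map and stack the results."""
    # Iterating a 2-D array yields its rows; results come back in order.
    return np.array(pool.map(func, array))


if __name__ == "__main__":
    array = np.array([[1, 3, 2, 5], [4, 4, 6, 1]])
    # multiprocessing.Pool() can be swapped in here, because results are
    # returned from the workers rather than written to shared state.
    with ThreadPool() as pool:
        states = apply_rows(pool, partial(filter_row, threshold=0), array)
    print(states)
```

Threads only help if function releases the GIL (for example, Cython code compiled with nogil); for pure Python code a process pool is usually the better fit, at the cost of pickling each row and result.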

Turn your function into a universal function (ufunc): http://docs.scipy.org/doc/numpy/reference/ufuncs.html.

Then call it on the whole array: states = function(array, *args).
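
One way to get that calling convention without writing a real ufunc is np.vectorize with a signature, sketched below. This is an assumption about what might fit here, not necessarily what the answer had in mind: np.vectorize is a convenience wrapper around a Python loop, so it does not run rows in parallel. filter_row and threshold are hypothetical names standing in for the real function and its arguments.

```python
import numpy as np


def filter_row(row, threshold):
    """Hypothetical stand-in for `function`: keep a value only if it is
    larger than the last value kept in this row, otherwise write 0."""
    out = np.zeros_like(row)
    last_kept = threshold
    for j, value in enumerate(row):
        if value > last_kept:
            out[j] = value
            last_kept = value
    return out


# "(n),()->(n)": the first argument is consumed one row (length n) at a
# time, the second is a scalar, and each call returns a row of length n.
row_wise = np.vectorize(filter_row, signature="(n),()->(n)")

array = np.array([[1, 3, 2, 5], [4, 4, 6, 1]])
states = row_wise(array, 0)
print(states)
```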


Source: https://habr.com/ru/post/1609238/

