Python multiprocessing (joblib): best way to pass arguments

I noticed a huge delay when using multiprocessing (with joblib). Here is a simplified version of my code:

import numpy as np
from joblib import Parallel, delayed

class Matcher(object):
    def match_all(self, arr1, arr2):
        args = ((elem1, elem2) for elem1 in arr1 for elem2 in arr2)

        results = Parallel(n_jobs=-1)(delayed(_parallel_match)(self, e1, e2) for e1, e2 in args)
        # ...

    def match(self, i1, i2):
        return i1 == i2

def _parallel_match(m, i1, i2):
    return m.match(i1, i2)

matcher = Matcher()
matcher.match_all(np.ones(250), np.ones(250))

If I run it as shown above, it takes about 30 seconds to complete and uses almost 200 MB of memory. If I only change the n_jobs parameter of Parallel to 1, it takes just 1.80 seconds and uses barely 50 MB...
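A rough back-of-the-envelope estimate of where the time goes. It assumes (this is not verified) that all of the extra wall time with n_jobs=-1 is fixed per-call dispatch overhead (pickling, inter-process messaging, scheduling) and none of it is spent in match() itself. The nested generator produces 250 × 250 separate delayed() calls:

```python
# Rough estimate of per-task overhead from the timings reported in the question.
# Assumption: the extra time with n_jobs=-1 is spent entirely on dispatching
# individual tasks, not on the comparisons themselves.
n_tasks = 250 * 250      # one delayed() call per (elem1, elem2) pair
t_parallel = 30.0        # seconds with n_jobs=-1 (approximate, from the question)
t_serial = 1.80          # seconds with n_jobs=1

overhead_per_task_ms = (t_parallel - t_serial) / n_tasks * 1000
print(n_tasks, round(overhead_per_task_ms, 3))
```

If the assumption holds, that works out to roughly 0.45 ms per call. This is small in absolute terms, but far more than a single `i1 == i2` comparison costs, so one task per pair cannot pay off.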

I suppose this is due to the way I pass the arguments, but I could not find a better way to do it...

I am using Python 2.7.9

1 answer

I rewrote it without joblib, using plain multiprocessing:

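import itertools
import multiprocessing
import numpy as np


class Matcher(object):
    def match_all(self, a1, a2):
        args = ((elem1, elem2) for elem1 in a1 for elem2 in a2)
        # Attach the matcher to every pair: each item is (self, (elem1, elem2))
        args = zip(itertools.repeat(self), args)

        pool = multiprocessing.Pool()
        try:
            # np.fromiter requires an explicit dtype; match() returns a boolean
            results = np.fromiter(pool.map(_parallel_match, args), dtype=bool)
        finally:
            pool.close()
            pool.join()
        # ...

    def match(self, i1, i2):
        return i1 == i2

def _parallel_match(arg):
    # arg is (matcher, (elem1, elem2))
    m, (i1, i2) = arg
    return m.match(i1, i2)

if __name__ == '__main__':
    matcher = Matcher()
    matcher.match_all(np.ones(250), np.ones(250))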

This version runs in 0.58 seconds...

Why is it so slow with joblib? I don't know; perhaps the problem is in joblib itself...
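One plausible factor, not confirmed in the thread, is batching. `multiprocessing.Pool.map` chops the iterable into chunks and submits each chunk to the pool as a single task, whereas the joblib code in the question issues one `delayed` call per pair. The sketch below mirrors how CPython's `Pool.map` picks a default chunk size when `chunksize` is not given. This is an implementation detail that may change between versions, and `default_chunksize` is a hypothetical name:

```python
def default_chunksize(n_items, n_workers):
    """Approximate CPython's default Pool.map chunk size (implementation detail)."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


n_items = 250 * 250
for workers in (2, 4, 8):
    size = default_chunksize(n_items, workers)
    n_tasks = -(-n_items // size)  # ceiling division: number of chunks submitted
    print(workers, size, n_tasks)
```

With 4 workers, for example, the pool handles about 16 tasks of 3,907 pairs each rather than 62,500 tiny ones. This is consistent with the much shorter run time above. The chunk size can also be set explicitly, e.g. `pool.map(_parallel_match, args, chunksize=1000)`.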


Source: https://habr.com/ru/post/1584726/

