Multiprocess Python file reading takes too much time

In my code there is a function that reads each file (each about 8 MB), but the read speed is too low, so I tried to use multiprocessing to improve it. It seems to be blocked. I want to know whether there is any way to solve this problem and improve the read speed.

My code is as follows:

import multiprocessing as mp
import json
import os

def gainOneFile(filename):
    # Read one JSON file and return the parsed dict
    file_from = open(filename)
    json_str = file_from.read()
    temp = json.loads(json_str)
    print "load:", filename, " len ", len(temp)
    file_from.close()
    return temp

def gainSortedArr(path):
    arr = []
    pool = mp.Pool(4)
    for i in xrange(1,40):
        abs_from_filename = os.path.join(path, "outputDict"+str(i))
        result = pool.apply_async(gainOneFile,(abs_from_filename,)) 
        arr.append(result.get())

    pool.close()
    pool.join()                                               
    arr = sorted(arr,key = lambda dic:len(dic))

    return arr

and call function:

whole_arr = gainSortedArr("sortKeyOut/")  
1 answer

You have a few issues. First of all, you are not parallelizing. You're doing:

result = pool.apply_async(gainOneFile,(abs_from_filename,)) 
arr.append(result.get())

in a loop, calling .get() right after each apply_async; .get() blocks until that one task has finished, so the next task is not even dispatched until the previous one is done and you never have more than one worker busy at a time. At the very least, delay the .get() calls until all the tasks have been submitted, but it is simpler to replace the loop with Pool.map, which does that for you. (Here I use imap_unordered, since you sort the results afterwards anyway, so the order in which they come back does not matter):

# Make generator of paths to load
paths = (os.path.join(path, "outputDict"+str(i)) for i in xrange(1, 40))
# Load them all in parallel, and sort the results by length (lambda is redundant)
arr = sorted(pool.imap_unordered(gainOneFile, paths), key=len)

Secondly, multiprocessing has to pickle every return value in the worker, ship the bytes back to the parent over a pipe, and unpickle them there, which adds system-call and copying overhead on top of the read itself. Each of your files decodes into a fairly large dict, so that IPC cost can eat most of whatever you gain from reading in parallel, especially if the disk, not the CPU, is the real bottleneck.
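As a rough sanity check, you can measure how much data a worker would have to push back through the pipe for a single file; this assumes the same gainOneFile and file layout as in the question:

import pickle

temp = gainOneFile("sortKeyOut/outputDict1")
# Roughly what the worker must serialize and the parent must deserialize
# for every file sent back across the process boundary.
print "pickled payload:", len(pickle.dumps(temp, pickle.HIGHEST_PROTOCOL)), "bytes"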

If that is the case, try threads instead: change the import to import multiprocessing.dummy as mp and you get a Pool with the same API backed by threads. CPython's GIL means the json decoding itself will not run in parallel, but this code is mostly I/O-bound, the file reads still overlap, and you drop all the pickling and IPC entirely.
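A minimal sketch of the thread-based version, assuming the same gainOneFile and file layout as in the question; only the import and the loop change:

import multiprocessing.dummy as mp  # same Pool API, backed by threads
import os

def gainSortedArr(path):
    pool = mp.Pool(4)  # four reader threads
    paths = (os.path.join(path, "outputDict" + str(i)) for i in xrange(1, 40))
    # The GIL is released during file I/O, so the reads overlap;
    # nothing is pickled because everything stays in one process.
    arr = sorted(pool.imap_unordered(gainOneFile, paths), key=len)
    pool.close()
    pool.join()
    return arr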

Finally, if you are on Python 3.3 or higher on a UNIX-like system and have memory to spare, you can ask the OS to prefetch the files for you. Calling os.posix_fadvise on the open file's descriptor (.fileno() on the file object) with the WILLNEED or SEQUENTIAL advice before reading tells the kernel to pull the data into the page cache ahead of time, so the later read spends less time waiting on the disk.
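A sketch of that hint, assuming Python 3.3+ on Linux or another POSIX system (os.posix_fadvise does not exist on Windows); the advice is applied to the whole file before reading:

import json
import os

def gainOneFile(filename):
    with open(filename) as file_from:
        fd = file_from.fileno()
        # Tell the kernel we will read the whole file, front to back,
        # so it can prefetch it into the page cache before json.load asks.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return json.load(file_from)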


Source: https://habr.com/ru/post/1620675/

