I am trying to use the `multiprocessing` module to work with a large CSV file. I am using Python 2.7 and following the example here.
I ran the unmodified code (copied below for convenience) and noticed that the `print` statements inside the `worker` function do not produce any output. Without that output, it is hard to follow the flow of the program and to debug it.
Can someone explain why `print` does not work here? Can a function run through `pool.map` execute `print` statements? I searched the Internet but did not find any documentation that addresses this.
import multiprocessing as mp
import itertools
import time
import csv

def worker(chunk):
    print(chunk)
    print 'working'
    return len(chunk)

def keyfunc(row):
    return row[0]

def main():
    pool = mp.Pool()
    largefile = 'test.dat'
    num_chunks = 10
    results = []
    with open(largefile) as f:
        reader = csv.reader(f)
        chunks = itertools.groupby(reader, keyfunc)
        while True:
            groups = [list(chunk) for key, chunk in
                      itertools.islice(chunks, num_chunks)]
            if groups:
                result = pool.map(worker, groups)
                results.extend(result)
            else:
                break
    pool.close()
    pool.join()
    print(results)

if __name__ == '__main__':
    main()
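
To narrow the problem down, a stripped-down check along the following lines might help show whether anything printed inside a pool worker reaches the console at all. This is only a sketch under some assumptions: it is run as a script from a terminal (some IDE consoles reportedly do not display output from child processes), `run_demo` and the sample `groups` are hypothetical stand-ins for the CSV reading above, and the explicit flush is a guess at a buffering issue rather than a confirmed cause.

```python
import multiprocessing as mp
import os
import sys


def worker(chunk):
    # A single-argument print() behaves the same on Python 2.7 and 3.
    print('pid %d got %d rows' % (os.getpid(), len(chunk)))
    # Flush explicitly in case the worker's stdout is buffered.
    sys.stdout.flush()
    return len(chunk)


def run_demo():
    # Hypothetical sample data standing in for the groups read from test.dat.
    groups = [[['a', '1'], ['a', '2']], [['b', '3']]]
    pool = mp.Pool(2)
    try:
        # map() returns the results in the same order as the input groups.
        return pool.map(worker, groups)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    print(run_demo())
```

If the `pid ...` lines show up here but not with the full script, the difference would presumably lie in how or where the full script is run rather than in `pool.map` itself.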