I am trying to do some expensive scientific computing in Python. I have to read a bunch of data stored in CSV files and then process it. Since processing each sample takes a long time and I have 8 processors, I tried to use the Pool class from the multiprocessing module.
Here's how I structured the multiprocessing call:
    from multiprocessing import Pool

    pool = Pool()
    vector_components = []
    for sample in range(samples):
        vector_field_x_i = vector_field_samples_x[sample]
        vector_field_y_i = vector_field_samples_y[sample]
        vector_component = pool.apply_async(vector_field_decomposer,
                                            args=(x_dim, y_dim, x_steps, y_steps,
                                                  vector_field_x_i, vector_field_y_i))
        vector_components.append(vector_component)
    pool.close()
    pool.join()

    vector_components = map(lambda k: k.get(), vector_components)

    for sample, vector_component in enumerate(vector_components):
        CsvH.write_vector_field(vector_component,
                                '../CSV/RotationalFree/rotational_free_x_' + str(sample) + '.csv')
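For reference, here is a minimal, self-contained version of the same pattern. The worker function and the data are dummy stand-ins (my real vector_field_decomposer and the CSV loading aren't shown above), and I'm assuming NumPy arrays for the fields:

    from multiprocessing import Pool

    import numpy as np  # assumption: the fields are NumPy arrays

    # Dummy stand-in for the real decomposer, just to make the pattern runnable.
    def vector_field_decomposer(x_dim, y_dim, x_steps, y_steps, field_x, field_y):
        return field_x + field_y

    if __name__ == '__main__':
        samples, x_dim, y_dim, x_steps, y_steps = 4, 100, 100, 1, 1
        vector_field_samples_x = [np.ones((x_dim, y_dim)) for _ in range(samples)]
        vector_field_samples_y = [np.ones((x_dim, y_dim)) for _ in range(samples)]

        pool = Pool()
        results = [pool.apply_async(vector_field_decomposer,
                                    args=(x_dim, y_dim, x_steps, y_steps,
                                          vector_field_samples_x[s],
                                          vector_field_samples_y[s]))
                   for s in range(samples)]
        pool.close()
        pool.join()
        results = [r.get() for r in results]  # this is where the real code fails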
I first ran this on a dataset of 500 samples of size 100 (x_dim) by 100 (y_dim), and everything worked fine. Then I got a dataset of 500 samples of size 400 x 400.
When I run it, I get an error when calling get(). I also tried running a single 400 x 400 sample and got the same error.
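For scale, assuming each grid point is stored as an 8-byte float (an assumption; the actual dtype isn't shown above), the inputs alone grow 16x between the two datasets:

    # Rough input-size estimate, assuming 8 bytes (float64) per grid point.
    samples, fields = 500, 2                   # x and y components per sample
    small = samples * fields * 100 * 100 * 8   # the 100 x 100 dataset
    large = samples * fields * 400 * 400 * 8   # the 400 x 400 dataset
    print(small // 10**6)  # 80   (MB)
    print(large // 10**6)  # 1280 (MB)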
    Traceback (most recent call last):
      File "__init__.py", line 33, in <module>
        VfD.samples_vector_field_decomposer(samples, x_dim, y_dim, x_steps, y_steps, vector_field_samples_x, vector_field_samples_y)
      File "/export/home/pceccon/VectorFieldDecomposer/Sources/Controllers/VectorFieldDecomposerController.py", line 43, in samples_vector_field_decomposer
        vector_components = map(lambda k: k.get(), vector_components)
      File "/export/home/pceccon/VectorFieldDecomposer/Sources/Controllers/VectorFieldDecomposerController.py", line 43, in <lambda>
        vector_components = map(lambda k: k.get(), vector_components)
      File "/export/home/pceccon/.pyenv/versions/2.7.5/lib/python2.7/multiprocessing/pool.py", line 554, in get
        raise self._value
    MemoryError
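The raise self._value line suggests the MemoryError actually happens in (or on the way back from) a worker process, and get() just re-raises it in the parent. This toy example (boom is a made-up name) reproduces that behavior:

    from multiprocessing import Pool

    # Made-up worker that always fails, to show how apply_async reports worker errors.
    def boom(_):
        raise MemoryError("simulated worker failure")

    if __name__ == '__main__':
        pool = Pool(1)
        result = pool.apply_async(boom, (0,))
        pool.close()
        pool.join()
        result.get()  # the worker's MemoryError is re-raised here, in the parent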
What should I do?
Thanks in advance.