What are the steps involved in running the same Python script in parallel on Amazon EC2 or PiCloud?

I need help with parallel processing, which I am trying to do as soon as possible.

It just involves splitting a large data array into smaller pieces and running the same script on each fragment.

I think this is called embarrassingly parallel.

I would be very grateful if someone out there could offer a template to achieve this, using either Amazon's cloud services or PiCloud.

I have made initial forays into Amazon EC2 and PiCloud (the script I will run on each piece of data is in Python), but I realize I can't figure out how to do this without some help.

So any pointers would be greatly appreciated. I'm just looking for basic guidance from those who know: for example, the basic steps involved in setting up parallel cores or processes on EC2 or PiCloud (or something else), running the script in parallel on each piece, and saving the output, i.e. each run of the script writes the result of its calculation to a CSV file.

I am running Ubuntu 12.04, and my Python 2.7 script uses no non-standard libraries, just os and csv. The script is not complicated; the data is simply too large for my machine and timeframe.
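To make the splitting step concrete, this is roughly what I mean, using only os and csv (the function name, file names, and chunk size below are just placeholders for illustration, not code I already have working):

import csv
import os

def split_large_csv(input_path, rows_per_chunk, out_dir):
    """Split one big CSV into numbered chunk files; return the chunk filenames."""
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    chunk_paths = []
    with open(input_path, 'rb') as f:   # 'rb' mode for the csv module on Python 2.7
        reader = csv.reader(f)
        buf, index = [], 0
        for row in reader:
            buf.append(row)
            if len(buf) == rows_per_chunk:
                chunk_paths.append(write_chunk(buf, index, out_dir))
                buf, index = [], index + 1
        if buf:                          # final partial chunk
            chunk_paths.append(write_chunk(buf, index, out_dir))
    return chunk_paths

def write_chunk(rows, index, out_dir):
    path = os.path.join(out_dir, 'chunk_%d.csv' % index)
    with open(path, 'wb') as out:
        csv.writer(out).writerows(rows)
    return path

# e.g. chunks = split_large_csv('large_data.csv', 100000, 'chunks')

The question is what to do with those chunk files once they exist.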

1 answer

This script uses the cloud library for Python from PiCloud and must be run locally.

import cloud  # PiCloud's client library

# chunks is a list of filenames (you'll need to define generate_chunk_files)
chunks = generate_chunk_files('large_dataframe')

for chunk in chunks:
    # stores each chunk file in your PiCloud bucket
    cloud.bucket.put(chunk)

def process_chunk(chunk):
    """Runs on PiCloud."""
    # saves the chunk object locally on the cloud worker
    cloud.bucket.get(chunk)
    f = open(chunk, 'r')
    # process the data however you want
    f.close()

# asynchronously runs process_chunk on the cloud for all chunks
job_ids = cloud.map(process_chunk, chunks)
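To get the computed values back out (and write them to a CSV file, as the question asks), one option is to have process_chunk return its result and then collect the return values with cloud.result, which blocks until the jobs finish. A sketch, assuming process_chunk is modified to return something (the output filename is arbitrary):

import csv

# blocks until all jobs finish, then returns one value per chunk,
# in the same order as the chunks list
results = cloud.result(job_ids)

with open('results.csv', 'wb') as out:
    writer = csv.writer(out)
    for chunk, result in zip(chunks, results):
        writer.writerow([chunk, result])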

Use the Realtime Cores feature to reserve a specific number of cores.
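For example, something along these lines (written from memory of the PiCloud API, so treat the exact function names, the 'c2' core type, and the request_id field as assumptions to check against the docs):

# reserve 4 realtime cores of type 'c2' so queued jobs start immediately
req = cloud.realtime.request('c2', 4)

# run the jobs on that core type and wait for them to finish
job_ids = cloud.map(process_chunk, chunks, _type='c2')
cloud.join(job_ids)

# release the reservation once the work is done
cloud.realtime.release(req['request_id'])   # assumed key name in the returned dict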

