Moving Parallel Python Code to the Cloud

After hearing that a scientific computing project (it turns out to be the method of stochastic tractography described here) would tie up our 50-node cluster for a single investigator for 4 months, the investigator asked me to explore other options. Currently, the project uses Parallel Python to farm out blocks of a 4D array to different cluster nodes and reassemble the processed pieces.

The jobs I'm currently farming out are probably too coarsely cut (they run anywhere from 5 seconds to 10 minutes; I had to increase the default timeout in Parallel Python), and I believe I could speed up the process by 2-4x by rewriting it to use resources better (splitting and recombining the data takes too long, and that step should itself be parallelized). Most of the work is done with NumPy arrays.
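As an illustration of the split/process/recombine pattern, here is a minimal sketch using the standard-library `multiprocessing` module as a local stand-in for Parallel Python; the array shape, chunk count, and `process_block` function are all made-up placeholders, not the project's real code:

```python
import numpy as np
from multiprocessing import Pool

def process_block(block):
    # Placeholder for the real per-block computation.
    return block * 2.0

def run_parallel(data, n_chunks=4, axis=0):
    # Split the 4D array into a few large chunks -- fewer, bigger
    # jobs amortize the scheduling and serialization overhead.
    chunks = np.array_split(data, n_chunks, axis=axis)
    with Pool(processes=n_chunks) as pool:
        results = pool.map(process_block, chunks)
    # Reassemble the processed pieces in their original order.
    return np.concatenate(results, axis=axis)

if __name__ == "__main__":
    data = np.ones((8, 4, 4, 4))
    out = run_parallel(data)
    print(out.shape)  # (8, 4, 4, 4)
```

Tuning `n_chunks` so that each job is large relative to the serialization cost is the main lever here.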

Suppose a 2-4x speedup isn't enough, and I decide to move the code off our local hardware. For high-throughput computing like this, what are my commercial options, and how would the code need to change?

2 answers

The most obvious commercial options that come to mind are Amazon EC2 and Rackspace Cloud. I played with both and found the Rackspace API a bit easier to use.

The good news is that you can prototype and experiment with their compute instances (short-lived or long-lived virtual machines of your choosing) for a very small investment, typically $0.10 per instance-hour or so. You spin them up on demand, release them back to the cloud when you're done, and pay only for what you use. For example, I saw a demonstration of a Django deployment using 6 Rackspace instances that took about an hour and cost the presenters less than a dollar.

For your use case (it's not clear exactly what you mean by "high-throughput"), you'll have to weigh your budget against your computing needs, as well as your overall network traffic (you pay for that too). A few small tests and a simple spreadsheet calculation should tell you whether it's practical.
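That back-of-the-envelope calculation might look like the sketch below; every number in it is an illustrative assumption, not a real quote from any provider:

```python
# Rough cloud-cost estimate; all figures below are assumptions.
instances = 50              # nodes to rent
hourly_rate = 0.10          # $/instance-hour (illustrative)
runtime_hours = 24 * 30     # one month of wall-clock time
data_gb = 500               # total data transferred in/out
transfer_rate = 0.15        # $/GB (illustrative)

compute_cost = instances * hourly_rate * runtime_hours
transfer_cost = data_gb * transfer_rate

print(f"compute ~${compute_cost:,.0f}, transfer ~${transfer_cost:,.0f}")
# compute ~$3,600, transfer ~$75
```

Plug in the measured runtime of your small tests and the provider's actual rate card to get a usable estimate.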

There are Python APIs for both the Rackspace Cloud and Amazon EC2. Whichever you use, I recommend the Python-based Fabric for automatically deploying and configuring your instances.


You may be interested in PiCloud. I have never used it, but their offering appears to include the Enthought Python Distribution, which covers the standard scientific libraries.

It's hard to say whether it will work for your specific case, but the Parallel Python interface is fairly generic, so hopefully not many changes would be required. Perhaps you could even write your own scheduler class implementing the same interface as PP. In fact, that could be useful to many people, so you might get support for it on the PP forum.
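A minimal sketch of what such a scheduler class might look like: the `submit` signature mimics Parallel Python's `Server.submit`, which returns a callable that blocks until the result is ready. Here a local thread pool stands in for the remote backend; a real version would dispatch to cloud workers instead.

```python
from concurrent.futures import ThreadPoolExecutor

class CloudScheduler:
    """Scheduler exposing a Parallel-Python-style submit() interface."""

    def __init__(self, ncpus=4):
        # A real implementation would dispatch jobs to remote workers;
        # a local thread pool stands in for that backend here.
        self._executor = ThreadPoolExecutor(max_workers=ncpus)

    def submit(self, func, args=(), depfuncs=(), modules=()):
        # depfuncs/modules are accepted for interface compatibility
        # with Parallel Python but ignored by this local stand-in.
        future = self._executor.submit(func, *args)
        # Like PP, return a callable that blocks for the result.
        return future.result

    def destroy(self):
        self._executor.shutdown()
```

Usage mirrors PP: `job = scheduler.submit(f, (x,))` then `result = job()`.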


Source: https://habr.com/ru/post/1339119/
