R Parallel processing with Xeon Phi, minimal code changes?

I'm looking to buy a pair of Xeon Phi 5110P cards, but I'm trying to estimate how much code I would need to change and what other software I would need.

I am currently using R on a multi-core Windows machine (24 cores) with the foreach package, passing other packages (forecast, glmnet, etc.) to it, to do my parallel processing.
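For reference, a setup like the one described might look roughly like the sketch below. It assumes the doParallel backend, which the question does not name, and the data, worker count, and model call are made up for illustration.

```r
# Sketch of a Windows-friendly foreach setup; data and model are illustrative.
library(foreach)
library(doParallel)   # assumed backend; also attaches the parallel package

cl <- makeCluster(2)  # the 24-core machine in the question could use more
registerDoParallel(cl)

set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)
y <- rnorm(200)

# Each worker is a fresh R process, so the packages the loop body needs
# are passed through .packages and loaded on every worker.
fits <- foreach(a = c(0, 0.5, 1), .packages = "glmnet") %dopar% {
  glmnet::glmnet(x, y, alpha = a)
}

stopCluster(cl)
length(fits)
```

Every worker here is a full R process on the host, which is why running the same loop on the Phi would need a working R build on the card itself.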

With a Xeon Phi, I understand that I would need to compile R myself: https://software.intel.com/en-us/articles/running-r-with-support-for-intel-xeon-phi-coprocessors I also understand that this can be done with the trial version of Parallel Studio XE.

Do I then also need to edit R's Makeconf file to add the C/C++ flags for the Phi, and compile all the packages I need before the Parallel Studio trial expires? Or do I not need to edit Makeconf to get the benefits of foreach on the Phi?

It seems that some of the work will be handled automatically once R is compiled, with offloading performed by the Math Kernel Library (MKL), but I'm not quite sure about that.

A related question: Is it possible to use Intel Xeon Phi without an expensive Intel compiler?

Revolution Analytics also has several related blog posts, but I did not find them entirely conclusive: http://blog.revolutionanalytics.com/2015/05/behold-the-power-of-parallel.html

1 answer

If you only need matrix operations, you can compile R against the MKL libraries: [Running R with Intel® Xeon Phi™ Coprocessors Support] [1], which requires the Intel compiler. Microsoft R comes pre-compiled with MKL, but I could not get automatic offload to work with it; I had to compile R with the Intel compiler for it to work correctly.

You can use the trial version of the compiler and build R during the trial period to check that it suits your purpose.
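One way to make that check concrete is to time a BLAS-bound operation under each R build. This is a rough sketch that assumes matrix products dominate your workload; the matrix size is arbitrary and the variable names are illustrative.

```r
# Time a BLAS-bound operation; n is an arbitrary illustrative size.
set.seed(42)
n <- 1000
a <- matrix(rnorm(n * n), n, n)

# crossprod(a) computes t(a) %*% a through the BLAS that R is linked against.
elapsed <- unname(system.time(b <- crossprod(a))["elapsed"])
cat("crossprod of a", n, "x", n, "matrix took", elapsed, "seconds\n")
```

Running the same script under stock R and under the MKL-linked build gives a first impression of whether the rebuild pays off for your kind of workload.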

If you want to use things like foreach by setting up a cluster, treating each Phi card as a node because each one is a Linux computer, I'm afraid you're out of luck. Page 3 of [R-Admin] [1] says:

Cross-building is not possible: installing R builds a minimal version of R and then runs many R scripts to complete the build.

You would have to cross-compile R on the Xeon host for the Xeon Phi node using the Intel compiler, and this is simply not possible.

The last way to use the Phi is to rewrite your code to call it directly. Rcpp provides a simple interface to C and C++ routines. If you find a C routine that works well on the Xeon, you can call the nodes from inside your code. I did this with CUDA: Rcpp is a thin layer, and there are good examples of how to use it. If you combine those with examples of calling the Phi card, you can probably achieve your goal at a lower cost.
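To give an idea of how thin the Rcpp layer is, here is a minimal sketch using `Rcpp::cppFunction`. The `sum_squares` function is a hypothetical stand-in for a real compute kernel, and it runs on the host CPU only; any Phi offload code would have to go inside the C++ body and be built with a suitable toolchain, which is not shown here.

```r
library(Rcpp)

# Compile a small C++ function and expose it to R under the same name.
# sum_squares is a hypothetical placeholder for a real kernel.
cppFunction('
double sum_squares(NumericVector x) {
  double total = 0.0;
  for (int i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];
  }
  return total;
}
')

result <- sum_squares(c(1, 2, 3))
print(result)  # 14
```

Once compiled, the function is called like any other R function, so the surrounding R code does not need to change.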

But if all you need is matrix operations, there is no faster route to supercomputing than a good NVIDIA double-precision card and preloading nvBLAS when starting R.


Source: https://habr.com/ru/post/1243879/

