Running an R model using SparkR

Thanks in advance for your input. I am new to ML. I developed an R model (using RStudio on my local computer) and want to deploy it on a Hadoop cluster where RStudio is installed. I want to use SparkR for high-performance computing. I just want to understand the role of SparkR here.

Will SparkR allow the R model to run its algorithm through Spark ML on the Hadoop cluster?

OR

Will SparkR handle only the data processing, while the ML algorithm still runs in plain R on the Hadoop cluster?


1 answer

These are general questions, but they actually have a simple and clear answer: no to both; SparkR does neither.

From the SparkR docs:

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

SparkR cannot even read native R models.

The idea of using SparkR for ML tasks is that you develop your model directly in SparkR's own MLlib wrappers (and if you try, you will also find that they are much more limited compared to the huge variety of models available in R through various packages).
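
To make this concrete, here is a minimal sketch of that workflow, assuming Spark 2.x and using the built-in iris data (the master/appName settings are illustrative; on a real cluster you would point at YARN; note SparkR replaces the dots in the iris column names with underscores):

```r
library(SparkR)

# Start a Spark session; on a real cluster this would be master = "yarn".
sparkR.session(master = "local[*]", appName = "sparkr-ml-sketch")

# Ship a local R data frame to the cluster as a distributed SparkDataFrame.
iris_sdf <- createDataFrame(iris)

# Fit a model with SparkR's own MLlib wrapper -- not a native R model.
# The computation runs on the Spark executors, not in the local R process.
model <- spark.glm(iris_sdf, Sepal_Length ~ Sepal_Width + Species,
                   family = "gaussian")

summary(model)

# Predictions also come back as a distributed SparkDataFrame.
preds <- predict(model, iris_sdf)
head(preds)

sparkR.session.stop()
```

The key point is that spark.glm is a SparkR/MLlib model, built from scratch inside Spark; a glm or lm object fitted in plain R cannot be handed to Spark for distributed execution.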

Even conveniences such as confusionMatrix from caret are not available, since they work on R data frames and not on Spark DataFrames (see this question and answer).
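
As a hedged sketch of the usual workaround (the choice of spark.logit on iris is purely illustrative): you can collect() the prediction results back to the driver as a regular R data.frame and only then apply caret's tools. This forfeits the distributed computation for that step, so it is only viable when the predictions fit in driver memory:

```r
library(SparkR)
library(caret)

sparkR.session(master = "local[*]", appName = "caret-workaround-sketch")

df <- createDataFrame(iris)

# Fit a multinomial logistic regression with SparkR's MLlib wrapper.
model <- spark.logit(df, Species ~ Sepal_Length + Sepal_Width)
preds_sdf <- predict(model, df)

# caret::confusionMatrix expects plain R objects, so the distributed
# results must first be collect()-ed to the driver as a data.frame.
preds <- collect(select(preds_sdf, "Species", "prediction"))

confusionMatrix(factor(preds$prediction, levels = levels(iris$Species)),
                factor(preds$Species,    levels = levels(iris$Species)))

sparkR.session.stop()
```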


Source: https://habr.com/ru/post/1273347/

