Connect sparklyr to a remote Spark connection

I would like to connect a local RStudio session to a remote Spark cluster via sparklyr. When I add a new connection in the sparklyr UI tab in RStudio and select "cluster", it says that I should either be working on the cluster itself or have a high-bandwidth connection to it.

Can anyone shed some light on how to create such a connection? I'm not sure how to create a reproducible example of this, but overall what I would like to do is:

    library(sparklyr)
    sc <- spark_connect(master = "spark://ip-[MY_PRIVATE_IP]:7077",
                        spark_home = "/home/ubuntu/spark-2.0.0",
                        version = "2.0.0")

but from my local RStudio session, against the remote server. I understand that there will be latency, especially when transferring data between machines. I also understand that it would be better to run RStudio Server on the cluster itself, but that is not always possible, and I am looking for a sparklyr option for interacting between my server and my local RStudio session. Thank you.

3 answers

As of sparklyr version 0.4, connecting from RStudio Desktop to a remote Spark cluster is not supported. Instead, as you mentioned, the recommended approach is to run RStudio Server on the Spark cluster.

Nonetheless,


With a more recent version of sparklyr (for example, version 0.9.2), you can connect to a remote Spark cluster.

Here is an example of connecting to a standalone Spark 2.3.1 cluster. See "Master URLs" in the Spark documentation for the other master URL formats.

    # install.packages("sparklyr")
    library(sparklyr)

    # Install locally (on the driver machine where RStudio runs)
    # the same Spark version that the cluster runs
    spark_v <- "2.3.1"
    cat("Installing Spark in the directory:", spark_install_dir())
    spark_install(version = spark_v)

    sc <- spark_connect(
      spark_home = spark_install_find(version = spark_v)$sparkVersionDir,
      master = "spark://ip-[MY_PRIVATE_IP]:7077"
    )

    sc$master
    # "spark://ip-[MY_PRIVATE_IP]:7077"
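The master argument above uses Spark's standalone scheme, spark://HOST:PORT, where 7077 is the standalone master's default port. If you switch between hosts, a small helper keeps the URL well-formed. This is a minimal sketch; `standalone_master_url()` is a hypothetical helper, not part of sparklyr.

```r
# Hypothetical helper (not part of sparklyr): build a Spark standalone
# master URL of the form spark://HOST:PORT.
standalone_master_url <- function(host, port = 7077L) {
  # Require a single, non-empty host name or IP address.
  stopifnot(is.character(host), length(host) == 1L, nzchar(host))
  # Coerce the port to an integer and check it is a valid TCP port.
  port <- as.integer(port)
  stopifnot(length(port) == 1L, !is.na(port), port > 0L, port < 65536L)
  sprintf("spark://%s:%d", host, port)
}

standalone_master_url("ip-[MY_PRIVATE_IP]")
# [1] "spark://ip-[MY_PRIVATE_IP]:7077"

# Possible use with the connection above:
# sc <- spark_connect(
#   spark_home = spark_install_find(version = spark_v)$sparkVersionDir,
#   master = standalone_master_url("ip-[MY_PRIVATE_IP]")
# )
```

Validating the host and port up front turns a typo into an immediate R error instead of a slower connection failure inside spark_connect().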

I wrote a post on this topic.


I finally managed to connect a local R session to a cloud-hosted Spark cluster (in my case, Azure HDInsight) using Livy.

sparklyr's spark_connect() has an option for connecting through Livy (method = "livy"):

    sc <- spark_connect(
      master = "https://<clustername>.azurehdinsight.net/livy/",
      method = "livy",
      config = livy_config(
        username = "<admin>",
        password = rstudioapi::askForPassword("Livy password:")
      )
    )
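rstudioapi::askForPassword() is intended for interactive RStudio sessions, so the call above may not work when the script runs under Rscript or another front end. Below is a minimal sketch of a fallback, assuming you keep the password in an environment variable; both the helper `livy_password()` and the variable name `LIVY_PASSWORD` are hypothetical.

```r
# Hypothetical helper (not part of sparklyr or rstudioapi):
# return the Livy password from an environment variable if it is set,
# otherwise prompt for it when running inside RStudio.
livy_password <- function(var = "LIVY_PASSWORD") {
  pw <- Sys.getenv(var, unset = "")
  if (nzchar(pw)) {
    return(pw)
  }
  # Only prompt when rstudioapi is installed and RStudio is actually running.
  if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
    return(rstudioapi::askForPassword("Livy password:"))
  }
  stop("Set the ", var, " environment variable or run inside RStudio to be prompted.",
       call. = FALSE)
}

# Possible use with the Livy connection above:
# sc <- spark_connect(
#   master = "https://<clustername>.azurehdinsight.net/livy/",
#   method = "livy",
#   config = livy_config(username = "<admin>", password = livy_password())
# )
```

Reading the secret from the environment keeps it out of the script itself, while the RStudio prompt remains available for interactive use.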

Source: https://habr.com/ru/post/1257551/

