I would like to connect a local RStudio session to a remote Spark session via sparklyr. When you add a new connection in the sparklyr UI tab in RStudio and select a cluster, it says that you must either be running on the cluster or have a high-bandwidth connection to the cluster.
Can anyone shed some light on how to create such a connection? I'm not sure how to create a reproducible example of this, but overall what I would like to do is:
library(sparklyr)
sc <- spark_connect(master = "spark://ip-[MY_PRIVATE_IP]:7077", spark_home = "/home/ubuntu/spark-2.0.0", version = "2.0.0")
from a remote server. I understand that there will be latency, especially when transferring data between the two machines. I also understand that it would be better to run RStudio Server on the cluster itself, but this is not always possible, and I am looking for a sparklyr option for interaction between my server and my local RStudio session. Thank you.