Spark 2.0.0: SparkR CSV import

I am trying to read a CSV file in SparkR (running Spark 2.0.0) and want to experiment with the recently added features.

Using RStudio here.

I get an error while reading the source file.

My code is:

Sys.setenv(SPARK_HOME = "C:/spark-2.0.0-bin-hadoop2.6")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]", appName = "SparkR")
df <- loadDF("F:/file.csv", "csv", header = "true")

I get an error in the loadDF function.

Error:

loadDF("F:/file.csv", "csv", header = "true")

invokeJava (isStatic = TRUE, className, methodName,...):       java.lang.reflect.InvocationTargetException         at sun.reflect.NativeConstructorAccessorImpl.newInstance0 ( )         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)          java.lang.reflect.Constructor.newInstance(Constructor.java:422)         at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala: 258)         at org.apache.spark.sql.hive.HiveUtils $.newClientForMetadata(HiveUtils.scala: 359)         at org.apache.spark.sql.hive.HiveUtils $.newClientForMetadata(HiveUtils.scala: 263)         at org.apache.spark.sql.hive.HiveSharedState.metadataHive $lzycompute (HiveSharedState.scala: 39)         at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala: 38)         at org.apache.spark.sql.hive.HiveSharedState.externalCatalog $lzycompute (HiveSharedState.scala: 46)          org.apache.spark.sql.hive.HiveSharedSt

What am I doing wrong? Even a simple

createDataFrame(iris)

fails with the same error. What could be causing it?

UPD: Solved it!

This is a known issue: Apache Spark MLlib with the DataFrame API gives java.net.URISyntaxException when createDataFrame() or read().csv(...) is called. The workaround is to set spark.sql.warehouse.dir in the session config. In R:

sparkR.session(sparkConfig = list(spark.sql.warehouse.dir="/file:C:/temp"))
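For completeness, here is the workaround applied to the code from the question - a minimal sketch, where the warehouse directory C:/temp is just an example and any writable local path should work:

Sys.setenv(SPARK_HOME = "C:/spark-2.0.0-bin-hadoop2.6")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# Set spark.sql.warehouse.dir before anything touches the Hive metastore;
# the directory below is illustrative
sparkR.session(master = "local[*]", appName = "SparkR",
               sparkConfig = list(spark.sql.warehouse.dir = "/file:C:/temp"))

df <- loadDF("F:/file.csv", "csv", header = "true")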

Alternatively, you can read the CSV with the spark-csv package:

https://github.com/databricks/spark-csv

Sys.setenv(SPARK_HOME = "C:/spark-2.0.0-bin-hadoop2.6")

# Request the spark-csv package before the session starts; Spark 2.0.0 is
# built against Scala 2.11, so the _2.11 artifact is needed
Sys.setenv('SPARKR_SUBMIT_ARGS' = '"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]", appName = "SparkR")

# In Spark 2.0, read.df no longer takes a sqlContext as its first argument
df <- read.df("cars.csv", source = "com.databricks.spark.csv", inferSchema = "true")
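Note that Spark 2.0 also ships with a built-in csv data source, so the external package may not be needed at all. A minimal sketch, where the file name cars.csv is just an example:

# Built-in csv source in Spark 2.0; no --packages argument required
df <- read.df("cars.csv", source = "csv", header = "true", inferSchema = "true")
printSchema(df)
head(df)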

Source: https://habr.com/ru/post/1649530/

