I am trying to access a file on S3 from a Spark SQL job. I have already tried the solutions from several posts, but none of them work. Perhaps that is because my EC2 cluster runs the new Spark 2.0 built for Hadoop 2.7.
I set up Hadoop as follows:
sc.hadoopConfiguration.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
// S3A reads its credentials from fs.s3a.access.key and fs.s3a.secret.key
sc.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
sc.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)
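Setting values on `sc.hadoopConfiguration` only changes the driver's configuration object. As an alternative sketch, Spark copies every property prefixed with `spark.hadoop.` into the Hadoop configuration (with the prefix stripped), so the S3A settings can be supplied at session build time instead. The object name `S3ASparkProps` is hypothetical; the keys are the standard Hadoop 2.7 S3A ones.

```scala
// Hypothetical helper: builds Spark properties that Spark forwards into the
// Hadoop Configuration ("spark.hadoop.fs.s3a.x" becomes "fs.s3a.x").
object S3ASparkProps {
  private val Prefix = "spark.hadoop."

  // Standard S3A keys in Hadoop 2.7.x, each prefixed for SparkConf.
  def hadoopProps(accessKey: String, secretKey: String): Map[String, String] =
    Map(
      "fs.s3a.impl"       -> "org.apache.hadoop.fs.s3a.S3AFileSystem",
      "fs.s3a.access.key" -> accessKey,
      "fs.s3a.secret.key" -> secretKey
    ).map { case (key, value) => (Prefix + key) -> value }

  def main(args: Array[String]): Unit =
    hadoopProps("<access-key>", "<secret-key>").keys.toSeq.sorted.foreach(println)
}
```

Each entry can then be passed to `SparkSession.builder().config(key, value)` before calling `getOrCreate()`. This only affects configuration; it does not put the S3A classes on the classpath.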
I am building an uber-jar with sbt, using this build definition:
name := "test"
version := "0.2.0"
scalaVersion := "2.11.8"
libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.7.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3" excludeAll(
ExclusionRule("com.amazonaws", "aws-java-sdk"),
ExclusionRule("commons-beanutils")
)
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
When I submit the job to the cluster, I always get the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 172.31.7.246): java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1726)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:662)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:446)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$3.apply(Executor.scala:476)
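The frames show the lookup failing inside `Executor.updateDependencies` → `Utils.fetchFile` → `FileSystem.get`, that is, while the executor fetches a job dependency through Hadoop's FileSystem API. A minimal sketch for checking whether a class is visible to a given class loader, assuming you can run it in the same JVM setup as the job; the object name `ClasspathCheck` is hypothetical.

```scala
// Hypothetical diagnostic: reports whether a class can be loaded, without
// running its static initializers.
object ClasspathCheck {
  def isLoadable(
      className: String,
      loader: ClassLoader = Thread.currentThread().getContextClassLoader): Boolean =
    try {
      Class.forName(className, false, loader) // initialize = false
      true
    } catch {
      // ClassNotFoundException: class absent; LinkageError: class found but
      // one of its dependencies (e.g. a superclass) is missing.
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val s3a = "org.apache.hadoop.fs.s3a.S3AFileSystem"
    println(s"$s3a loadable: ${isLoadable(s3a)}")
  }
}
```

Running this on the driver only tells you about the driver's classpath; the exception above is raised on an executor.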
S3, /..., uberjar.
I also tried passing the dependencies to spark-submit:
--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
PS: If I use s3n instead, I get:
java.io.IOException: No FileSystem for scheme: s3n
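The s3n and s3a connectors use different implementation classes and, in Hadoop 2.7, different credential key names (s3n uses the `awsAccessKeyId` style, s3a uses `access.key`). The sketch below is a hypothetical reference table, not a Hadoop API, that collects these per scheme. Naming the class explicitly through `fs.<scheme>.impl` is a commonly used setting when a scheme is not registered; whether it resolves this particular error is an assumption.

```scala
// Hypothetical reference table: Hadoop 2.7.x settings for the two S3
// connectors in the hadoop-aws module.
final case class S3Connector(
    scheme: String,
    implClass: String,
    accessKeyProp: String,
    secretKeyProp: String) {

  // Key Hadoop consults to map a URI scheme to its FileSystem class.
  def implProp: String = s"fs.$scheme.impl"

  // All settings for this connector, ready for Configuration.set(k, v).
  def settings(accessKey: String, secretKey: String): Map[String, String] =
    Map(implProp -> implClass, accessKeyProp -> accessKey, secretKeyProp -> secretKey)
}

object S3Connectors {
  val S3N = S3Connector("s3n", "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey")
  val S3A = S3Connector("s3a", "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3a.access.key", "fs.s3a.secret.key")

  // Finds the connector for a URI such as "s3n://bucket/key".
  def forUri(uri: String): Option[S3Connector] =
    Seq(S3N, S3A).find(c => uri.startsWith(c.scheme + "://"))
}
```

For example, `S3Connectors.S3N.settings(accessKey, secretKey).foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }` applies the s3n settings. Both classes still have to be on the classpath of every JVM that resolves the path.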