S3 file systems are not included by default in Hadoop 2.6. Consequently, Spark builds that use hadoop2.6 ship without any S3-based file system. Possible solutions:
Solution 1. Use a Spark build created with Hadoop 2.4 (just change the file name to "spark-1.5.1-bin-hadoop2.4.tgz" and update the sha256) and the s3n:// file system will work, as in the sketch below.
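For illustration, a minimal sketch of Solution 1; the archive.apache.org URL and the bucket/key names are assumptions for the example, not part of the original instructions:

    # Fetch the Hadoop 2.4 build of Spark 1.5.1 and unpack it.
    wget https://archive.apache.org/dist/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.4.tgz
    tar xzf spark-1.5.1-bin-hadoop2.4.tgz
    cd spark-1.5.1-bin-hadoop2.4

    # s3n:// resolves out of the box in this build (credentials: see note 3 below).
    echo 'println(sc.textFile("s3n://my-bucket/some.txt").count())' | bin/spark-shell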
Solution 2. Enable the s3n:// file system manually. Pass the option --conf spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem when starting the Spark shell.
You must also put the required libraries on the classpath: --conf spark.driver.extraClassPath=<path>/* --conf spark.executor.extraClassPath=<path>/*, where <path> is the directory containing the hadoop-aws, aws-java-sdk-1.7.4 and guava-11.0.2 jars. A complete invocation is sketched below.
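Putting the two options together, a spark-shell launch for Solution 2 could look like this; the Hadoop location /opt/hadoop-2.6.0 is a hypothetical example (see note 2 for where the jars actually live):

    # Directory holding the hadoop-aws, aws-java-sdk-1.7.4 and guava-11.0.2 jars.
    JARS=/opt/hadoop-2.6.0/share/hadoop/tools/lib
    bin/spark-shell \
      --conf spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem \
      --conf spark.driver.extraClassPath="$JARS/*" \
      --conf spark.executor.extraClassPath="$JARS/*"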
Solution 3. Use the new s3a:// file system. It is enabled by default, so no fs.s3a.impl setting is needed; you only need to set the path to the required libraries, as in Solution 2. See the sketch below.
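The corresponding sketch for Solution 3, again with a hypothetical Hadoop location (s3a ships with Hadoop 2.7+, see note 2):

    JARS=/opt/hadoop-2.7.1/share/hadoop/tools/lib
    bin/spark-shell \
      --conf spark.driver.extraClassPath="$JARS/*" \
      --conf spark.executor.extraClassPath="$JARS/*"

Inside the shell, sc.textFile("s3a://my-bucket/some.txt") should then resolve, provided credentials are configured (note 3).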
1: Instead of repeating --conf flags on every launch, all of these settings can be placed in conf/spark-defaults.conf (see the sketch after these notes).
2: The jars for <path> ship under share/hadoop/tools/lib of a Hadoop 2.6+ distribution (for s3a, Hadoop 2.7+); alternatively, they can be fetched from Maven Central.
3: s3n does not read credentials from ~/.aws/config; pass them explicitly with --conf spark.hadoop.fs.s3n.awsAccessKeyId= and --conf spark.hadoop.fs.s3n.awsSecretAccessKey=.
For s3a, use --conf spark.hadoop.fs.s3a.access.key= and --conf spark.hadoop.fs.s3a.secret.key= (or rely on the standard .aws credential sources).
4: The bare s3:// scheme can be mapped either to s3n (--conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem) or to s3a (--conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem).
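As an illustration of note 1, here are the same settings expressed once in conf/spark-defaults.conf instead of --conf flags; the jar path and keys are placeholders, and the last line applies note 4 by routing bare s3:// URIs through s3a:

    spark.driver.extraClassPath     /opt/hadoop-2.7.1/share/hadoop/tools/lib/*
    spark.executor.extraClassPath   /opt/hadoop-2.7.1/share/hadoop/tools/lib/*
    spark.hadoop.fs.s3a.access.key  AKIA...
    spark.hadoop.fs.s3a.secret.key  ...
    spark.hadoop.fs.s3.impl         org.apache.hadoop.fs.s3a.S3AFileSystem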