Brew-installed apache-spark cannot access S3 files

After brew install apache-spark, sc.textFile("s3n://...") in spark-shell fails with java.io.IOException: No FileSystem for scheme: s3n. The same call works in a spark-shell on an EC2 machine launched with spark-ec2. The Homebrew formula builds against a fairly recent version of Hadoop, and the error occurs regardless of whether brew install hadoop was run first.

How can a Homebrew-installed Spark be configured so that it can read s3n:// files?

1 answer

S3 file systems are not included by default in Hadoop 2.6, so Spark versions built against Hadoop 2.6 have no S3-based filesystem available. Possible solutions:

  • Solution 1. Use Spark built with Hadoop 2.4: in the Homebrew formula, just change the file name to "spark-1.5.1-bin-hadoop2.4.tgz" and update the sha256, and the s3n:// filesystem will work out of the box.

  • Solution 2. Enable the s3n:// file system by passing the option --conf spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem when starting spark-shell.

    You must also put the required libraries on the classpath: --conf spark.driver.extraClassPath=<path>/* --conf spark.executor.extraClassPath=<path>/*, where <path> is a directory containing the hadoop-aws, aws-java-sdk-1.7.4 and guava-11.0.2 jars (see the launch sketch after this list).

  • Solution 3. Use the newer s3a:// file system. It is enabled by default, but you still have to put the required libraries on the classpath, as above.
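
For concreteness, a minimal launch sketch for Solution 2 (the jar directory /usr/local/aws-jars is a made-up example; point it at wherever the three jars actually live):

    # Enable s3n:// and put the AWS jars on the driver and executor classpaths
    spark-shell \
      --conf spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem \
      --conf "spark.driver.extraClassPath=/usr/local/aws-jars/*" \
      --conf "spark.executor.extraClassPath=/usr/local/aws-jars/*"

    # Inside the shell, s3n:// paths should now resolve:
    # scala> sc.textFile("s3n://bucket/key").count()

For Solution 3, drop the fs.s3n.impl option and use s3a:// paths instead; only the two classpath options are needed.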

1: These options can also be placed in conf/spark-defaults.conf instead of being passed with --conf.
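
For example, the Solution 2 settings as spark-defaults.conf entries (same hypothetical jar directory as in the sketch above):

    # conf/spark-defaults.conf -- equivalent to the --conf flags
    spark.hadoop.fs.s3n.impl        org.apache.hadoop.fs.s3native.NativeS3FileSystem
    spark.driver.extraClassPath     /usr/local/aws-jars/*
    spark.executor.extraClassPath   /usr/local/aws-jars/*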

2: <path> can be the share/hadoop/tools/lib directory of a Hadoop 2.6+ distribution (for s3a, Hadoop 2.7+), or the jars can be downloaded individually from Maven Central (1, 2, 3).
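
One way to collect the jars, sketched with example versions (check Maven Central for versions matching your Hadoop build; the target directory is the hypothetical <path> from above):

    mkdir -p /usr/local/aws-jars && cd /usr/local/aws-jars
    curl -O https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar
    curl -O https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
    curl -O https://repo1.maven.org/maven2/com/google/guava/guava/11.0.2/guava-11.0.2.jar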

3: For s3n, credentials can be taken from ~/.aws/config or passed with --conf spark.hadoop.fs.s3n.awsAccessKeyId= --conf spark.hadoop.fs.s3n.awsSecretAccessKey=.

For s3a, use --conf spark.hadoop.fs.s3a.access.key= --conf spark.hadoop.fs.s3a.secret.key= (the ~/.aws config is not read by s3a).
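
A sketch of passing credentials at launch (the key values are placeholders; for real keys prefer a config file so they do not end up in shell history):

    # s3n
    spark-shell \
      --conf spark.hadoop.fs.s3n.awsAccessKeyId=AKIA_EXAMPLE \
      --conf spark.hadoop.fs.s3n.awsSecretAccessKey=SECRET_EXAMPLE

    # s3a
    spark-shell \
      --conf spark.hadoop.fs.s3a.access.key=AKIA_EXAMPLE \
      --conf spark.hadoop.fs.s3a.secret.key=SECRET_EXAMPLE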

4: The plain s3:// scheme can be mapped either to s3n (--conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem) or to s3a (--conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem).
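
For instance, to serve plain s3:// URLs through s3a (bucket and key are placeholders):

    spark-shell --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

    # scala> sc.textFile("s3://bucket/key").count()   // now handled by S3AFileSystem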


Source: https://habr.com/ru/post/1614790/

