Unable to read from S3 bucket using Spark

import org.apache.spark.sql.SparkSession

val spark = SparkSession
        .builder()
        .appName("try1")
        .master("local")
        .getOrCreate()

import spark.implicits._  // for the $"colName" column syntax

val df = spark.read
        .json("s3n://BUCKET-NAME/FOLDER/FILE.json")

df.select($"uid").show(5)

I passed AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables, but I am still getting an error while reading from S3:

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/FOLDER%2FFILE.json' - ResponseCode=400, ResponseMessage=Bad Request

I suspect that the error is caused by "/" being URL-encoded to "%2F" by some internal function, since the error shows "/FOLDER%2FFILE.json" instead of "/FOLDER/FILE.json".

1 answer

Your Spark application (the JVM) cannot see those environment variables unless you forward them, so the quick fix is to set the credentials on the Hadoop configuration explicitly:

// Read the credentials from the environment variables you already set
val awsAccessKeyId = sys.env("AWS_ACCESS_KEY_ID")
val awsSecretAccessKey = sys.env("AWS_SECRET_ACCESS_KEY")

spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)
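
If you switch to the newer s3a connector (URI scheme s3a://, provided by the hadoop-aws module), the credential properties are named differently. A minimal sketch, assuming hadoop-aws and a matching AWS SDK are on the classpath:

// s3a equivalents of the s3n credential properties above
spark.sparkContext
     .hadoopConfiguration.set("fs.s3a.access.key", awsAccessKeyId)
spark.sparkContext
     .hadoopConfiguration.set("fs.s3a.secret.key", awsSecretAccessKey)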

You will also need to specify the S3 endpoint (note that this property belongs to the s3a connector):

spark.sparkContext
     .hadoopConfiguration.set("fs.s3a.endpoint", "<<ENDPOINT>>")

For AWS S3 itself, the endpoint is a region-specific address such as s3.us-east-1.amazonaws.com; for an S3-compatible store, use the vendor's endpoint.
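
Putting the pieces together, here is a self-contained sketch of the whole flow, assuming the credentials really are present as environment variables as the question states (BUCKET-NAME, FOLDER, and FILE.json are the placeholders from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
        .builder()
        .appName("try1")
        .master("local")
        .getOrCreate()

import spark.implicits._

// Forward the credentials from the environment into the Hadoop configuration
spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

val df = spark.read
        .json("s3n://BUCKET-NAME/FOLDER/FILE.json")

df.select($"uid").show(5)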

