This should cover #1, assuming you are using PySpark:
# Pass your AWS credentials to Hadoop's S3 connector
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "MY-ACCESS-KEY")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "MY-SECRET-ACCESS-KEY")

# Load the object(s) as an RDD of lines and take a quick look
my_data = sc.textFile("s3n://my-bucket-name/my-key")
my_data.count()
my_data.take(20)

# Parse each line as JSON
import json
my_data.map(lambda x: json.loads(x)).take(20)
By the way, note the URI scheme: the examples use s3n://, but depending on your Hadoop version and connector, s3:// may be the scheme you need instead.
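On newer Hadoop builds (2.7 and later) the preferred connector is s3a, which uses different configuration keys. A minimal sketch, assuming the hadoop-aws jar is on your classpath (the bucket and key names are placeholders):

# s3a equivalent of the setup above (requires the hadoop-aws connector)
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "MY-ACCESS-KEY")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "MY-SECRET-ACCESS-KEY")
my_data = sc.textFile("s3a://my-bucket-name/my-key")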
Also, my-key doesn't have to name a single object: it can be an S3 key prefix or contain wildcards*, and Spark will read every object that matches, so you can pull in many files at once.
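For example, a glob pattern reads all matching objects in one call (the path below is a made-up placeholder):

# Read every object whose key matches the glob pattern
logs = sc.textFile("s3n://my-bucket-name/logs/*/*.json")
logs.count()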
#2 and #3 work the same way: serialize your records back to JSON and save them to s3:
my_data.map(lambda x: json.dumps(x)).saveAsTextFile('s3://my-bucket-name/my-new-key')
Note that Spark writes the result under that key as a set of part-xxxxx files, not as a single object in S3.
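If you really need one output file, a common workaround is to collapse the RDD to a single partition before saving. This is just a sketch, not the only way: coalesce(1) funnels all the data through a single worker, so it only makes sense for small datasets:

# Force a single partition so only one part file is written (slow for large data)
my_data.map(lambda x: json.dumps(x)).coalesce(1).saveAsTextFile('s3://my-bucket-name/my-new-key')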
* Strictly speaking, S3 has no real directories: a bucket is a flat key-value store, and the slashes in keys only look like paths because client tools render them that way.