I suggest following the Cloudera tutorial "Accessing data stored in Amazon S3 through Spark":
To access data stored in Amazon S3 from Spark applications, you can use the Hadoop file APIs (SparkContext.hadoopFile, JavaHadoopRDD.saveAsHadoopFile, SparkContext.newAPIHadoopRDD, and JavaHadoopRDD.saveAsNewAPIHadoopFile) to read and write RDDs, providing URLs of the form s3a://bucket_name/path/to/file.txt.
You can read and write Spark SQL DataFrames using the Data Source API.
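As a rough illustration of the two approaches above, here is a minimal PySpark sketch. It assumes the hadoop-aws module is on the classpath and that S3 credentials are already configured for the cluster. The helper names (`s3a_url`, `read_text_rdd`, `read_csv_df`), the application name, and the choice of CSV with a header row are illustrative assumptions, not part of the tutorial.

```python
def s3a_url(bucket: str, key: str) -> str:
    """Build a URL of the form s3a://bucket_name/path/to/file.txt."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def read_text_rdd(spark, url: str):
    """Read a text object from S3 as an RDD of lines."""
    return spark.sparkContext.textFile(url)


def read_csv_df(spark, url: str):
    """Read a CSV object from S3 as a DataFrame via the Data Source API.

    Assumes the file has a header row (illustrative choice).
    """
    return spark.read.option("header", "true").csv(url)


def main():
    # Imported here so the helpers above stay importable without pyspark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3a-example").getOrCreate()
    url = s3a_url("bucket_name", "path/to/file.txt")

    lines = read_text_rdd(spark, url)  # RDD route
    print(lines.count())

    spark.stop()
```

Calling `main()` from a job submitted with `spark-submit` runs the RDD route. `read_csv_df` can be swapped in when the object is tabular rather than plain text.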
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectHEAD.html