Specifying another account's S3 credentials in an EMR job flow

I am trying to use an S3 bucket as input for my Elastic MapReduce job. The bucket does not belong to the same account as the EMR job flow. How and where should the bucket's credentials be specified so the job can access it? I tried the following format:

s3n://<Access Key>:<Secret Key>@<BUCKET>

But this gives me the following error:

Exception in thread "main" java.lang.IllegalArgumentException: The bucket name parameter must be specified when listing objects in a bucket
    at com.amazonaws.services.s3.AmazonS3Client.assertParameterNotNull(AmazonS3Client.java:2381)
    at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:444)
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:785)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.ensureBucketExists(Jets3tNativeFileSystemStore.java:80)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:71)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:83)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.fs.s3native.$Proxy1.initialize(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:512)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:352)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:321)
    at com.inmobi.appengage.emr.mapreduce.TestSession.main(TestSession.java:88)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

How should I specify these credentials?

1 answer

You should try adding these credentials to the core-site.xml file. You can set the S3 credentials manually on each node, or use a bootstrap action when starting the cluster.
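For reference, a minimal sketch of what those entries look like in core-site.xml, assuming the s3n:// scheme used in the question (the fs.s3.* variants apply to s3:// paths; the placeholder values are illustrative):

    <!-- Credentials of the account that owns the input bucket -->
    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY</value>
    </property>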

You can start the cluster with something like this:

ruby elastic-mapreduce --create --alive --plain-output \
  --master-instance-type m1.xlarge --slave-instance-type m1.xlarge \
  --num-instances 11 --name "My Super Cluster" \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-c,fs.s3.awsAccessKeyId=<access-key>,-c,fs.s3.awsSecretAccessKey=<secret key>"

This should override the default values that EMR configures for the account that runs the cluster.
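If you would rather not touch core-site.xml at all, the same properties can also be set per job on the Hadoop Configuration before the input path is resolved. A minimal sketch, assuming the s3n:// scheme; the class name, bucket and key placeholders are illustrative, not from the original post:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class CrossAccountInputJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Credentials of the account that owns the input bucket;
            // these apply only to this job and override core-site.xml.
            conf.set("fs.s3n.awsAccessKeyId", "<access-key>");
            conf.set("fs.s3n.awsSecretAccessKey", "<secret-key>");

            Job job = new Job(conf, "cross-account-s3-input");
            // Plain s3n:// URI, with no credentials embedded in it.
            FileInputFormat.setInputPaths(job, new Path("s3n://<bucket>/input/"));
            // ... configure mapper, reducer and output as usual, then submit ...
        }
    }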

