I am trying to enable YARN log aggregation for my Amazon EMR cluster. I followed this documentation for the configuration:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html#emr-plan-debugging-logs-archive
specifically the section titled "Aggregating Logs in Amazon S3 Using the AWS CLI".
I checked that the hadoop-config bootstrap action puts the following in yarn-site.xml:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>3000</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>s3://mybucket/logs</value>
</property>
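For reference, this is roughly how I verified the keys are present on the master node (a minimal sketch; the path /home/hadoop/conf/yarn-site.xml is my assumption for the EMR 3.x AMI, so adjust it for your cluster):

```shell
# Sketch: confirm the log-aggregation keys exist in yarn-site.xml instead of
# eyeballing the file. The conf path is an assumption for the EMR 3.x AMI.
check_yarn_conf() {
  conf=$1
  for key in yarn.log-aggregation-enable \
             yarn.log-aggregation.retain-seconds \
             yarn.nodemanager.remote-app-log-dir; do
    if grep -q "<name>$key</name>" "$conf"; then
      echo "ok: $key"
    else
      echo "missing: $key"
    fi
  done
}
```

Usage: `check_yarn_conf /home/hadoop/conf/yarn-site.xml` prints one `ok:`/`missing:` line per key.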
I can run the pi example from hadoop-examples.jar and see in the ResourceManager GUI that it completed successfully.
It even creates a folder under s3://mybucket/logs named after the application, but the folder is empty, and when I run yarn logs -applicationId <applicationId>, I get this stack trace:
14/10/20 23:02:15 INFO client.RMProxy: Connecting to ResourceManager at /10.XXX.XXX.XXX:9022
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
        at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
        at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
        at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:322)
        at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
        at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
        at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:112)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199)
This makes no sense to me: I can run hdfs dfs -ls s3://mybucket/ and it displays the contents just fine. The machines get their credentials from AWS IAM roles, and I also tried adding fs.s3n.awsAccessKeyId etc. to core-site.xml, with no change in behavior.
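One detail I noticed in the trace: the failure is in FileContext.getAbstractFileSystem. As far as I understand, yarn logs goes through the FileContext API, which resolves schemes via fs.AbstractFileSystem.&lt;scheme&gt;.impl, a separate setting from the fs.s3.impl binding that hdfs dfs (the FileSystem API) uses, which may be why one works and the other doesn't. A minimal sketch of how I checked for that binding (the core-site.xml path is my assumption for the EMR 3.x AMI):

```shell
# Sketch: hdfs dfs uses the FileSystem API (fs.s3.impl), while yarn logs uses
# FileContext, which resolves s3:// through a separate
# fs.AbstractFileSystem.s3.impl binding. Check whether core-site.xml defines it.
check_abstract_fs() {
  conf=$1
  if grep -q 'fs.AbstractFileSystem.s3.impl' "$conf"; then
    echo "s3 AbstractFileSystem binding present"
  else
    echo "no fs.AbstractFileSystem.s3.impl binding"
  fi
}
```

Usage: `check_abstract_fs /home/hadoop/conf/core-site.xml` (adjust the path for your AMI).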
Any advice is greatly appreciated.