YARN Log Aggregation on AWS EMR - UnsupportedFileSystemException

I am trying to enable YARN log aggregation for my Amazon EMR cluster. I followed this documentation for the configuration:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html#emr-plan-debugging-logs-archive

specifically the section titled "Aggregating Logs in Amazon S3 Using the AWS CLI".

I verified that the hadoop-config bootstrap action puts the following into yarn-site.xml:

    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.log-aggregation.retain-seconds</name>
      <value>-1</value>
    </property>
    <property>
      <name>yarn.log-aggregation.retain-check-interval-seconds</name>
      <value>3000</value>
    </property>
    <property>
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>s3://mybucket/logs</value>
    </property>

I can run the pi example from hadoop-examples.jar and see in the ResourceManager GUI that it completed successfully, launched with something like the command below.
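The jar path and pi arguments here are illustrative; the examples jar location varies by EMR AMI:

    hadoop jar /home/hadoop/hadoop-examples.jar pi 10 100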

It even creates a folder under s3://mybucket/logs named after the application. But the folder is empty, and if I run yarn logs -applicationId <applicationId>, I get this stack trace:

    14/10/20 23:02:15 INFO client.RMProxy: Connecting to ResourceManager at /10.XXX.XXX.XXX:9022
    Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
        at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
        at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
        at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:322)
        at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
        at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
        at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:112)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199)

It makes no sense to me; I can run hdfs dfs -ls s3://mybucket/ and it displays the contents just fine. The machines get their credentials from AWS IAM roles, and I tried adding fs.s3n.awsAccessKeyId etc. to core-site.xml, with no change in behavior.

Any advice is greatly appreciated.

1 answer

Hadoop provides two filesystem interfaces: FileSystem and AbstractFileSystem. In most cases we work with FileSystem, and use configuration options such as fs.s3.impl to plug in custom implementations.

yarn logs, however, uses the AbstractFileSystem interface.
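A minimal sketch of the two lookup paths (mybucket is the placeholder bucket from the question; run on the cluster so the EMR Hadoop configuration is on the classpath):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.FileSystem;

    public class SchemeLookupDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            URI uri = URI.create("s3://mybucket/logs");

            // Resolved via fs.s3.impl -- the path taken by `hdfs dfs -ls s3://...`,
            // which is why that command works on the cluster.
            FileSystem fs = FileSystem.get(uri, conf);
            System.out.println("FileSystem impl: " + fs.getClass().getName());

            // Resolved via fs.AbstractFileSystem.s3.impl -- the path taken by
            // `yarn logs` (through FileContext), and the lookup that throws
            // UnsupportedFileSystemException when no adapter is configured.
            FileContext fc = FileContext.getFileContext(uri, conf);
            System.out.println("FileContext working dir: " + fc.getWorkingDirectory());
        }
    }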

If you can find an implementation for S3, you can specify it with fs.AbstractFileSystem.s3.impl .
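For example, newer Hadoop releases ship org.apache.hadoop.fs.s3a.S3A, a DelegateToFileSystem-based adapter for the s3a connector. Whether your EMR AMI bundles it, and whether you want to bind the plain s3 scheme to it rather than s3a, needs checking, so treat this core-site.xml snippet as a sketch:

    <property>
      <name>fs.AbstractFileSystem.s3.impl</name>
      <!-- Assumption: this class ships with newer Hadoop builds as the
           AbstractFileSystem adapter for the s3a connector; verify your
           distribution provides it before binding it to the s3 scheme. -->
      <value>org.apache.hadoop.fs.s3a.S3A</value>
    </property>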

See core-default.xml for examples such as fs.AbstractFileSystem.hdfs.impl.
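If no ready-made adapter is available, a thin wrapper is enough, because DelegateToFileSystem bridges AbstractFileSystem calls to an existing FileSystem. A minimal sketch, assuming the stock NativeS3FileSystem backs your s3:// URIs (on EMR the actual backing class may be Amazon's own EMRFS implementation instead; the package and class name below are hypothetical):

    package com.example.fs; // hypothetical package

    import java.io.IOException;
    import java.net.URI;
    import java.net.URISyntaxException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.DelegateToFileSystem;
    import org.apache.hadoop.fs.s3native.NativeS3FileSystem;

    /**
     * Sketch of an AbstractFileSystem adapter for the s3 scheme. Hadoop
     * instantiates it reflectively through this (URI, Configuration)
     * constructor when fs.AbstractFileSystem.s3.impl names this class.
     */
    public class S3 extends DelegateToFileSystem {
        public S3(URI theUri, Configuration conf)
                throws IOException, URISyntaxException {
            // Delegate every AbstractFileSystem call to the existing
            // FileSystem implementation; "s3" is the supported scheme and
            // false means no authority is required in the URI.
            super(theUri, new NativeS3FileSystem(), conf, "s3", false);
        }
    }

Build it into a jar, put the jar on the YARN classpath, and set fs.AbstractFileSystem.s3.impl to com.example.fs.S3 in core-site.xml.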
