How to set the classpath in EMR

I am getting started with an AWS EMR cluster and am running into a Jackson library conflict. Based on the article here, I tried adding a bootstrap action that sets the classpath with the following script:

#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true;
echo "HADOOP_CLASSPATH=s3n://bucket/myjar.jar" > /home/hadoop/conf/hadoop-user-env.sh

I built my jar with all of its dependencies bundled in. The first problem I hit is that my debugging step now fails with the following error:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2427)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2440)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.fetchFile(ScriptRunner.java:39)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 13 more

So I have two questions: how is this error related to the enable-debugging step? And can my classpath really be an S3 location? If not, what value should replace:

/path/to/my.jar

in the example on the page above?


Here is a bootstrap script that addresses this:

#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> /home/hadoop/conf/hadoop-user-env.sh

The difference is subtle: the ">" is replaced with ">>", so the line is appended to the existing script instead of overwriting it.
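To see why the redirection operator matters, here is a minimal sketch that compares ">>" (used in this script) with ">" (used in the script from the question). It runs against a scratch file created with mktemp, not the real hadoop-user-env.sh, and the EXISTING_SETTING line is a made-up placeholder standing in for whatever the file might already contain.

```shell
#!/bin/bash
# Compare append (>>) with overwrite (>) on a scratch file.
# env_file is a temporary stand-in for hadoop-user-env.sh, not the real file.
env_file="$(mktemp)"

# Hypothetical pre-existing content, standing in for earlier cluster setup.
echo "EXISTING_SETTING=keep-me" > "$env_file"

# Appending keeps the existing line and adds the new one after it.
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> "$env_file"
append_lines="$(wc -l < "$env_file" | tr -d ' ')"

# Overwriting truncates the file first, so the earlier line is lost.
echo "HADOOP_CLASSPATH=/path/to/my.jar" > "$env_file"
overwrite_lines="$(wc -l < "$env_file" | tr -d ' ')"

echo "after append: $append_lines line(s); after overwrite: $overwrite_lines line(s)"
rm -f "$env_file"
```

The scratch file holds two lines after the append and only one after the overwrite, which is the behavior that would wipe out anything already present in the environment script.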

Reference: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html

PS: Thanks to Rendy O. at AWS.


Source: https://habr.com/ru/post/1568068/

