Spark-submit EMR step fails when submitted using the boto3 client

I am trying to run spark-submit as an EMR step using the boto3 client. After I execute the code below, the step appears on the cluster and fails a few seconds later. The actual command line taken from the step logs works when run manually on the EMR master node.

The controller log contains barely readable garbage, as if several processes had written to it at the same time.

UPD: Tried command-runner.jar and EMR versions 4.0.0 and 4.1.0

Any ideas are appreciated.

Code snippet:

    import boto3

    class ProblemExample:
        def run(self):
            session = boto3.Session(profile_name='emr-profile')
            client = session.client('emr')
            # cluster_id is the ID of the target cluster, defined elsewhere
            response = client.add_job_flow_steps(
                JobFlowId=cluster_id,
                Steps=[
                    {
                        'Name': 'string',
                        'ActionOnFailure': 'CONTINUE',
                        'HadoopJarStep': {
                            'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                            'Args': [
                                '/usr/bin/spark-submit',
                                '--verbose',
                                '--class', 'my.spark.job',
                                '--jars', '<dependencies>',
                                '<my spark job>.jar'
                            ]
                        }
                    },
                ]
            )
1 answer

In the end, the problem was solved by escaping the --jars value correctly.

spark-submit could not find the classes, but the error was hard to spot among the noisy logs.
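When the controller log is this hard to read, the step status returned by the EMR API can sometimes point to the cause more directly. The sketch below is not part of the original answer. It assumes a `describe_step` response in the shape boto3 documents, and the helper name `summarize_step_failure` is hypothetical. The `FailureDetails` field is optional and may well be missing, particularly on older releases such as the 4.x versions mentioned in the question.

```python
def summarize_step_failure(describe_step_response):
    """Pull the state and any failure details out of a describe_step response.

    Hypothetical helper: it only reads the dict, so it works on any response
    shaped like the one returned by boto3's EMR client.describe_step().
    """
    status = describe_step_response["Step"]["Status"]
    # FailureDetails is optional; fall back to an empty dict when it is absent.
    details = status.get("FailureDetails", {})
    return {
        "state": status.get("State"),
        "reason": details.get("Reason"),
        "message": details.get("Message"),
        "log_file": details.get("LogFile"),
    }


# Possible usage with a real client (needs boto3 and AWS credentials):
#   step_id = response["StepIds"][0]   # response from add_job_flow_steps
#   info = summarize_step_failure(
#       client.describe_step(ClusterId=cluster_id, StepId=step_id))
#   print(info)
```

If `log_file` is populated, it names the log that EMR considers most relevant, which may be quicker than scanning the controller log by hand.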

Valid example:

    import boto3

    class Example:
        def run(self):
            session = boto3.Session(profile_name='emr-profile')
            client = session.client('emr')
            # cluster_id is the ID of the target cluster, defined elsewhere
            response = client.add_job_flow_steps(
                JobFlowId=cluster_id,
                Steps=[
                    {
                        'Name': 'string',
                        'ActionOnFailure': 'CONTINUE',
                        'HadoopJarStep': {
                            'Jar': 'command-runner.jar',
                            'Args': [
                                '/usr/bin/spark-submit',
                                '--verbose',
                                '--class', 'my.spark.job',
                                # the --jars value is wrapped in literal single quotes
                                '--jars', '\'<comma, separated, dependencies>\'',
                                '<my spark job>.jar'
                            ]
                        }
                    },
                ]
            )
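To avoid quoting the --jars value by hand each time, the step definition can be built by a small function. This is only a sketch under a few assumptions: the helper name `build_spark_submit_step` and the S3 paths are hypothetical, and the literal single quotes around the joined value simply mirror the valid example above rather than any documented requirement. The jars are joined with commas and no spaces, which is the form spark-submit documents for --jars.

```python
def build_spark_submit_step(name, main_class, app_jar, dependency_jars):
    """Build one step dict for add_job_flow_steps (hypothetical helper)."""
    # spark-submit takes --jars as a single comma-separated value.
    jars_value = ",".join(jar.strip() for jar in dependency_jars)
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "/usr/bin/spark-submit",
                "--verbose",
                "--class", main_class,
                # Literal single quotes around the value, as in the answer.
                "--jars", "'" + jars_value + "'",
                app_jar,
            ],
        },
    }


# Placeholder paths for illustration only.
step = build_spark_submit_step(
    name="example step",
    main_class="my.spark.job",
    app_jar="s3://example-bucket/job.jar",
    dependency_jars=[
        "s3://example-bucket/dep-a.jar",
        " s3://example-bucket/dep-b.jar",
    ],
)
```

The resulting dict can then be passed as `Steps=[step]` to `client.add_job_flow_steps`, exactly as in the example above.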

Source: https://habr.com/ru/post/1234393/

