I am trying to run spark-submit through the boto3 EMR client. After executing the code below, the EMR step appears in the cluster but fails after a few seconds. The actual command line taken from the step logs works fine if I run it manually on the EMR master node.
The controller log contains little more than unreadable garbage, as if several processes were writing to it simultaneously.
UPD: I have also tried command-runner.jar (a sketch of that variant is below the code snippet) and EMR release versions 4.0.0 and 4.1.0.
Any ideas are appreciated.
Code snippet:
import boto3


class ProblemExample:
    def run(self, cluster_id):
        session = boto3.Session(profile_name='emr-profile')
        client = session.client('emr')
        response = client.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[
                {
                    'Name': 'string',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                        'Args': [
                            '/usr/bin/spark-submit',
                            '--verbose',
                            '--class', 'my.spark.job',
                            '--jars', '<dependencies>',
                            '<my spark job>.jar'
                        ]
                    }
                },
            ]
        )
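For completeness, below is a rough sketch of the command-runner.jar variant mentioned in the update, together with a describe_step poll to read whatever failure details EMR reports for the step. The cluster id, step name, class name, and jar paths are placeholders from my setup, not exact values.

import time

import boto3

cluster_id = 'j-XXXXXXXXXXXXX'  # placeholder cluster id

session = boto3.Session(profile_name='emr-profile')
client = session.client('emr')

# Same step, but submitted through command-runner.jar (EMR 4.x style)
# instead of script-runner.jar; class name and jar paths are placeholders.
response = client.add_job_flow_steps(
    JobFlowId=cluster_id,
    Steps=[
        {
            'Name': 'spark-job-via-command-runner',
            'ActionOnFailure': 'CONTINUE',
            'HadoopJarStep': {
                'Jar': 'command-runner.jar',
                'Args': [
                    'spark-submit',
                    '--verbose',
                    '--class', 'my.spark.job',
                    '--jars', '<dependencies>',
                    '<my spark job>.jar'
                ]
            }
        },
    ]
)

step_id = response['StepIds'][0]

# Poll the step until it reaches a terminal state and print the failure details, if any.
while True:
    status = client.describe_step(ClusterId=cluster_id, StepId=step_id)['Step']['Status']
    if status['State'] in ('COMPLETED', 'FAILED', 'CANCELLED', 'INTERRUPTED'):
        print(status['State'], status.get('FailureDetails'))
        break
    time.sleep(15)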