Take a look at the boto3 EMR docs for creating a cluster. Essentially, you call run_job_flow and define steps that run the program you need.
```python
import boto3

client = boto3.client('emr', region_name='us-east-1')

S3_BUCKET = 'MyS3Bucket'
S3_KEY = 'spark/main.py'
S3_URI = 's3://{bucket}/{key}'.format(bucket=S3_BUCKET, key=S3_KEY)
```
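As a minimal sketch, a run_job_flow call that submits that script as a Spark step could look like the following. The release label, instance types and counts, and the role names (EMR_EC2_DefaultRole, EMR_DefaultRole) are placeholder assumptions; adapt them to your account and workload:

```python
# Sketch: launch a cluster that runs the script from S3_URI as a single
# spark-submit step. ReleaseLabel, instance settings, and role names are
# assumed defaults -- adjust for your environment.
response = client.run_job_flow(
    Name='MySparkCluster',
    ReleaseLabel='emr-5.36.0',
    Applications=[{'Name': 'Spark'}],
    Instances={
        'MasterInstanceType': 'm5.xlarge',
        'SlaveInstanceType': 'm5.xlarge',
        'InstanceCount': 3,
        'KeepJobFlowAliveWhenNoSteps': False,  # terminate when steps finish
    },
    Steps=[
        {
            'Name': 'Run main.py',
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                # command-runner.jar is the built-in jar for running
                # commands such as spark-submit on the cluster.
                'Jar': 'command-runner.jar',
                'Args': ['spark-submit', '--deploy-mode', 'cluster', S3_URI],
            },
        }
    ],
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)
```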
You can also add steps to a running cluster if you know the job flow identifier:
```python
job_flow_id = response['JobFlowId']
print("Job flow ID:", job_flow_id)

# SomeMoreSteps is a placeholder: a list of step dicts in the same format
# as the Steps argument to run_job_flow above.
step_response = client.add_job_flow_steps(JobFlowId=job_flow_id, Steps=SomeMoreSteps)
step_ids = step_response['StepIds']
print("Step IDs:", step_ids)
```
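If you want to block until a submitted step finishes, boto3's EMR client ships a step_complete waiter, and describe_step returns the step's state. This sketch reuses the job_flow_id and step_ids from above:

```python
# Poll until the first added step completes (raises on failure or timeout).
waiter = client.get_waiter('step_complete')
waiter.wait(ClusterId=job_flow_id, StepId=step_ids[0])

# Or inspect the step state directly.
status = client.describe_step(ClusterId=job_flow_id, StepId=step_ids[0])
print(status['Step']['Status']['State'])
```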
For more configuration options, check out the sparksteps library.