Amazon Elastic MapReduce - Keep Server Alive?

I am testing jobs in EMR, and each test takes a long time to run. Is there a way to keep the server / master node alive in Amazon EMR? I know this can be done using the API, but can it also be done from the AWS console?

3 answers

You cannot do this from the AWS console. To quote the developer guide:

The Amazon Elastic MapReduce tab in the AWS management console does not support adding steps to the job stream.

You can only do this through the CLI or the API, by creating a job flow and then adding steps to it.

    $ ./elastic-mapreduce --create --alive --stream

You cannot do this using the web console, but with the API and programming tools you can add multiple steps to a long-running job flow, which is what I do. This way you can run jobs one after another on the same long-running cluster without having to create a new one each time.

If you are familiar with Python, I highly recommend the Boto library. Other AWS API tools also allow you to do this.

If you follow the Boto EMR tutorial, you will find a few examples.

Just to give you an idea, this is what I am doing (with streaming jobs):

    import sys
    import time

    import boto
    from boto.emr.step import StreamingStep

    # Connect to EMR
    conn = boto.connect_emr()

    # Start a long-running job flow; don't forget the keep_alive setting
    jobid = conn.run_jobflow(name='My jobflow',
                             log_uri='s3://<my log uri>/jobflow_logs',
                             keep_alive=True)

    # Create your streaming job
    step = StreamingStep(...)

    # Add the step to the job flow
    conn.add_jobflow_steps(jobid, [step])

    # Wait until it completes
    while True:
        state = conn.describe_jobflow(jobid).steps[-1].state
        if state == "COMPLETED":
            break
        if state in ("FAILED", "TERMINATED", "CANCELLED"):
            sys.stderr.write("EMR job failed! Message = %s!\n" % state)
            sys.exit(1)
        time.sleep(60)

    # Create your next job here and add it to the EMR cluster
    step = StreamingStep(...)
    conn.add_jobflow_steps(jobid, [step])

    # Repeat :)
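
Since the same wait is needed after every step, the polling loop above could be pulled into a small helper. This is only a sketch: `wait_for_step`, `get_state`, and `FAILED_STATES` are names made up for illustration (not part of Boto), and the state strings are the ones used in the loop above.

```python
import time

# Failure states, taken from the polling loop above.
FAILED_STATES = {"FAILED", "TERMINATED", "CANCELLED"}


def wait_for_step(get_state, poll_seconds=60, sleep=time.sleep):
    """Poll get_state() until the step completes; raise if it fails.

    get_state is a hypothetical zero-argument callable that returns the
    current state string of the most recently added step.
    """
    while True:
        state = get_state()
        if state == "COMPLETED":
            return state
        if state in FAILED_STATES:
            raise RuntimeError("EMR step failed with state %s" % state)
        # Not finished yet: wait before asking again.
        sleep(poll_seconds)
```

With the connection from the answer, it would presumably be called as `wait_for_step(lambda: conn.describe_jobflow(jobid).steps[-1].state)` after each `add_jobflow_steps` call. Raising an exception instead of calling `sys.exit(1)` leaves it to the caller to decide whether to terminate the cluster or retry.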

To keep the machine alive, start an interactive Pig session. The machine will then not shut down, and you can run your map/reduce logic from the command line using:

 cat infile.txt | yourMapper | sort | yourReducer > outfile.txt 
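
To see what that pipeline does, here is a rough Python equivalent of the map, sort, reduce sequence. It is a sketch under assumptions: the word-count `mapper` and `reducer` are hypothetical stand-ins for `yourMapper` and `yourReducer`, and `run_local` is a name invented for this example.

```python
from itertools import groupby


def mapper(line):
    # Hypothetical mapper: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield word, 1


def reducer(key, values):
    # Hypothetical reducer: sum the counts collected for one key.
    return key, sum(values)


def run_local(lines):
    # Map phase: corresponds to `cat infile.txt | yourMapper`.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Sort phase: `sort` brings equal keys next to each other.
    pairs.sort(key=lambda kv: kv[0])
    # Reduce phase: the reducer sees each run of equal keys exactly once.
    return [reducer(key, (value for _, value in group))
            for key, group in groupby(pairs, key=lambda kv: kv[0])]
```

For example, `run_local(["a b a", "b c"])` returns `[("a", 2), ("b", 2), ("c", 1)]`. The sort step matters because the reducer relies on all values for a key arriving together, which is the same guarantee Hadoop's shuffle gives.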

Source: https://habr.com/ru/post/1305058/

