How do I set up an Amazon EMR streaming job to use EC2 spot instances (Ruby CLI)?

When I create a streaming job with Amazon Elastic MapReduce (Amazon EMR) using the Ruby command line interface, how can I specify that only EC2 spot instances should be used (except for the master)? The command below works, but it "forces" me to use at least 1 core instance ...

    ./elastic-mapreduce --create --stream \
      --name n2_3 \
      --input s3://mr/neuron/2 \
      --output s3://mr-out/neuron/2 \
      --mapper s3://mr/map.rb \
      --reducer s3://mr/noop_reduce.rb \
      --instance-group master --instance-type m1.small --instance-count 1 \
      --instance-group core --instance-type m1.small --instance-count 1 \
      --instance-group task --instance-type m1.small --instance-count 18 --bid-price 0.028

thanks

1 answer

Both CORE and TASK nodes run TaskTrackers, but only CORE nodes run DataNodes, so yes, you need at least one CORE node.

So you can run the CORE nodes as spot instances:

    ./elastic-mapreduce --create --stream \
      ... \
      --instance-group master --instance-type m1.small --instance-count 1 \
      --instance-group core --instance-type m1.small --instance-count 19 --bid-price 0.028

P.S. You can also run one CORE node and many TASK nodes, but depending on how much you read and write, it will be painful, since all 18 TASK nodes will be reading from and writing to a single CORE node.

    # expect problems....
    ./elastic-mapreduce --create --stream \
      ... \
      --instance-group master --instance-type m1.small --instance-count 1 \
      --instance-group core --instance-type m1.small --instance-count 1 --bid-price 0.028 \
      --instance-group task --instance-type m1.small --instance-count 18 --bid-price 0.028
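To compare the two layouts without launching anything, the instance-group arguments can be assembled by a small shell helper and printed first. This is only a sketch: the function name `emr_instance_args` and its three positional parameters are made up for illustration and are not part of the Ruby CLI. It uses only the flags that already appear in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of elastic-mapreduce): prints the
# instance-group arguments for an on-demand master plus spot CORE nodes
# and, optionally, spot TASK nodes.
# Usage: emr_instance_args CORE_COUNT TASK_COUNT BID_PRICE
emr_instance_args() {
    core_count="$1"
    task_count="$2"
    bid_price="$3"

    # The master has no --bid-price, so it stays on demand.
    args="--instance-group master --instance-type m1.small --instance-count 1"

    # At least one CORE node is required, because only CORE nodes run DataNodes.
    args="$args --instance-group core --instance-type m1.small --instance-count $core_count --bid-price $bid_price"

    # Add a TASK group only when task nodes are requested.
    if [ "$task_count" -gt 0 ]; then
        args="$args --instance-group task --instance-type m1.small --instance-count $task_count --bid-price $bid_price"
    fi

    printf '%s\n' "$args"
}

# Print the arguments for 19 spot CORE nodes and no TASK nodes.
emr_instance_args 19 0 0.028
```

The printed string can then be appended to `./elastic-mapreduce --create --stream` together with the job options from the question (`--name`, `--input`, `--output`, `--mapper`, `--reducer`). Calling `emr_instance_args 1 18 0.028` instead produces the one-CORE, many-TASK layout that the answer warns about.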

Source: https://habr.com/ru/post/1396592/

