Spark-ec2 call from EC2 instance: ssh host connection refused

To complete the AMPLab exercises, I created a key pair in us-east-1, installed the training scripts ( git clone git://github.com/amplab/training-scripts.git -b ampcamp4 ) and set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, following the instructions at http://ampcamp.berkeley.edu/big-data-mini-course/launching-a-bdas-cluster-on-ec2.html
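For reference, the setup boiled down to something like this (the access-key values below are placeholders):

 git clone git://github.com/amplab/training-scripts.git -b ampcamp4
 export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
 export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX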

Now running

  ./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1 

generates the following messages:

 johndoe@ip-some-instance:~/projects/spark/training-scripts$ ./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1
 Setting up security groups...
 Searching for existing cluster try1...
 Latest Spark AMI: ami-19474270
 Launching instances...
 Launched 5 slaves in us-east-1b, regid = r-0c5e5ee3
 Launched master in us-east-1b, regid = r-316060de
 Waiting for instances to start up...
 Waiting 120 more seconds...
 Copying SSH key /home/johndoe/.ssh/myspark.pem to master...
 ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
 Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
 ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
 Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
 ...
 subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com '/root/spark/bin/stop-all.sh'' returned non-zero exit status 127

where root@ec2-54-90-57-174.compute-1.amazonaws.com is the user and master instance. I tried -u ec2-user and increased -w to 600, but got the same error.
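That is, the retried launch looked roughly like this:

 ./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark -u ec2-user -w 600 --copy launch try1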

I can see the master and slave instances in us-east-1 when I log in to the AWS console, and I can actually ssh into the Master instance from the local ip-some-instance shell.
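For example, a direct connection along these lines succeeds (same key file; the host is the master's public DNS from the log above):

 ssh -i ~/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com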

I understand that the spark-ec2 script takes care of defining the master/slave security groups (which ports are open, etc.), so I should not need to configure these settings myself. And the console confirms that the master and slaves accept connections on port 22 (Port: 22, Protocol: tcp, Source: 0.0.0.0/0 in the ampcamp3-slaves / ampcamp3-master groups).
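A quick way to double-check reachability and the group rules from the launching instance (assuming nc and the AWS CLI are installed; the group names below are the ones I see in the console):

 nc -zv ec2-54-90-57-174.compute-1.amazonaws.com 22
 aws ec2 describe-security-groups --region us-east-1 --group-names ampcamp3-master ampcamp3-slaves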

I'm stuck here and would be grateful for any pointers before I spend all my R&D funds on EC2 instances... Thanks.

1 answer

This is most likely caused by SSH taking a long time to come up on the instances, so the 120-second wait expires before the machines can be logged into. You should be able to run

 ./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch --resume try1 

(with the --resume flag) to pick up where the launch left off, without spinning up new instances. This problem will be fixed in Spark 1.2.0, where we have a new mechanism that intelligently checks whether SSH is up rather than relying on a fixed timeout. We are also addressing the root cause of the slow SSH startup by building new AMIs.
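In the meantime, the idea behind that check can be approximated by hand with a small polling loop before resuming (a minimal sketch; the host and key file are the placeholders from your question):

 # Keep retrying until the master's sshd accepts a connection.
 until ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 -i ~/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com true; do
     echo "SSH not up yet; retrying in 15s..."
     sleep 15
 done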

