SSH connection error to an EC2 instance (Spark cluster)

I have a Spark (1.3.1) cluster on EC2 (region: us-east). It has worked without problems for the last two months, but since yesterday I cannot SSH into one of the slaves (or I can, but it takes a very long time). My tasks do not fail; they just wait indefinitely, because they are trying to connect to that slave and the slave is not responding.

I tried to launch a new Spark cluster with spark-ec2, but got this error:

Warning: SSH connection error. (This could be temporary.)
Host: 54.90.24.42
SSH return code: 255
SSH output: ssh: connect to host 54.90.24.42 port 22: Connection refused

...

Warning: SSH connection error. (This could be temporary.)
Host: XX.XXX.XXX.XX
SSH return code: 255
SSH output: ssh: connect to host XX.XXX.XXX.XX port 22: Connection refused
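"Connection refused" means the host answered but nothing was listening on port 22 (for example, sshd not started yet on a freshly booted instance), while a hang or timeout points more toward an unresponsive or unreachable instance. A minimal Python 3 sketch for telling these cases apart from another machine; `probe_tcp` is a hypothetical helper, not part of spark-ec2:

```python
import socket

def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try to open a TCP connection to host:port and classify the outcome.

    Hypothetical diagnostic helper, not part of spark-ec2 or Spark.
    """
    try:
        # create_connection resolves the host and connects with a timeout
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host replied with a reset: reachable, but nothing listening on the port
        return "refused"
    except socket.timeout:
        # No reply within the timeout: host down, overloaded, or traffic filtered
        return "timeout"
    except OSError as exc:
        # Anything else (DNS failure, no route to host, ...)
        return f"error: {exc}"
```

For example, `probe_tcp("54.90.24.42", 22)` checks the SSH port of the host from the first warning; the same call works for any other host and port.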

While I was writing this, a colleague reported a similar problem on another cluster:

org.apache.spark.shuffle.FetchFailedException: Failed to connect to ip-10-231-187-233.ec2.internal/10.231.187.233:54801

All of these problems seem to be related.

Does anyone have an idea what this might be?


Source: https://habr.com/ru/post/1611403/