How do I keep long-running tasks in an Elastic Beanstalk worker environment from being interrupted by scale-in, without blocking scaling?

I have several concurrent workers that handle long-running jobs in an Elastic Beanstalk worker environment. The underlying EC2 instances scale based on the queue length. My problem is that workers get killed mid-processing whenever a scale-in action occurs.

My initial approach was the following: each worker enables instance protection on the EC2 instance it runs on after receiving a message from the SQS daemon, and removes the protection again once it has finished processing the job. This appears to be the recommended approach for this kind of situation: https://aws.amazon.com/about-aws/whats-new/2015/12/protect-instances-from-termination-by-auto-scaling/ It worked, but it does not take any placement strategy into account: it led to all instances being protected, so the scale-in action was cancelled most of the time. The SQS daemon does not seem to try to fill each instance with as many jobs as possible.
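A minimal sketch of that approach, assuming a Python worker and boto3 (the `ProtectionManager` name, the injected `set_protection` callable, and the ASG/instance parameters are illustrative, not from the original post). Protection is reference-counted so that parallel jobs on the same instance don't release it prematurely:

```python
import threading


class ProtectionManager:
    """Reference-counted wrapper: protect the instance while any job runs,
    release protection only when the last concurrent job finishes."""

    def __init__(self, set_protection):
        # set_protection(bool) performs the actual AWS call (injected here
        # so the counting logic can be tested without AWS credentials).
        self._set_protection = set_protection
        self._active = 0
        self._lock = threading.Lock()

    def job_started(self):
        with self._lock:
            self._active += 1
            if self._active == 1:
                self._set_protection(True)   # first job: protect the instance

    def job_finished(self):
        with self._lock:
            self._active -= 1
            if self._active == 0:
                self._set_protection(False)  # last job done: allow scale-in


def aws_set_protection(protect, instance_id, asg_name):
    """The real AWS call (assumes a boto3 environment; not executed here)."""
    import boto3
    boto3.client("autoscaling").set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=protect,
    )
```

In a worker you would call `job_started()` when the sqsd POST arrives and `job_finished()` in a `finally` block, so protection is released even if the job raises.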

My next idea was to stop the SQS daemon when a scale-in action starts, for example using an Auto Scaling lifecycle hook. But with this approach, terminating protected instances could still be a problem (I don't know whether the hook runs on protected instances). Besides, stopping the SQS daemon is not recommended: Start/stop the sqsd daemon on Elastic Beanstalk to pause processing of SQS queue messages

How can I satisfy both requirements (1. don't interrupt long-running workers during scale-in, and 2. run as many workers as possible on each EC2 instance) with an Elastic Beanstalk worker environment?

1 answer

I'm not very familiar with Beanstalk worker-tier instances, but as far as I know they receive one task at a time, right? If so, why would you need to stop the SQS daemon at all? Presumably scale-in happens because there are no more tasks in the work queue, so the instance shouldn't receive a new one. If a task does arrive at exactly that moment, it won't be removed from the SQS queue, and once its visibility timeout expires it will be picked up by another worker node.

The lifecycle hook will not fire until instance protection is disabled, because it only runs once the instance has been selected for termination.

You could add a piece of logic to your worker code along the lines of "if a task finishes and no new one arrives, disable instance protection on this instance." That way, only instances without running tasks can be terminated.
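That "release protection only once the worker stays idle" rule can be sketched like this (assuming Python workers; the class name, the grace period, and the injected `unprotect` callable are all illustrative). A short grace period keeps back-to-back jobs from toggling protection on and off:

```python
import threading


class IdleUnprotector:
    """Disable instance protection only after the worker has been idle for
    `grace_seconds`, so a new job arriving right away keeps it protected."""

    def __init__(self, unprotect, grace_seconds=60):
        # unprotect() performs the actual AWS call; injected for testability.
        self._unprotect = unprotect
        self._grace = grace_seconds
        self._timer = None
        self._lock = threading.Lock()

    def job_started(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()     # a new job arrived: stay protected
                self._timer = None

    def job_finished(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            # Schedule the unprotect call; it fires only if no new job
            # starts within the grace window.
            self._timer = threading.Timer(self._grace, self._unprotect)
            self._timer.daemon = True
            self._timer.start()
```

The `unprotect` callable would wrap `boto3.client("autoscaling").set_instance_protection(..., ProtectedFromScaleIn=False)` for the current instance.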

Alternatively, you could leave all instances permanently protected and periodically run a script on each instance via a cron job that checks whether a task is currently running, and if not, turns off instance protection.
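A hedged sketch of such a periodic check, written in Python (the lock-file convention, its path, and the ASG name are assumptions for illustration): the worker is expected to hold a lock file while busy, and the cron job drops protection when the file is absent.

```python
import os
import urllib.request

# Assumed convention: the worker creates this file when a job starts
# and deletes it when the job finishes.
LOCKFILE = "/var/run/worker-busy.lock"


def has_active_job(lockfile=LOCKFILE):
    """A job is considered active while the lock file exists."""
    return os.path.exists(lockfile)


def cron_check(asg_name, lockfile=LOCKFILE):
    """Run every minute from cron; drop protection when the instance is idle."""
    if has_active_job(lockfile):
        return  # busy: keep the instance protected
    import boto3
    # Look up this instance's ID from the EC2 instance metadata service.
    with urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ) as resp:
        instance_id = resp.read().decode()
    boto3.client("autoscaling").set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=False,
    )
```

A crontab entry like `* * * * * /usr/bin/python3 /opt/check_idle.py` (path illustrative) would invoke it. Note the worker would also need to re-enable protection when a new job starts, otherwise an idle-then-busy instance stays unprotected.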


Source: https://habr.com/ru/post/1275079/

