There are two types of health check for Auto Scaling groups:
- EC2 Health Check: An EC2 status check is used to determine if the instance is healthy. It works only at the hypervisor level and cannot see the health of the application running on the instance.
- Elastic Load Balancer (ELB) Health Check: This makes the Auto Scaling group delegate the health check to the Elastic Load Balancer, which can probe a specific HTTP(S) URL. It can therefore verify that the application on the instance is actually working.
Given that your system uses an ELB health check, Auto Scaling will trust the results of the ELB health check when determining the state of each EC2 instance. This can be a little dangerous: if the application takes a while to start, the health check may incorrectly mark the instance as unhealthy. That, in turn, causes Auto Scaling to terminate the instance and launch a replacement.
To avoid this situation, the Auto Scaling group has a Health Check Grace Period setting (in seconds). It specifies how long Auto Scaling should wait before it starts acting on the ELB health check (which, in turn, has its own settings for how often to check and how many checks are needed to mark an instance as Healthy or Unhealthy).
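As a rough sketch of how this setting could be applied from code, the snippet below builds the arguments for boto3's `update_auto_scaling_group` call. The group name `my-asg` and the helper names are placeholders of mine, and the value 180 is only illustrative; the actual AWS call is kept in a separate function because it needs boto3 installed and credentials configured.

```python
def grace_period_update(group_name: str, grace_seconds: int) -> dict:
    """Build keyword arguments for boto3's update_auto_scaling_group call."""
    if grace_seconds < 0:
        raise ValueError("grace period must be non-negative")
    return {
        "AutoScalingGroupName": group_name,
        # Delegate health checking to the load balancer.
        "HealthCheckType": "ELB",
        # Seconds Auto Scaling waits before acting on health check results.
        "HealthCheckGracePeriod": grace_seconds,
    }


def apply_update(params: dict) -> None:
    """Send the update to AWS (requires boto3 and configured credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("autoscaling").update_auto_scaling_group(**params)


# "my-asg" is a hypothetical group name used only for illustration.
params = grace_period_update("my-asg", 180)
print(params)
```

Calling `apply_update(params)` would push the change to the real group, so try it against a test group first.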
So, if your application takes 3 minutes to start, set the health check grace period to at least 180 seconds (3 minutes). The documentation does not say whether the countdown starts when the instance is marked as "Running" or when the status checks complete, so run some timing tests to make sure instances are not wrongly marked as failed.
In fact, I would recommend setting the health check grace period to a significantly higher value (for example, double the required time). This will not affect the operation of your system, because a healthy instance starts serving traffic as soon as it passes the ELB health check, which happens before the Auto Scaling grace period ends. At worst, a genuinely unhealthy instance will be terminated a few minutes later than it otherwise would be, but this should be a rare occurrence.
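To make the arithmetic concrete, here is a small sketch of the "double it" rule next to a simplified estimate of when a healthy instance starts receiving traffic. The check interval of 30 seconds and the healthy threshold of 2 are illustrative assumptions, not values from your setup, and the model ignores exactly when the grace period countdown begins.

```python
import math


def recommended_grace_period(startup_seconds: float, safety_factor: float = 2.0) -> int:
    """Grace period as the startup time multiplied by a safety factor."""
    return math.ceil(startup_seconds * safety_factor)


def seconds_until_in_service(startup_seconds: float,
                             check_interval: float,
                             healthy_threshold: int) -> float:
    """Rough upper estimate: the app starts, then must pass
    `healthy_threshold` consecutive checks spaced `check_interval` apart."""
    return startup_seconds + check_interval * healthy_threshold


grace = recommended_grace_period(180)
in_service = seconds_until_in_service(180, check_interval=30, healthy_threshold=2)

print(grace)       # 360
print(in_service)  # 240
```

With these numbers the instance is serving traffic well before the grace period runs out, so the longer grace period only delays the replacement of an instance that never becomes healthy.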