I am working with a CloudFormation template that launches as many instances as I request, and I want the stack creation or update to wait until their initialization (via user data) has finished before it is considered complete.
Expectation
Creating or updating a stack should wait for signals from all newly created instances to ensure that their initialization is complete.
I do not want the creation or updating of the stack to be considered successful if any of the created instances are not initialized.
Reality
CloudFormation only seems to be waiting for signals from instances when the stack is first created. Updating the stack and increasing the number of instances seem to ignore the signaling. The update operation succeeds very quickly, while the instances are still initializing.
Instances created as a result of updating the stack may not be initialized, but the update action is already considered successful.
Question
Using CloudFormation, how can I make reality match the expectation?
I want stack updates to behave the same way as stack creation.
Related questions
I found only one question that matches my problem: UpdatePolicy in Autoscaling group does not work correctly for updating CloudFormation AWS
It has been open for a year and has not received an answer.
I am posting a separate question because I have additional information to add, and I am not sure that these details match those of that question's author.
Reproduction
To demonstrate the problem, I created a template based on the example under the Auto Scaling Group heading on this AWS documentation page, which includes signaling.
The created template was adapted as follows:
- It uses an Ubuntu AMI (in region ap-northeast-1). Because of this change, cfn-signal is bootstrapped and then called as necessary.
- A new parameter determines how many instances are launched in the auto-scaling group.
- A 2-minute sleep was added before signaling to simulate the time spent on initialization.
Here is the template saved in template.yml:
Parameters:
  DesiredCapacity:
    Type: Number
    Description: How many instances would you like in the Auto Scaling Group?
Resources:
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones: !GetAZs ''
      LaunchConfigurationName: !Ref LaunchConfig
      MinSize: !Ref DesiredCapacity
      MaxSize: !Ref DesiredCapacity
    CreationPolicy:
      ResourceSignal:
        Count: !Ref DesiredCapacity
        Timeout: PT5M
    UpdatePolicy:
      AutoScalingScheduledAction:
        IgnoreUnmodifiedGroupSizeProperties: true
      AutoScalingRollingUpdate:
        MinInstancesInService: 1
        MaxBatchSize: 2
        PauseTime: PT5M
        WaitOnResourceSignals: true
  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-b7d829d6
      InstanceType: t2.micro
      UserData:
        'Fn::Base64': !Sub |
          #!/bin/bash -xe
          sleep 120
          apt-get -y install python-setuptools
          TMP=`mktemp -d`
          curl https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz | \
            tar xz -C $TMP --strip-components 1
          easy_install $TMP
          /usr/local/bin/cfn-signal -e $? \
            --stack ${AWS::StackName} \
            --resource AutoScalingGroup \
            --region ${AWS::Region}
Now I create a single-instance stack using:
$ aws cloudformation create-stack \
    --region=ap-northeast-1 \
    --stack-name=asg-test \
    --template-body=file:
After waiting a few minutes for the creation to complete, let's look at a few of the key stack events:
$ aws cloudformation describe-stack-events \
...
{
  "Timestamp": "2017-02-03T05:36:45.445Z",
  ...
  "LogicalResourceId": "AutoScalingGroup",
  ...
  "ResourceStatus": "CREATE_COMPLETE",
  ...
},
{
  "Timestamp": "2017-02-03T05:36:42.487Z",
  ...
  "LogicalResourceId": "AutoScalingGroup",
  ...
  "ResourceStatusReason": "Received SUCCESS signal with UniqueId ...",
  "ResourceStatus": "CREATE_IN_PROGRESS"
},
{
  "Timestamp": "2017-02-03T05:33:33.274Z",
  ...
  "LogicalResourceId": "AutoScalingGroup",
  ...
  "ResourceStatusReason": "Resource creation Initiated",
  "ResourceStatus": "CREATE_IN_PROGRESS",
  ...
}
...
You can see that creation of the auto-scaling group was initiated at 05:33:33. At 05:36:42 (3 minutes after initiation) it received a SUCCESS signal. This allowed the auto-scaling group to reach CREATE_COMPLETE just a few seconds later, at 05:36:45.
It's awesome - it works like a charm.
Now try increasing the number of instances in this auto-scaling group to 2 by updating the stack:
$ aws cloudformation update-stack \
    --region=ap-northeast-1 \
    --stack-name=asg-test \
    --template-body=file:
After waiting a much shorter time for the update to complete, here are some of the new stack events:
$ aws cloudformation describe-stack-events \
{
  "ResourceStatus": "UPDATE_COMPLETE",
  ...
  "ResourceType": "AWS::CloudFormation::Stack",
  ...
  "Timestamp": "2017-02-03T05:45:47.063Z"
},
...
{
  "ResourceStatus": "UPDATE_COMPLETE",
  ...
  "LogicalResourceId": "AutoScalingGroup",
  "Timestamp": "2017-02-03T05:45:43.047Z"
},
{
  "ResourceStatus": "UPDATE_IN_PROGRESS",
  ...
  "LogicalResourceId": "AutoScalingGroup",
  "Timestamp": "2017-02-03T05:44:20.845Z"
},
{
  "ResourceStatus": "UPDATE_IN_PROGRESS",
  ...
  "ResourceType": "AWS::CloudFormation::Stack",
  ...
  "Timestamp": "2017-02-03T05:44:15.671Z",
  "ResourceStatusReason": "User Initiated"
},
...
Now you can see that the auto-scaling group started updating at 05:44:20 and finished at 05:45:43. That is less than a minute and a half from start to completion, which should not be possible given the 120-second sleep in user data.
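To make the timing argument concrete, here is a minimal sketch that computes the elapsed time between the event timestamps quoted above and compares it with the 120-second sleep. The helper names `parse_ts` and `elapsed_seconds` are made up for illustration, and the timestamps are copied by hand from the event listings rather than fetched from AWS.

```python
from datetime import datetime

SLEEP_SECONDS = 120  # the "sleep 120" in the user data script


def parse_ts(ts: str) -> datetime:
    """Parse an event timestamp of the form 2017-02-03T05:44:20.845Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two event timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


# Creation: "Resource creation Initiated" -> "Received SUCCESS signal"
create_elapsed = elapsed_seconds(
    "2017-02-03T05:33:33.274Z", "2017-02-03T05:36:42.487Z"
)

# Update: UPDATE_IN_PROGRESS -> UPDATE_COMPLETE on the auto-scaling group
update_elapsed = elapsed_seconds(
    "2017-02-03T05:44:20.845Z", "2017-02-03T05:45:43.047Z"
)

print(f"create waited {create_elapsed:.1f}s (sleep is {SLEEP_SECONDS}s)")
print(f"update took   {update_elapsed:.1f}s (sleep is {SLEEP_SECONDS}s)")
print("update finished before user data could signal:",
      update_elapsed < SLEEP_SECONDS)
```

The creation interval is longer than the sleep, which is consistent with CloudFormation having waited for the signal, while the update interval is shorter than the sleep alone.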
The stack update then ends without the auto-scaling group receiving any signals.
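One way to check for signals without reading the event list by eye is to filter it. The following is only a sketch: the function name `success_signals` is made up, and the sample events are trimmed copies of the ones shown above, keeping only the fields the filter uses. With real data, the list would come from the `StackEvents` key of the JSON printed by `aws cloudformation describe-stack-events`.

```python
def success_signals(events, logical_id):
    """Return the events for logical_id whose status reason reports a SUCCESS signal."""
    return [
        e for e in events
        if e.get("LogicalResourceId") == logical_id
        and e.get("ResourceStatusReason", "").startswith("Received SUCCESS signal")
    ]


# Trimmed copies of the creation events shown earlier.
create_events = [
    {"Timestamp": "2017-02-03T05:36:45.445Z",
     "LogicalResourceId": "AutoScalingGroup",
     "ResourceStatus": "CREATE_COMPLETE"},
    {"Timestamp": "2017-02-03T05:36:42.487Z",
     "LogicalResourceId": "AutoScalingGroup",
     "ResourceStatusReason": "Received SUCCESS signal with UniqueId ...",
     "ResourceStatus": "CREATE_IN_PROGRESS"},
    {"Timestamp": "2017-02-03T05:33:33.274Z",
     "LogicalResourceId": "AutoScalingGroup",
     "ResourceStatusReason": "Resource creation Initiated",
     "ResourceStatus": "CREATE_IN_PROGRESS"},
]

# Trimmed copies of the update events for the auto-scaling group.
update_events = [
    {"Timestamp": "2017-02-03T05:45:43.047Z",
     "LogicalResourceId": "AutoScalingGroup",
     "ResourceStatus": "UPDATE_COMPLETE"},
    {"Timestamp": "2017-02-03T05:44:20.845Z",
     "LogicalResourceId": "AutoScalingGroup",
     "ResourceStatus": "UPDATE_IN_PROGRESS"},
]

print("signals during create:", len(success_signals(create_events, "AutoScalingGroup")))
print("signals during update:", len(success_signals(update_events, "AutoScalingGroup")))
```

Applied to the events above, the filter finds one signal during creation and none during the update.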
A new instance does exist.
In my real use case, I SSHed into one of these new instances and found that it was still initializing even after the stack update had completed.
What I tried
I have read and re-read the documentation on CreationPolicy and UpdatePolicy, but could not work out what I am missing.
Looking at the update policy above, I don't understand what it actually does. Why does it not wait when WaitOnResourceSignals is true? Does it serve some other purpose?
Or are these new instances not covered by the rolling update policy? If they are not, I would expect them to fall under the creation policy, but that does not seem to be the case either.
So I really don't know what else to try.
I have a sneaking feeling that this is working as designed. But if so, what is the point of the WaitOnResourceSignals property, and how can I satisfy the expectation set out above?