Docs - Increasing EKS node group pool sizes doesn't work due to autoscale handling
We recently expanded our supporting pool node group because we've added a few extra workloads and needed the capacity. We changed `supporting_node_pool_count` from 2 to 3. However, the `terraform apply` then failed:
```
│ Error: error updating EKS Node Group (<REDACTED>:gitlab_supporting_pool_20220721011642956500000011) config: InvalidParameterException: Minimum capacity 3 can't be greater than desired size 2
```
This is because `desired_size` is listed in `ignore_changes` in the `lifecycle` stanza for the node pool, so Terraform reads the value from AWS but ignores it and never updates it to match the new minimum size, leading to the conflict when the API call is made.
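The shape of the problem looks roughly like this (resource and variable names here are illustrative, not our exact configuration):

```hcl
resource "aws_eks_node_group" "supporting_pool" {
  # ... cluster, subnet, and instance settings elided ...

  scaling_config {
    min_size     = var.supporting_node_pool_count # raised from 2 to 3
    max_size     = var.supporting_node_pool_max
    desired_size = var.supporting_node_pool_count
  }

  lifecycle {
    # Cluster Autoscaler changes desired_size at runtime, so Terraform
    # must not try to reconcile it back to the configured value.
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```

On apply, Terraform keeps the live `desired_size` of 2 (because of `ignore_changes`) while sending the new `min_size` of 3, and the EKS API rejects that combination.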
The ignore was added along with the Cluster Autoscaler capability, and it does seem to be necessary: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group#ignoring-changes-to-desired-size recommends it, and https://docs.aws.amazon.com/cli/latest/reference/eks/update-nodegroup-config.html notes:

> When Cluster Autoscaler is used, the desiredSize parameter is altered by Cluster Autoscaler (but can be out-of-date for short periods of time)
So in the cluster autoscaling case the value is being updated live by Cluster Autoscaler outside of Terraform, and we not only shouldn't care about changes to it but also shouldn't push new values into it.
I don't see a clean way out of this, because `lifecycle` blocks notoriously can't contain variable or optional elements; we'd need a separate node pool configuration for each case, which would be horribly duplicated. Perhaps the best we can do is add a documentation note that node pool size changes require manual effort. I moved past it by commenting out the `lifecycle` stanza and applying with `-target` as a separate execution; the other option is to alter the `desired_size` manually with the AWS CLI or console.
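A sketch of the workaround I used, assuming the illustrative resource name above (the exact resource address, cluster, and node group names would differ):

```hcl
resource "aws_eks_node_group" "supporting_pool" {
  # ... unchanged settings elided ...

  # Temporarily commented out so the size change can be applied:
  # lifecycle {
  #   ignore_changes = [scaling_config[0].desired_size]
  # }
}

# Then, as a separate execution scoped to just this resource:
#   terraform apply -target=aws_eks_node_group.supporting_pool
# Re-enable the lifecycle stanza afterwards so Cluster Autoscaler
# changes are ignored again.

# Alternative: bump desired_size out of band first, then apply normally:
#   aws eks update-nodegroup-config --cluster-name <cluster> \
#     --nodegroup-name <nodegroup> --scaling-config desiredSize=3
```

Either way the change is a one-off manual step; neither option removes the underlying conflict between `ignore_changes` and size bumps.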