Skip to content

Draft: Further fix for AWS EKS Launch Template AMI handling

Grant Young requested to merge gy-ami-handling-fix-lc into main

What does this MR do?

Follow up from !995 (merged) !1057 (merged) !1066 (merged)

There are a multitude of ways in AWS EKS you can set what AMI ID (OS) nodes use. With our switch to IMDSv2 and IRSA we switched to use the Launch Template approach instead of defining them directly in AWS EKS Manage Node Groups.

There are pros and cons to both approaches but ideally we really want to avoid trying to supporting both as that dual approach does have some cons:

  • Switching between them requires full Node Pool rebuilds
  • The ami_type param is needed for the Manage Node Group approach and specifically, Terraform does not do drift detection on this param. That means we'll need to always ensure it's set correctly when switching between the two
  • Both have different approaches to selecting the image ID and both require additional work to perform upgrades.
  • Attempting to support both introduced several permutations and increased brittleness in code

As Launch Templates are being used anyways, that they are required to handle AMIs when using custom build images (or in the future ARM images) and that they offer a much more graceful rollout on AMI upgrade it looks to still be the best approach on paper moving forward. Either way there's an opinionated choice to be made her as both approaches have pros and cons as mentioned.

This MR fixes an additional mistake with the Launch Template approach in that it must always have an AMI ID set and an overhang was meaning this wasn't being set in some situations. By default though this requires polling AWS for the version and as a result it will mean that the Node Pools upgrade if a newer AMI is available and recommended by AWS on subsequent Terraform runs. However it is still possible to pin AMI versions by specifying an ID directly via eks_ami_id and this is now going to be the approach taken in 3.0.0 onwards.

Related issues

Relates #366 (closed)

Author's checklist

When ready for review, the Author applies the workflowready for review label and mention @gl-quality/get-maintainers:

  • Merge request:
    • Corresponding Issue raised and reviewed by the GET maintainers team.
    • Merge Request Title and Description are up-to-date, accurate, and descriptive
    • MR targeting the appropriate branch
    • MR has a green pipeline
    • MR has no new security alerts in the widget from the Secret Detection and IaC Scan (SAST) jobs.
  • Code:
    • Check the area changed works as expected. Consider testing it in different environment sizes (1k,3k,10k,etc.).
    • Documentation created/updated in the same MR.
    • If this MR adds an optional configuration - check that all permutations continue to work.
    • For Terraform changes: set up a previous version environment, then run a terraform plan with your new changes and ensure nothing will be destroyed. If anything will be destroyed and this can't be avoided please add a comment to the current MR.
  • Create any follow-up issue(s) to support the new feature across other supported cloud providers or advanced configurations. Create 1 issue for each provider/configuration. Contact the Quality Enablement team if unsure.

Merge request reports