Migrate EKS cluster to a single availability zone

Our cluster nodes are deployed across three availability zones, but services are not replicated across those zones, so we get none of the high availability that multiple zones offer. Meanwhile, spreading nodes across zones causes problems of its own:

  • EBS volumes are tied to a single availability zone, so a service that uses persistent block storage can only be scheduled onto nodes in the zone where its volume lives
    • If that zone becomes unavailable, the service cannot move to a node in another zone and cannot be restored until the zone comes back
  • Applications that communicate with each other may be deployed to different availability zones, which increases latency and incurs cross-zone data transfer charges
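
To gauge how much storage is pinned to each zone, the volumes can be grouped by availability zone. This is a minimal sketch, assuming Python and an EC2 client passed in by the caller (for example `boto3.client("ec2")`); the helper name `volumes_by_zone` is hypothetical and it lists every volume in the region, not only those owned by the cluster.

```python
from collections import defaultdict


def volumes_by_zone(ec2):
    """Group EBS volume IDs by availability zone.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    No tag filter is applied, so volumes unrelated to the cluster are
    included; filtering by cluster tags is left out of this sketch.
    """
    zones = defaultdict(list)
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate():
        for volume in page["Volumes"]:
            zones[volume["AvailabilityZone"]].append(volume["VolumeId"])
    return dict(zones)
```

Any zone other than the chosen target that appears in the result holds volumes that would need to be migrated.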

We can eliminate these issues by consolidating our cluster nodes into a single availability zone and addressing high-availability concerns later. This would require:

  • taking snapshots of the EBS persistent volumes in non-target availability zones and restoring them in the target availability zone
    • ensuring no data is written to the volumes during the transition
    • after restoration, updating the persistentVolume definitions in EKS to reference the new availability zone (and likely the new volume ID, since a volume restored from a snapshot gets a new ID)
  • creating new EKS node groups that only deploy nodes to the target availability zone, then deleting the old node groups
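
The snapshot-and-restore step above could be sketched roughly as follows. This assumes Python with a boto3 EC2 client passed in by the caller, and it assumes the cluster uses the EBS CSI driver (`ebs.csi.aws.com`) for the PersistentVolume fields. The function names, the `ext4` filesystem, and the `Retain` reclaim policy are illustrative assumptions, not settings taken from our cluster.

```python
def restore_volume_in_zone(ec2, volume_id, target_az):
    """Snapshot `volume_id` and restore it as a new volume in `target_az`.

    `ec2` is assumed to be a boto3 EC2 client. Writes to the source volume
    should already be stopped (e.g. the workload scaled to zero) before
    this is called. Volume type and tags are not carried over in this
    sketch. Returns the ID of the new volume.
    """
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"AZ migration of {volume_id} to {target_az}",
    )
    snapshot_id = snapshot["SnapshotId"]
    # Block until the snapshot is complete before restoring from it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    new_volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=target_az,
    )
    new_volume_id = new_volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])
    return new_volume_id


def persistent_volume_manifest(name, volume_id, target_az, size, storage_class):
    """Build a PersistentVolume definition pointing at the restored volume.

    Assumes the EBS CSI driver; fsType and reclaim policy are placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "csi": {
                "driver": "ebs.csi.aws.com",
                "volumeHandle": volume_id,  # the new volume ID
                "fsType": "ext4",  # assumption
            },
            # Pin the volume to nodes in the target zone.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "topology.ebs.csi.aws.com/zone",
                            "operator": "In",
                            "values": [target_az],
                        }],
                    }],
                },
            },
        },
    }
```

The returned dictionary can be serialized with `json.dumps` and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.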
Edited by Michael Craig