Explore increasing instance sizes for Kubernetes
Problem Statement
We are currently exploring options for extending how long we will have IP addresses available for our zonal clusters. One of the proposed short-term solutions is to replace our existing node pools with larger instance sizes, which should reduce the overall node count.
Use this issue to determine the best instance size for each node pool and to start a migration plan for moving to those sizes.
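To make the reasoning concrete, here is a rough sketch of the arithmetic behind "larger instances, fewer nodes, fewer IPs". Everything in it is an assumption for illustration: the instance shapes, the demand figures, and the `ips_per_node` value are hypothetical placeholders, not measurements from our clusters.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSize:
    """Hypothetical instance shape; not a real machine type."""
    name: str
    vcpus: int
    memory_gib: int


def nodes_needed(total_vcpus: int, total_memory_gib: int, size: InstanceSize) -> int:
    """Nodes required to fit the demand, bounded by the scarcer resource."""
    by_cpu = math.ceil(total_vcpus / size.vcpus)
    by_memory = math.ceil(total_memory_gib / size.memory_gib)
    return max(by_cpu, by_memory, 1)


def ips_reserved(node_count: int, ips_per_node: int) -> int:
    """IP addresses reserved if every node claims a fixed-size block."""
    return node_count * ips_per_node


# Placeholder values, chosen only to show the effect of doubling the size.
small = InstanceSize("small-example", vcpus=16, memory_gib=64)
large = InstanceSize("large-example", vcpus=32, memory_gib=128)
demand_vcpus, demand_memory_gib = 1600, 6000
ips_per_node = 256  # assumed fixed per-node block

for size in (small, large):
    count = nodes_needed(demand_vcpus, demand_memory_gib, size)
    print(size.name, count, ips_reserved(count, ips_per_node))
```

The sketch ignores per-node overhead (system reservations, DaemonSets) and bin-packing effects, so real node counts will not halve exactly when the instance size doubles.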
Refer to &571 (closed) for details on where we are with saturation.
Note that the effort in this issue is separate from &527.
Consideration
We already need to rebuild various node pools because some labels are missing from our metrics. Consider folding that work in, so that the replacement node pools also include the label changes tracked in &687 (closed).
Milestones
- Evaluate all node pools and identify an appropriate instance size
- Investigation into additional labels on the terraform module complete - #2391 (comment 938302767)
- Upgrade module across all clusters
- Create new replacement node pools
- Force workloads onto new node pools
- Remove old node pools
Results
Now that this work is complete, we have reduced the overall saturation for all zonal clusters by approximately 50%. Fun fact: we are running fewer nodes during weekend off-peak times in comparison to our weekend workload!
More refined results are posted in a comment below: #2391 (comment 953192762)