Create saturation alerts for node networking space

As seen in &571 (closed), there are limits on how many nodes we are allowed to spin up on a GKE cluster. Use this issue to track the creation of saturation alerts that warn us when we are near our limits and page us when we reach them:

Milestones

  • Determine the true maximum node limit. We have a /22 subnet for our regional cluster and a /24 subnet for our zonal clusters. Are we limited to 254 nodes, or are some IPs reserved for other functionality? Knowing this will drive how the alert is managed.
  • Create a saturation alert that tells us when our clusters are running low on, or have run out of, available IPs
  • Create a saturation alert to let us know if a single node (limited to 110 Pods) is running out of room to host Pods
  • Create appropriate runbooks for handling the above alerts
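
To make the arithmetic behind the first three milestones concrete, here is a minimal Python sketch, not a finished alert rule. It assumes that GCP reserves four addresses in each primary subnet range (network, gateway, second-to-last, and broadcast), which would put a /24 at 252 usable node IPs rather than 254; confirming this is part of the first milestone. The CIDRs shown are placeholders, the function names are hypothetical, and the 80% warn / 95% page thresholds are illustrative assumptions. The Pod secondary range may impose a lower node ceiling than the node subnet and is not modelled here.

```python
import ipaddress

# Assumption: GCP reserves 4 addresses per primary subnet range
# (network, default gateway, second-to-last, broadcast).
GCP_RESERVED_PER_SUBNET = 4

# From the issue text: each node is limited to 110 Pods.
MAX_PODS_PER_NODE = 110


def usable_node_ips(cidr: str, reserved: int = GCP_RESERVED_PER_SUBNET) -> int:
    """Return how many addresses in the subnet remain for nodes."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - reserved, 0)


def saturation(used: int, capacity: int) -> float:
    """Return the used fraction of a resource, e.g. 0.9 for 90% used."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def severity(ratio: float, warn: float = 0.80, page: float = 0.95) -> str:
    """Map a saturation ratio to an alert level (thresholds are assumptions)."""
    if ratio >= page:
        return "page"
    if ratio >= warn:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Placeholder CIDRs; substitute the real node subnets.
    regional = usable_node_ips("10.0.0.0/22")  # 1020
    zonal = usable_node_ips("10.0.4.0/24")     # 252

    print(regional, zonal)
    print(severity(saturation(240, zonal)))             # page
    print(severity(saturation(99, MAX_PODS_PER_NODE)))  # warn
```

The same ratio-and-threshold shape would apply to both the subnet alert and the per-node Pod alert; only the capacity input differs.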
Edited by John Skarbek