Increase nodepool sizes for generic regional nodepools

Following investigation in gitlab-com/gl-infra/observability/team#4206 (closed), we've identified that our generic regional nodepools are running at high capacity and need to be increased.

Note: This issue was not identified through regular capacity planning because the kube_pool_max_nodes saturation metric has been missing data for the past several months due to conflicting metrics in the push gateway.

Background

The kube_pool_max_nodes saturation monitoring revealed that our nodepools are running close to their configured maximum node counts:

(Screenshot: kube_pool_max_nodes saturation per nodepool)

However, we're already using approximately 90% of the IP addresses available to these nodes, which limits how far the pools can grow:

(Screenshot: node IP address utilization)

Capacity planning report

Task

We need to increase the headroom in nodepool utilization while keeping IP address availability in mind.

Configuration

The nodepool sizes can be increased here: https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/blob/8a7c8df6649da0bc105947f8d5aa47ad0c06fa65/environments/gprd/variables.tf#L384
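For illustration, the change would look roughly like the fragment below. The variable names and counts here are hypothetical; the actual definitions live in the linked variables.tf in config-mgmt.

```terraform
# Hypothetical sketch only - real names/values are in
# environments/gprd/variables.tf in config-mgmt.
node_pools = {
  generic-1 = {
    min_count = 10
    max_count = 40 # raise this, keeping IP headroom in mind
  }
}
```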


Questions to Address

  1. What is our safe threshold for IP address usage? (Current thresholds in kube_node_ips.libsonnet are soft: 80%, hard: 90%)
  2. Do we need to expand IP address ranges before increasing nodepool capacity?
  3. What is the recommended approach for scaling the nodepools while maintaining IP address headroom?
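To reason about question 1, the thresholds can be turned into concrete node counts. This is a minimal sketch assuming a simplified model (a fixed number of IPs consumed per node and a hypothetical subnet size); the real soft/hard values come from kube_node_ips.libsonnet and the real ranges from the cluster's network configuration.

```python
# Soft/hard IP-usage thresholds as stated in the issue.
SOFT = 0.80
HARD = 0.90

def max_nodes_within_threshold(total_ips: int, ips_per_node: int, threshold: float) -> int:
    """Largest node count whose IP usage stays at or below the threshold."""
    return int(total_ips * threshold) // ips_per_node

# Hypothetical example: a /22 range (1024 addresses), 1 IP per node.
total = 1024
print(max_nodes_within_threshold(total, 1, SOFT))  # 819
print(max_nodes_within_threshold(total, 1, HARD))  # 921
```

The gap between the two results is the room left between "start planning an expansion" and "stop scaling", which is what the questions above need to pin down.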