Increase nodepool sizes for generic regional nodepools

Following investigation in gitlab-com/gl-infra/observability/team#4206 (closed), we've identified that our generic regional nodepools are running at high capacity and need to be increased.

Note: This issue was not identified through regular capacity planning because the kube_pool_max_nodes saturation metric has been missing data for the past several months due to conflicting metrics in the push gateway.

Background

The kube_pool_max_nodes saturation monitoring revealed that our nodepools are running close to their configured maximum node counts:

(Screenshot: kube_pool_max_nodes saturation per nodepool)

However, we're already using approximately 90% of the IP addresses available to these nodes, which limits how far the pools can grow:

(Screenshot: node IP address utilization)

Capacity planning report

Task

We need to increase the headroom in nodepool utilization while keeping IP address availability in mind.

Configuration

The nodepool sizes can be increased here: https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/blob/8a7c8df6649da0bc105947f8d5aa47ad0c06fa65/environments/gprd/variables.tf#L384
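For illustration, the change would look roughly like the fragment below. The variable names and counts here are hypothetical; the actual definitions live in the linked variables.tf in config-mgmt.

```terraform
# Hypothetical sketch only - real names/values are in
# environments/gprd/variables.tf in config-mgmt.
node_pools = {
  generic-1 = {
    min_count = 10
    max_count = 40 # raise this, keeping IP headroom in mind
  }
}
```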


Questions to Address

  1. What is our safe threshold for IP address usage? (Current thresholds in kube_node_ips.libsonnet are soft: 80%, hard: 90%)
  2. Do we need to expand IP address ranges before increasing nodepool capacity?
  3. What is the recommended approach for scaling the nodepools while maintaining IP address headroom?
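To reason about question 1, the thresholds can be turned into concrete node counts. This is a minimal sketch assuming a simplified model (a fixed number of IPs consumed per node and a hypothetical subnet size); the real soft/hard values come from kube_node_ips.libsonnet and the real ranges from the cluster's network configuration.

```python
# Soft/hard IP-usage thresholds as stated in the issue.
SOFT = 0.80
HARD = 0.90

def max_nodes_within_threshold(total_ips: int, ips_per_node: int, threshold: float) -> int:
    """Largest node count whose IP usage stays at or below the threshold."""
    return int(total_ips * threshold) // ips_per_node

# Hypothetical example: a /22 range (1024 addresses), 1 IP per node.
total = 1024
print(max_nodes_within_threshold(total, 1, SOFT))  # 819
print(max_nodes_within_threshold(total, 1, HARD))  # 921
```

The gap between the two results is the room left between "start planning an expansion" and "stop scaling", which is what the questions above need to pin down.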