What

Increasing the following:

- Increase auth pod memory and CPU request values
- Increase auth pod memory limit value
- ~~Increase probeTimeoutSeconds value~~

Why

Previous related MR

Memory Request/Limit and CPU Request Increase

Unfortunately, we are still seeing pod crashes even after the previous update that increased the memory and CPU requests.

Teleport support suggested increasing the values further to 8Gi, but it might be best to raise them much higher for a few days so we can monitor actual usage and refine from there.
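For reference, a minimal sketch of what such an override could look like in Helm values. It assumes the deployment uses the `teleport-cluster` chart and that auth pod resources are set under `auth.resources`; neither is confirmed by this description. The 8Gi figure is the Teleport support suggestion quoted above, and the CPU value is a hypothetical placeholder, not the value in this MR.

```yaml
# Illustrative only: key path assumes the teleport-cluster Helm chart.
auth:
  resources:
    requests:
      cpu: "2"       # hypothetical placeholder, not taken from this MR
      memory: 8Gi    # value suggested by Teleport support
    limits:
      memory: 8Gi    # only a memory limit is raised; no CPU limit is set here
```

Setting the memory request equal to the limit keeps scheduling predictable while usage is observed. The temporary, higher values proposed here would replace the 8Gi entries.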

The following data also mirrors what we see in Grafana

Liveness Probe Timeout Seconds Increase

From @jcamgl's comment ~~here~~, we are seeing liveness probe failures at the same time as the session upload failures. This is not a permanent solution, but we want to see whether increasing the liveness probe timeout helps prevent the containers from crashing.

Links to relevant issues

gitlab-com/gl-infra/production-engineering#27857

Edited Dec 11, 2025 by Joey Wu

Fix: Further increase memory and cpu requests and limits
