Fix: Further increase memory and CPU requests and limits

What

This MR makes the following changes:

  • Increase auth pod memory and CPU request values
  • Increase auth pod memory limit value
  • Increase probeTimeoutSeconds value
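
For reviewers, the three changes above might look roughly like the following values override. This is only a sketch: it assumes the deployment uses the teleport-cluster Helm chart with per-auth overrides under an `auth` key, and every number except the 8Gi figure from Teleport support is an illustrative placeholder, not the value set in this MR.

```yaml
# Hypothetical Helm values override; key layout assumes the teleport-cluster chart.
# Numbers are placeholders for illustration, not the values changed in this MR.
auth:
  resources:
    requests:
      cpu: "2"        # placeholder: raised CPU request for the auth pod
      memory: 8Gi     # figure suggested by Teleport support
    limits:
      memory: 16Gi    # placeholder: temporarily generous limit while usage is monitored

# Timeout (in seconds) applied to the pod health probes.
probeTimeoutSeconds: 5   # placeholder: raised from the previous value
```

Keeping the memory limit above the request leaves headroom for spikes during session uploads while the request reflects expected steady-state usage.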

Why

Previous related MR

Memory Request/Limit and CPU Request Increase

Unfortunately, we are still seeing pod crashes even after the previous MR increased the memory and CPU requests.

Teleport support suggested increasing the values further to 8Gi, but it may be better to raise them well above that for a few days, monitor actual usage, and refine from there.

The following data also mirrors what we see in Grafana:

image.png

image.png

Liveness Probe Timeout Seconds Increase

Per @jcamgl's comment here, we are seeing liveness probe failures at the same time as the session upload failures. This is not a permanent solution, but we want to see whether increasing the liveness probe timeout (probeTimeoutSeconds) helps prevent the containers from crashing.

gitlab-com/gl-infra/production-engineering#27857

Edited by Joey Wu
