Fix: Further increase memory and CPU requests and limits
What
This MR increases the following:
- Auth pod memory and CPU request values
- Auth pod memory limit value
- `probeTimeoutSeconds` value
Why
Previous related MR
Memory Request/Limit and CPU Request Increase
Unfortunately, we are still seeing pod crashes even after the previous update that increased the memory and CPU requests.
Teleport support suggested increasing the memory values further to 8Gi, but it may be best to set them much higher for a few days to monitor actual usage and then refine the values from there.
The following data also matches what we see in Grafana:
Liveness Probe Timeout Seconds Increase
Per @jcamgl's comment here, we are seeing liveness probe failures at the same time as the session upload failures. This is not a permanent solution, but we want to monitor whether increasing the liveness probe timeout (`probeTimeoutSeconds`) helps prevent the containers from crashing.
Links to relevant issues
Edited by Joey Wu

