
Move initialDelaySeconds on probes from 5s to 30s to prevent crashes on deployment

An initialDelaySeconds of 5s on the probes seems to trigger crashes on some deployments:

https://github.com/NVIDIA/gpu-monitoring-tools/issues/161
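For reference, here is a minimal sketch of what the change looks like in the container's probe definitions. The `httpGet` path `/health` and port `9400` are assumptions based on the usual dcgm-exporter defaults and should be checked against the chart's DaemonSet template; only `initialDelaySeconds` is meant to change.

```yaml
# Container spec excerpt (path and port are assumed; only the delay changes)
livenessProbe:
  httpGet:
    path: /health
    port: 9400
  initialDelaySeconds: 30   # previously 5
readinessProbe:
  httpGet:
    path: /health
    port: 9400
  initialDelaySeconds: 30   # previously 5
```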

Changing the values to 30s should allow everyone to deploy. We can also move the setting to values.yaml if you want to keep it at 5 by default.
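If the setting moves to values.yaml, one possible shape is sketched below. The keys `livenessProbe.initialDelaySeconds` and `readinessProbe.initialDelaySeconds` are hypothetical names chosen for illustration, and the probe path and port are assumed as above; the default stays at 5s so current behavior is unchanged.

```yaml
# values.yaml (hypothetical keys; default kept at 5s)
livenessProbe:
  initialDelaySeconds: 5
readinessProbe:
  initialDelaySeconds: 5

# templates/daemonset.yaml (excerpt of the container spec)
livenessProbe:
  httpGet:
    path: /health
    port: 9400
  initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }}
readinessProbe:
  httpGet:
    path: /health
    port: 9400
  initialDelaySeconds: {{ .Values.readinessProbe.initialDelaySeconds }}
```

Anyone hitting the crash loop could then override the delay at install time, e.g. with `--set livenessProbe.initialDelaySeconds=30,readinessProbe.initialDelaySeconds=30`.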

I think it's better than playing with the scraping interval just to get it working, as in this workaround from the issue:

```yaml
# https://github.com/NVIDIA/gpu-monitoring-tools/issues/161#issuecomment-797738193
# need to set it low so that readiness/liveness probes succeed
extraEnv:
  - name: "DCGM_EXPORTER_INTERVAL"
    value: "5000"
```

I haven't tested whether 30s is enough to prevent crash loops.

Can someone test this and comment, please?

Edited by Maxime Bourgeois
