Move initialDelaySeconds on probes from 5s to 30s to prevent crashes on deployment
The current initialDelaySeconds of 5s on the probes seems to trigger crashes on some deployments:
https://github.com/NVIDIA/gpu-monitoring-tools/issues/161
Changing the values to 30s should allow everyone to deploy. We can also move the setting to values.yaml
if you want to keep it at 5 by default.
I think it's better than playing with the scraping interval just to get it working:
# https://github.com/NVIDIA/gpu-monitoring-tools/issues/161#issuecomment-797738193
# need to set it low so that readiness/liveness probes succeed
extraEnv:
- name: "DCGM_EXPORTER_INTERVAL"
  value: "5000"
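If the delay were moved to values.yaml as suggested above, it could look roughly like the sketch below. The `probes.initialDelaySeconds` key is hypothetical (it is not in the current chart), and the template excerpt assumes the probes are defined in the DaemonSet template with their other fields left unchanged.

```yaml
# values.yaml (hypothetical key, not part of the current chart)
probes:
  # Seconds to wait after container start before the first
  # liveness/readiness check. Set to 5 to keep the old behaviour.
  initialDelaySeconds: 30

# templates/daemonset.yaml (excerpt, shown as comments; assumed layout)
# livenessProbe:
#   initialDelaySeconds: {{ .Values.probes.initialDelaySeconds }}
# readinessProbe:
#   initialDelaySeconds: {{ .Values.probes.initialDelaySeconds }}
```

With something like this in place, anyone who wants a different delay could override it at install time with `--set probes.initialDelaySeconds=<seconds>` instead of lowering the scraping interval.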
I have not tested whether 30s is enough to prevent crash loops.
Can someone test and comment, please?
Edited by Maxime Bourgeois