feat(prometheus): enable startupProbe
Background
In gitlab-com/gl-infra/production#5466 (closed), Prometheus was being killed repeatedly because it did not pass the readiness check. At startup, Prometheus reads a WAL file that can sometimes grow large. In our case, we were not giving Prometheus enough time to read the WAL file before it was restarted again.
Solution
Use startupProbe, which is a probe that runs before the readinessProbe and has a higher timeout and threshold, so if there is a large WAL file, Prometheus has an hour to read it. You can read more about startupProbe at https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
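As a rough illustration of how a one-hour startup window can be expressed, here is a minimal sketch of a container spec fragment. The container name, port, and all numeric values are assumptions chosen for the example, not the values set by this merge request; `/-/ready` is Prometheus's standard readiness endpoint. Kubernetes allows roughly `failureThreshold × periodSeconds` for the startup probe to succeed, and holds back the liveness and readiness probes until it does.

```yaml
# Illustrative fragment only; names and values are assumptions, not this MR's settings.
containers:
  - name: prometheus            # hypothetical container name
    startupProbe:
      httpGet:
        path: /-/ready          # Prometheus readiness endpoint
        port: 9090              # assumed default Prometheus port
      periodSeconds: 60         # probe once per minute
      failureThreshold: 60      # 60 failures x 60s = about one hour to finish reading the WAL
      timeoutSeconds: 10
    readinessProbe:             # only starts running after the startupProbe has succeeded
      httpGet:
        path: /-/ready
        port: 9090
      periodSeconds: 5
      failureThreshold: 3
```

With a layout like this, the tight readinessProbe settings can stay as they are for normal operation, while the slow startup path is covered by the startupProbe alone.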
This was previously tested in other environments in !511 (merged) and !513 (merged).
Reference: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14359