
feat(prometheus): enable startupProbe pre, ops, org-ci

Steve Xuereb requested to merge feat/enable-startup-probes-expect-gprd into master

Background

Sometimes Prometheus has a large WAL (write-ahead log) to replay, and replaying it can take a long time. The liveness and readiness probes use short failure thresholds by default, so the liveness probe fails before the replay finishes. The kubelet then restarts Prometheus over and over, and it never gets enough time to finish replaying the WAL.

Solution

Introduce a startupProbe, which runs before every other probe: the liveness and readiness probes are held back until the startupProbe succeeds. The goal is to give Prometheus enough time to replay a large WAL.
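
How much time the startupProbe buys can be estimated with the rule of thumb from the Kubernetes documentation: the container gets roughly failureThreshold × periodSeconds to start before the kubelet restarts it. The sketch below assumes that simplified model (it ignores initialDelaySeconds and timeoutSeconds). The helper name and the example values are hypothetical, not the settings chosen in this MR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate startup window, in seconds, that a
# startupProbe allows before the kubelet restarts the container.
# Simplified model: failureThreshold * periodSeconds
# (initialDelaySeconds and timeoutSeconds are ignored).
startup_budget_seconds() {
  local failure_threshold=$1
  local period_seconds=$2
  echo $(( failure_threshold * period_seconds ))
}

# Example values only: 30 failures allowed, probing every 10 seconds.
startup_budget_seconds 30 10   # prints 300
```

To size the probe for a WAL replay, pick a periodSeconds value and raise failureThreshold until the window comfortably exceeds the longest replay you have observed.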

For more information about startupProbe and readinessProbe, run the following commands:

kubectl explain pod.spec.containers.startupProbe
kubectl explain pod.spec.containers.readinessProbe

Reference: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14359
