feat(prometheus): enable startup & readiness Probe gstg

Steve Xuereb requested to merge gstg/startUpProbes into master

Background

Prometheus sometimes has a large WAL (write-ahead log) to replay on startup, which can take a long time. Because the liveness and readiness probes have short default thresholds, Kubernetes ends up killing the Prometheus pod repeatedly: it never gets enough time to finish replaying the WAL.

Solution

Introduce a startupProbe, which runs before every other probe, and wait for it to succeed before moving on to the readiness probe. The goal is to give Prometheus enough time to replay a large WAL file.
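
As a rough sketch of what such a probe pair can look like on the Prometheus container (the thresholds, periods, and port name here are illustrative assumptions, not the values set in this merge request):

```yaml
# Hypothetical container spec fragment; all numbers are assumed for illustration.
startupProbe:
  httpGet:
    path: /-/ready        # Prometheus readiness endpoint
    port: web             # assumed port name; 9090 is the Prometheus default port
  periodSeconds: 15
  failureThreshold: 60    # startup budget = 60 * 15s = 15 minutes
readinessProbe:
  httpGet:
    path: /-/ready
    port: web
  periodSeconds: 5
  failureThreshold: 3     # only evaluated once the startupProbe has succeeded
```

The time allowed for WAL replay is roughly failureThreshold multiplied by periodSeconds on the startupProbe, so either value can be raised if replay is expected to take longer.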

To get more information about startupProbe and readinessProbe run the following commands:

kubectl explain pod.spec.containers.startupProbe
kubectl explain pod.spec.containers.readinessProbe

Reference: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14359

Edited by Steve Xuereb