feat(prometheus): enable startupProbe pre, ops, org-ci (!513) · Merge requests · GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab Helmfiles

Steve Xuereb requested to merge feat/enable-startup-probes-expect-gprd into master Nov 05, 2021

Background

Prometheus sometimes has a large write-ahead log (WAL) to replay at startup, which can take a long time. Because the liveness and readiness probes have short default thresholds, the Prometheus pod keeps getting killed before it has had time to finish replaying the WAL.

Solution

Introduce a startupProbe, which runs before every other probe, and wait for it to succeed before moving on to the readiness probe. Kubernetes holds off the liveness and readiness probes until the startup probe has succeeded. The goal is to give Prometheus enough time to replay a large WAL file.
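
For illustration, a startupProbe on the Prometheus container might look roughly like the sketch below. This is not the configuration from this merge request: the endpoint choice, period, and threshold are assumptions picked to show how the startup budget is derived, and the exact key path depends on how the chart exposes probe settings.

```yaml
# Hypothetical container-level probe settings; every value here is an assumption.
startupProbe:
  httpGet:
    path: /-/ready        # Prometheus readiness endpoint; not ready while the WAL is replaying
    port: 9090            # Prometheus default listen port
  periodSeconds: 15       # probe every 15 seconds
  failureThreshold: 60    # 60 failures x 15s = up to 15 minutes before the container is restarted
```

The time allowed for startup is roughly failureThreshold multiplied by periodSeconds, so the two values should be sized against the longest WAL replay expected in the environment.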

For more information about startupProbe and readinessProbe, run the following commands:

kubectl explain pod.spec.containers.startupProbe
kubectl explain pod.spec.containers.readinessProbe

Reference: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14359
