Use startupProbe for Prometheus pods

Summary

Start using a startupProbe for our Prometheus pods, so that the livenessProbe and readinessProbe don't time out while a large WAL replay is in progress. We tried increasing the failure threshold for the liveness/readiness probes, but this is not sufficient and results in thrashing of the containers.

startupProbe is not available to us in the current prometheus-operator; we are currently blocked by https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13973
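
For context, a minimal sketch of what the probe configuration could look like at the container level once a startupProbe is available. Kubernetes holds off liveness and readiness checks until the startup probe has succeeded, so the startup probe can be given a generous budget for WAL replay while the other two stay strict. The port, periods, and thresholds below are illustrative assumptions, not values taken from our deployment or from prometheus-operator; `/-/healthy` and `/-/ready` are Prometheus' management endpoints.

```yaml
# Sketch only: all numeric values and the port are assumptions for illustration.
startupProbe:
  httpGet:
    path: /-/ready
    port: 9090
  periodSeconds: 15
  failureThreshold: 60   # allows up to 60 * 15s = 15 minutes for WAL replay
livenessProbe:           # only starts running after the startupProbe succeeds
  httpGet:
    path: /-/healthy
    port: 9090
  periodSeconds: 5
  failureThreshold: 6
readinessProbe:          # only starts running after the startupProbe succeeds
  httpGet:
    path: /-/ready
    port: 9090
  periodSeconds: 5
  failureThreshold: 3
```

The startup budget is `failureThreshold * periodSeconds`, so it would need to be sized against the longest WAL replay we expect to see.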

Related Incident(s)

Originating issue(s): production#5466 (closed)

Desired Outcome/Acceptance criteria

Associated Services

Corrective Action Issue Checklist

  • link the incident(s) this corrective action arose out of
  • give context for what problem this corrective action is trying to prevent from re-occurring
  • assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
  • assign a priority (this will default to 'priority::4')