
Add retention size to prometheus test server

What does this MR do?

Add retention size to prometheus test server

Our Prometheus server keeps reaching its volume limit, which causes the
prometheus-server pods to be OOMKilled and then Evicted in a never-ending
loop until we reset its PVC and delete its PV.

A year ago we attempted to fix this by setting
prometheus.server.retention="4d" to override the default of 15d, but
this was not enough.

I suspect that, even though we've set the retention time, we also have to
set the retention size, as I believe these two settings might work together.

See the docs: https://prometheus.io/docs/prometheus/2.40/storage/
(we're on 2.38, but the docs for that version are no longer available)

It says:

--storage.tsdb.retention.size: The maximum number of bytes of storage
blocks to retain. The oldest data will be removed first. Defaults to 0
or disabled.

If "or disabled" means that size-based retention cleanup is disabled,
then we definitely need to set it to some value.
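To illustrate why time retention alone might not be enough, here is a minimal Python sketch of a simplified retention model. It is an assumption for illustration, not Prometheus's actual TSDB code: time retention drops blocks whose end lies more than the retention window before the newest block's end, size retention (when the limit is above 0) drops the oldest blocks until the total fits, and a limit of 0 disables size-based retention, which is one reading of "Defaults to 0 or disabled". Block names, sizes and the use of hours instead of millisecond timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    max_time_h: float  # end of the block's time range, in hours (simplified)
    size_bytes: int


def blocks_to_delete(blocks, retention_h, retention_size_bytes):
    """Return names (oldest first) of blocks a simplified retention pass deletes.

    Simplified model (assumption, not Prometheus's implementation):
    - time retention: delete blocks ending more than `retention_h` hours
      before the newest block's end;
    - size retention: if the limit is > 0, delete the oldest remaining blocks
      until the total size fits; a limit of 0 disables it.
    A block is deleted if either policy catches it.
    """
    ordered = sorted(blocks, key=lambda b: b.max_time_h)  # oldest first
    if not ordered:
        return []
    newest_end = ordered[-1].max_time_h
    deleted = set()

    # Time-based retention, measured relative to the newest block.
    for b in ordered:
        if newest_end - b.max_time_h > retention_h:
            deleted.add(b.name)

    # Size-based retention; skipped entirely when the limit is 0.
    if retention_size_bytes > 0:
        total = sum(b.size_bytes for b in ordered if b.name not in deleted)
        for b in ordered:
            if total <= retention_size_bytes:
                break
            if b.name not in deleted:
                deleted.add(b.name)
                total -= b.size_bytes

    return [b.name for b in ordered if b.name in deleted]


if __name__ == "__main__":
    blocks = [Block(n, h, 10) for n, h in zip("abcde", [0, 24, 48, 72, 96])]
    print(blocks_to_delete(blocks, retention_h=48, retention_size_bytes=0))
    print(blocks_to_delete(blocks, retention_h=48, retention_size_bytes=25))
```

In this model, a 4d time limit keeps every block inside the window no matter how large those blocks are, so if four days of data is bigger than the volume, the disk can still fill up. A non-zero size limit caps the total independently of the time window.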

Additionally, this change refactors the Prometheus configuration to
clean up our helm upgrade command, in the same way that we do for
other configurations.
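One possible shape for keeping these overrides in a single place instead of inline in the helm upgrade command is sketched below in Python. This is a hypothetical illustration, not how this repository actually builds its command: `prometheus.server.retention` comes from this MR, while the `prometheus.server.retentionSize` key, the "8GB" value, the release name and the chart reference are assumptions to verify against the chart versions we deploy.

```python
import shlex

# Hypothetical overrides. `prometheus.server.retention` is from this MR;
# `prometheus.server.retentionSize` and its value are ASSUMPTIONS.
PROMETHEUS_VALUES = {
    "prometheus.server.retention": "4d",
    "prometheus.server.retentionSize": "8GB",  # hypothetical value
}


def helm_set_args(values):
    """Turn a flat {key: value} mapping into `--set-string key=value` args."""
    args = []
    for key, value in sorted(values.items()):
        # --set-string keeps values such as "4d" or "8GB" as strings,
        # avoiding the type coercion the checklist below warns about.
        args.extend(["--set-string", f"{key}={value}"])
    return args


def helm_upgrade_command(release, chart, values):
    """Build the argv for `helm upgrade --install` (release/chart are placeholders)."""
    return ["helm", "upgrade", "--install", release, chart, *helm_set_args(values)]


if __name__ == "__main__":
    cmd = helm_upgrade_command("gitlab", "gitlab/gitlab", PROMETHEUS_VALUES)
    print(shlex.join(cmd))
```

Grouping the values in one mapping makes each override easy to review and keeps quoting consistent, which matters for duration and size strings.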

Related issues

Author checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion.

Required

  • Merge Request Title and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com
  • When ready for review, follow the instructions in the "Reviewer Roulette" section of the Danger Bot MR comment, as per the Distribution experimental MR workflow

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Tests added/updated
  • Integration tests added to GitLab QA
  • Equivalent MR/issue for omnibus-gitlab opened
  • Equivalent MR/issue for GitLab Operator project opened (see Operator documentation on impact of Charts changes)
  • Validate potential values for new configuration settings. Formats such as integer 10, duration 10s, URI scheme://user:passwd@host:port may require quotation or other special handling when rendered in a template and written to a configuration file.
Edited by João Alexandre Cunha
