Limit the number of rules per rule group in Mimir
In !6790 (merged) we started deploying recording rules to Mimir.
The default number of rules allowed per group is 100. We raised this to unlimited so that the rules we already had could be deployed, but ideally we'd get back to the default. Since we're breaking up rules per tenant and per service, we should be able to keep each rule group within that limit. We will likely need to break up some more rule groups; the resulting groups can remain in the same file.
We'll need to figure out which rule groups are currently above that limit, break them up, and add validation so that we find out when a rule group exceeds the limit before the rules are deployed, not during the deployment.
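As a starting point for that validation, here is a minimal sketch of a check that could run in CI before deployment. It assumes the rule files use the standard Prometheus layout (a top-level `groups` list whose entries have `name` and `rules`) and that PyYAML is available in the CI image. The function names and the `MAX_RULES_PER_GROUP` constant are hypothetical, and the limit of 100 is the value quoted in this issue, not one read from the Mimir configuration.

```python
import sys

# Limit quoted in this issue; an assumption, adjust to the configured value.
MAX_RULES_PER_GROUP = 100


def oversized_groups(rule_file, limit=MAX_RULES_PER_GROUP):
    """Return (group name, rule count) for every group with more than `limit` rules.

    `rule_file` is the parsed content of a rule file, assumed to look like
    {"groups": [{"name": ..., "rules": [...]}, ...]}.
    """
    offenders = []
    for group in (rule_file or {}).get("groups") or []:
        count = len(group.get("rules") or [])
        if count > limit:
            offenders.append((group.get("name", "<unnamed>"), count))
    return offenders


def check_paths(paths, limit=MAX_RULES_PER_GROUP):
    """Parse each YAML file in `paths` and return human-readable violations."""
    import yaml  # PyYAML; assumed to be installed where this runs

    problems = []
    for path in paths:
        with open(path) as handle:
            parsed = yaml.safe_load(handle)
        for name, count in oversized_groups(parsed, limit):
            problems.append(
                f"{path}: group {name!r} has {count} rules (limit: {limit})"
            )
    return problems


def main(argv):
    """Print every violation and return a non-zero exit code if any were found."""
    violations = check_paths(argv)
    for line in violations:
        print(line)
    return 1 if violations else 0


if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

Run against the generated rule files (passed as arguments), this would also give us the list of groups that currently need to be broken up.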
Discussion on the MR
The following discussion from !6790 (merged) should be addressed:
@nduff started a discussion: (+1 comment) Just copying over what was said in Slack for transparency.
After fixing vault, we needed to add a private service connect, since the runners could not route to the internal gateway of Mimir directly. This was done via:
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/7643
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/7645
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/7646
Finally with that in place the CI was working.
We then ran into the default limits set in Mimir for the maximum number of rules per group and the maximum number of rule groups per tenant:
"per-user rules per rule group limit (limit: 100 actual: 731)"
I set this to unlimited for now via gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!4187 (diffs).
Not ideal, but it unblocks us and we can revisit the rule group sizing.
Finally, we have a successful pipeline: https://ops.gitlab.net/gitlab-com/runbooks/-/jobs/12682630
Short of fixing the YAML linting errors I probably introduced, this should be fine to merge now.