Corrective action: LoggingVisibilityDiminished
Summary
We are being paged about resource concerns on ES (gitlab-logs-prod). Although the alert resolved on its own after a while, we should probably add a little extra capacity, at least for now, to determine where the tipping point is.
Previously 2 nodes were added to the cluster, which greatly improved things; see previous issue.
Related Incident(s)
Originating issue(s): production#8052 (closed)
Previous issue: production#8029 (closed)
Desired Outcome/Acceptance Criteria
We no longer get paged for LoggingVisibilityDiminished alerts that resolve themselves after a few minutes.
Associated Services
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- Assign a priority (this will default to 'Reliability::P4')