Corrective action: LoggingVisibilityDiminished

Summary

We're getting paged about resource pressure on the Elasticsearch (ES) cluster gitlab-logs-prod. Although the alert resolved on its own after a while, we should add a little extra capacity, at least temporarily, so that we can determine where the tipping point is.

Two nodes were previously added to the cluster, which greatly improved things; see the previous issue (production#8029).

Related Incident(s)

Originating issue(s): production#8052 (closed)

Previous issue: production#8029 (closed)

Desired Outcome/Acceptance Criteria

We no longer get paged for LoggingVisibilityDiminished alerts that resolve themselves after a few minutes.

Associated Services

Corrective Action Issue Checklist

  • Link the incident(s) this corrective action arose out of
  • Give context for what problem this corrective action is trying to prevent from recurring
  • Assign a severity label (use the highest severity among the related incidents; defaults to 'severity::4')
  • Assign a priority (defaults to 'Reliability::P4')