Corrective Action: relieve memory pressure on Sentry's Kafka
Summary
In gitlab-com/gl-infra/production#19030 (closed) we followed the runbook in https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/sentry/sentry.md#offset_out_of_range-broker-offset-out-of-range to unblock the pipes.
The OFFSET_OUT_OF_RANGE error covered by that runbook section means Kafka has gone out of sync with the consumers. According to the official docs, there are a number of possible causes, but we have only run into it before because of memory pressure.
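As a rough illustration of what the error signals: a broker returns OFFSET_OUT_OF_RANGE when the offset a consumer asks for lies outside the range the broker still retains for that partition. The sketch below is a simplified model only, not Kafka's or Sentry's code; `PartitionLog` and `classify_fetch_offset` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionLog:
    """Offsets a broker currently retains for one partition (hypothetical model)."""
    log_start_offset: int  # oldest offset still available on the broker
    log_end_offset: int    # offset the next produced message will receive


def classify_fetch_offset(log: PartitionLog, fetch_offset: int) -> str:
    """Say where a consumer's requested offset sits relative to the retained log."""
    if fetch_offset < log.log_start_offset:
        # The data the consumer wants has already been deleted by retention.
        return "out_of_range_too_old"
    if fetch_offset > log.log_end_offset:
        # The consumer points past anything the broker has written.
        return "out_of_range_ahead"
    # Equal to log_end_offset means the consumer is fully caught up.
    return "in_range"


if __name__ == "__main__":
    log = PartitionLog(log_start_offset=100, log_end_offset=200)
    for offset in (50, 150, 200, 250):
        print(offset, classify_fetch_offset(log, offset))
```

In this model a consumer that falls far enough behind, for example while the broker is struggling, ends up in the "too old" case once retention removes the messages it had not yet read.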
It's possible this incident was caused by a spike in incoming events, resulting in memory pressure. Considering the Kafka containers are consistently at their memory limit, this is probably not all that surprising. This corrective action is to look into potentially increasing those memory limits.
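One way to put a number on "consistently at their memory limit" before choosing a new limit is to measure how often usage samples sit near the limit. This is a minimal sketch under stated assumptions: the sample values, the 90% threshold, and the function name `fraction_near_limit` are all hypothetical, and real samples would come from whatever container metrics are already collected.

```python
from typing import Sequence


def fraction_near_limit(
    usage_bytes: Sequence[int],
    limit_bytes: int,
    threshold: float = 0.9,  # assumed cut-off for "near the limit"
) -> float:
    """Return the share of samples at or above threshold * limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if not usage_bytes:
        return 0.0
    near = sum(1 for used in usage_bytes if used >= threshold * limit_bytes)
    return near / len(usage_bytes)


if __name__ == "__main__":
    # Hypothetical samples against a hypothetical limit of 1000 bytes.
    samples = [950, 990, 400, 1000]
    print(fraction_near_limit(samples, limit_bytes=1000))
```

A value close to 1.0 over a long window would support the observation that the containers have no headroom to absorb an event spike.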
Related Incident(s)
Originating issue(s): gitlab-com/gl-infra/production#19030 (closed)
Desired Outcome/Acceptance Criteria
Reduced memory pressure on Kafka, making it less likely to fall out of sync with consumers.
Associated Services
Sentry
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose from
- Give context for what problem this corrective action is trying to prevent re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- Assign a priority (this will default to 'Reliability::P4' but should match the severity of the related incident)
- Assign a service label
- Assign a team label
