Investigate growth in Elasticsearch logging data
A recent incident brought to our attention a sudden and unexpected growth in our Elasticsearch data for https://log.gprd.gitlab.net/app/kibana
We need to investigate the cause: whether it is an intentional change or the by-product of something else.
At the current growth rate, this is likely to cause another saturation incident.
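As a starting point for the investigation, a query along these lines could show which log messages account for the growth per day. This is a minimal sketch only: the field names `@timestamp` and `json.message.keyword`, the 14-day window, and the helper name `build_growth_query` are assumptions and would need to be adjusted to the actual index mapping.

```python
import json
from typing import Any, Dict


def build_growth_query(
    days: int = 14,
    message_field: str = "json.message.keyword",  # assumed field name
    top_n: int = 10,
) -> Dict[str, Any]:
    """Build an Elasticsearch search body that buckets documents per day
    and lists the most frequent values of `message_field` in each bucket."""
    return {
        "size": 0,  # aggregations only, no individual hits
        "query": {"range": {"@timestamp": {"gte": f"now-{days}d/d"}}},
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "1d",
                },
                "aggs": {
                    "top_messages": {
                        "terms": {"field": message_field, "size": top_n}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into Kibana Dev Tools or sent via a client.
    print(json.dumps(build_growth_query(), indent=2))
```

Comparing the `top_messages` buckets from before and after the growth started should make any message that suddenly dominates the volume stand out.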
Current Status
Elasticsearch storage utilisation started to trend down after the fix, which downgraded vue-apollo, was backported into the 16.5.1 release.
The issue appears to have been caused by the vue-apollo upgrade, which included a related fix that automatically restarts subscriptions on error. This in turn led to an increase in subscribe/unsubscribe log messages.
Our ILM policies at GitLab are quite aggressive because of our logging volume, which allowed us to recover the storage space fairly quickly.
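For readers unfamiliar with ILM (index lifecycle management), the sketch below shows the general shape of a policy that rolls indices over and deletes them after a short retention period, which is why storage recovers quickly once the log volume drops. All values here (`1d`, `50gb`, `7d`) and the helper name `build_ilm_policy` are illustrative assumptions, not the actual GitLab policy.

```python
from typing import Any, Dict


def build_ilm_policy(
    rollover_max_age: str = "1d",     # assumed value
    rollover_max_size: str = "50gb",  # assumed value
    delete_after: str = "7d",         # assumed value
) -> Dict[str, Any]:
    """Build an ILM policy body with a hot phase that rolls over
    and a delete phase that removes old indices."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_primary_shard_size": rollover_max_size,
                        }
                    }
                },
                "delete": {
                    # min_age is measured from rollover once an index has rolled over
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }
```

With a short `delete_after`, a burst of excess logging ages out of the cluster within that retention window without manual cleanup.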
Details
- Point of contact for this request: @user
- If a call is needed, what is the proposed date and time of the call: Date and Time
- Additional call details (format, type of call): additional details
SRE Support Needed Support Request Details