CI trace chunks using lots of memory in redis-persistent
On gitlab.com we have seen recent growth in memory utilization on redis-persistent: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13044.
The analysis there found that a large portion of memory is consumed by `gitlab:ci:trace:$ID:chunks` keys. We suspect, but have not confirmed, that these keys are responsible for the recent growth.
The top key patterns by memory usage show about 7 GB consumed by trace chunks (source):
| Key pattern | Memory (bytes) |
|---|---:|
| `gitlab:ci:trace:$ID:chunks` | 7156296774 |
| `session:gitlab:2::$ID` | 2717414786 |
| `session:user:gitlab:$PATTERN` | 646131774 |
| `projects/$ID/pushes_since_gc` | 502382174 |
| `projects/$ID/fetches_since_gc` | 212051119 |
| `etag:$PATH` | 64249403 |
| ... | ... |
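The per-pattern totals above appear to aggregate per-key memory after replacing IDs in each key with a placeholder. A minimal sketch of that kind of aggregation follows. The normalization rule (any all-digit segment between `:` or `/` becomes `$ID`) and the helper names `key_pattern` and `top_patterns` are hypothetical, and this is not necessarily how the linked analysis was produced.

```python
import re
from collections import defaultdict

# Hypothetical normalization rule: an all-digit segment delimited by ':' or '/'
# (or ending the key) is replaced with "$ID". The real analysis may differ.
_ID_RE = re.compile(r"(?<=[:/])\d+(?=[:/]|$)")


def key_pattern(key):
    """Collapse numeric ID segments of a Redis key into a $ID placeholder."""
    return _ID_RE.sub("$ID", key)


def top_patterns(key_sizes, limit=10):
    """Sum per-key memory (bytes) by key pattern, largest totals first."""
    totals = defaultdict(int)
    for key, size in key_sizes:
        totals[key_pattern(key)] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```

Per-key sizes could be collected with redis-py, e.g. `((k, r.memory_usage(k)) for k in r.scan_iter(count=1000))` on a client created with `decode_responses=True`, skipping keys where `memory_usage` returns `None` because they expired mid-scan. Scanning every key on a production instance adds load, so running it against a replica or a sample of keys may be preferable.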
We need to figure out:
- Are CI trace chunks getting stuck in Redis?
- Are we at risk of going OOM, and are there short-term mitigations we can apply to avoid disaster?
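Both questions could start from two cheap signals: trace-chunk keys that have no expiry set, and how close `used_memory` is to `maxmemory`. The sketch below assumes chunk keys start with `gitlab:ci:trace:<build id>:chunks` (taken from the pattern above, possibly followed by a suffix) and treats a chunk key without a TTL as a candidate for being stuck; whether these keys are expected to carry a TTL is an assumption to verify. `keys_without_ttl` and `memory_headroom` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Assumed key layout, taken from the reported pattern gitlab:ci:trace:$ID:chunks.
_CHUNK_RE = re.compile(r"^gitlab:ci:trace:(\d+):chunks")


def keys_without_ttl(key_ttls: Iterable[Tuple[str, int]]) -> Dict[int, int]:
    """Count trace-chunk keys per build ID whose TTL is -1.

    Redis TTL returns -1 for an existing key with no expiry and -2 for a
    missing key, so only -1 is counted here.
    """
    stuck: Dict[int, int] = {}
    for key, ttl in key_ttls:
        match = _CHUNK_RE.match(key)
        if match and ttl == -1:
            build_id = int(match.group(1))
            stuck[build_id] = stuck.get(build_id, 0) + 1
    return stuck


def memory_headroom(info: Mapping[str, int]) -> Optional[float]:
    """Fraction of maxmemory still free, from INFO memory fields.

    Returns None when maxmemory is 0, which means Redis has no configured
    limit; OOM risk then depends on the host's memory instead.
    """
    maxmemory = int(info.get("maxmemory", 0))
    if maxmemory == 0:
        return None
    return 1 - int(info["used_memory"]) / maxmemory
```

With redis-py, `r.info("memory")` returns a dict that includes `used_memory` and `maxmemory`, and `((k, r.ttl(k)) for k in r.scan_iter(match="gitlab:ci:trace:*"))` can feed `keys_without_ttl` when the client uses `decode_responses=True`.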
As follow-up items:
- Do we want to consider isolating this to a separate Redis instance? (see https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12821)