Proactively clear the highlights cache when a merge request version is removed
The following discussion from !26555 (merged) should be addressed:
-
@nick.thomas started a discussion: (+1 comment)
@DouweM would you be happy leaving this part as TODO?
Gitlab::Diff::HighlightCache
is a bit troublesome, and the values will expire after 7 days anyway. Removing theMergeRequestDiffFile
objects from the database means they're not accessible.I think I'd like to create a follow-up issue that:
- Makes the IDs trivially calculable or uses Redis sets so we can find all entries for a given
MergeRequestDiffFile
cheaply - Makes
MergeRequestDiff#destroy
clean up after itself by emptying the cache for us
- Makes the IDs trivially calculable or uses Redis sets so we can find all entries for a given
Gitlab::Diff::HighlightCache
stores highlighed versions of a MergeRequestDiff
in redis. These have an expiry of 7 days, and aren't automatically removed when we destroy an existing MergeRequestDiff
. This happens routinely, so we should remove the redis cached versions proactively too - it will help to keep the Redis cache cluster smaller.
A single diff may have multiple redis versions, since there are multiple possibilities for diff_options
. Currently, we can't efficiently get a list of all the keys in redis that represent the highlighting for a single MergeRequestDiff
, so we need to do some work here too.
Simplest would be to ensure we can use a single, well-known key for all versions of the cache, or to ensure we only need a single cache. If that's not possible, we could maintain a separate Redis set on such a key that contains a list of all the keys relating to the diff. This allows us to efficiently look up and remove them without using the KEYS
redis command (which is super-expensive).
One instance where we remove MergeRequestDiff
rows is when cleaning up the repository following a run of the BFG Repo Cleaner: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/26555#note_165151147 - in this instance, we may leave (inaccessible) sensitive content in Redis until expiry is complete. Although the user can run gitlab-rake cache:clear:redis
to remove it, implementing this means we'll be able to clear the content immediately, which is better.