Experiment cleanup process should ensure experiment data is cleared from Redis
Background
During a recent analysis of memory usage in Redis Shared State, we discovered that gitlab-experiment was a large consumer of memory.
By auditing the set of keys, we were able to determine that we have many stale keys from concluded experiments that can be deleted.
After deleting those keys, we reclaimed 6 GiB of space, 40% of the used memory.
Problem
Our experiment process, and gitlab-experiment in particular, leaves stale keys behind when we conclude experiments. These waste a lot of Redis memory.
We should ensure they get cleaned up.
Proposal
We could augment the Experiment Cleanup process to also clear out experiment data from Redis once the experiment has concluded.
🚧 Possible Solutions
1️⃣ Amend chatops cleanup to detect when the flag is of the experiment feature flag type (I believe it is in the API here) and also try to run a clear on the key using this method.
- Pros:
  - no extra commands added to the cleanup process
  - automatically handles this concern (no need to lint or to remember during MR review to do something else)
  - may be the behavior we want when stopping any experiment anyway, in cases other than cleanup (not sure here; if it isn't, we could ensure it only happens on the 'delete' action)
  - would only target SaaS instances by default (no need to add .com? logic anywhere else)
- Cons:
  - too much magic
  - surprising behavior if we deleted a feature flag for an experiment and expected that re-enabling it would keep old assignments
2️⃣ Using a post-deployment migration, run the clear command using this method.
- Pros:
  - is very declarative in the code
- Cons:
  - an extra step in the MR cleanup process that could be missed or might need to be linted for
  - would probably need to guard the cleanup with a .com? check, as experiments are only run on SaaS
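To make the "only on delete" guard from option 1️⃣ concrete, here is a minimal sketch. It is not the real chatops code: `FakeExperimentCache`, `clear_experiment_cache_if_needed`, and the `flag_type` / `action` arguments are hypothetical names used for illustration, and the fake cache only stands in for `Gitlab::Experiment::Configuration.cache` so the snippet runs on its own.

```ruby
# Hypothetical stand-in for Gitlab::Experiment::Configuration.cache.
# It only records which keys were cleared.
class FakeExperimentCache
  attr_reader :cleared

  def initialize
    @cleared = []
  end

  def clear(key:)
    @cleared << key
  end
end

# Hypothetical hook the cleanup tooling could call after acting on a flag.
# Returns true when the experiment cache was cleared, false otherwise.
def clear_experiment_cache_if_needed(cache:, flag_name:, flag_type:, action:)
  # Only experiment-type flags have experiment data in Redis.
  return false unless flag_type == 'experiment'
  # Only clear on delete, so disabling and re-enabling keeps old assignments.
  return false unless action == 'delete'

  cache.clear(key: flag_name)            # main assignment cache
  cache.clear(key: "#{flag_name}_attrs") # cached attributes
  true
end
```

Restricting the clear to the delete action would address part of the "surprising behavior" concern, since merely disabling a flag would leave assignments intact.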
An example of clearing the cache can be seen in the tests in 1 and 2, with an example implementation looking like this for tier_badge:
Gitlab::Experiment::Configuration.cache.clear(key: 'tier_badge') # main assignment cache
Gitlab::Experiment::Configuration.cache.clear(key: 'tier_badge_attrs') # any arbitrary attributes that were cached
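For option 2️⃣, the two calls above could be wrapped in a migration-shaped class with a SaaS guard. The sketch below is only an illustration under assumptions: `RecordingCache` and `ClearTierBadgeExperimentCache` are hypothetical names, the cache and the SaaS flag are injected so the snippet runs standalone, and a real post-deployment migration would instead inherit from GitLab's migration base class, call `Gitlab::Experiment::Configuration.cache`, and use the .com? check directly.

```ruby
# Hypothetical stand-in for Gitlab::Experiment::Configuration.cache.
class RecordingCache
  attr_reader :cleared_keys

  def initialize
    @cleared_keys = []
  end

  def clear(key:)
    @cleared_keys << key
  end
end

# Mirrors the up/down shape of a migration without depending on Rails.
class ClearTierBadgeExperimentCache
  # Main assignment cache plus the cached attributes for the same experiment.
  EXPERIMENT_KEYS = %w[tier_badge tier_badge_attrs].freeze

  def initialize(cache:, saas:)
    @cache = cache
    @saas = saas
  end

  def up
    # Experiments only run on SaaS, so skip everywhere else.
    return unless @saas

    EXPERIMENT_KEYS.each { |key| @cache.clear(key: key) }
  end

  def down
    # no-op: cleared cache entries cannot be restored
  end
end
```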