Skip to content

Resolve "[Feature flag] Rollout of `remove_caching_diff_batches`"

What does this MR do and why?

Analysis by @reprazent

@marc_shaw I recently did an analysis of the entire keyspace in gitlab-com/gl-infra/scalability#1882 (comment 1082694370)

I noticed that we're now using more memory for this key then we did before #367145 (closed). We were using 10% of the total cached values before, now we're using 13% (4.5 GB) at te time of the snapshot. The increase could be because there is now more space available on the cache, and we're not at maxmemory, so we're evicting less of those keys.

Looking at the implementation, it makes sense: we've enabled etag caching, but for the first request (when the client doesn't have cache) we still store the generated results in redis using the render_cached method (!93953 (diffs)). This is the feature flag that would actually stop storing that information in the cache.

When looking at the cache hits for the #diffs_batch endpoint in question in Thanos, this shows a maximum hit-ratio of about 1%. Which I don't think would warrant keeping this data in the cache.

Looking at the durations, I don't immediately see a difference in request latency from enabling the feature flag, perhaps a slight elevation for the slowest ones.

image

https://log.gprd.gitlab.net/goto/2266de80-2d1a-11ed-8656-f5f2137823ba

I think that's acceptable, since about 1/3 of the requests were getting a 304:

image

https://log.gprd.gitlab.net/goto/c9eb2b00-2d1c-11ed-b86b-d963a1a6788e

If we combine that information with the ~1% cache hit-ratio in you mentioned, I don't think the redis-cache is worth keeping. What do you think?

Related to #370933 (closed)

Edited by Marc Shaw

Merge request reports