Draft: Add means to bump Markdown cache gradually

What does this MR do and why?

Closes #597379 ("Bumping CACHE_COMMONMARK_VERSION is risky") by adding a vetted, repeatable method for bumping CACHE_COMMONMARK_VERSION. It is controlled by a pair of version constants (instead of our current single constant) and an ops FF that manages the rollout, because every bump is, in effect, a rollout:

  1. Check the FF is at 0%.
  2. Land an MR declaring the cache version we're rolling forward to. There is no behavioural change yet, but the FF is now being checked.
  3. Adjust the FF to control the percentage of cache version checks that report the new version, ramping up gradually to 100% while watching database load. This process can take as long as it needs: hours, days, or weeks.
  4. Land an MR declaring the roll-forward complete; the FF is now ignored.
  5. Reset the FF to 0%.
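The steps above can be modelled as a small state machine. The Python sketch below is a simplified illustration, not the MR's implementation: `CacheVersions`, `rolling_forward_to`, `declared_version`, and `ff_percentage` are hypothetical stand-ins for the pair of version constants and the ops FF, the version numbers are illustrative, and it assumes (as a simplification) that the step 4 MR promotes the new version to `current` with no roll-forward declared.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheVersions:
    # Hypothetical stand-in for the pair of version constants.
    current: int
    rolling_forward_to: Optional[int] = None  # None: no roll-forward declared


def declared_version(versions: CacheVersions, ff_percentage: float,
                     rng: random.Random) -> int:
    """Return the version one cache check compares a record against."""
    if versions.rolling_forward_to is None:
        # Steps 1, 4 and 5: no roll-forward is declared, so the FF is not read.
        return versions.current
    # Steps 2 and 3: every check re-rolls, reporting the new version
    # roughly ff_percentage% of the time (0% at step 2, ramping to 100%).
    if rng.random() * 100 < ff_percentage:
        return versions.rolling_forward_to
    return versions.current


# Walk through the rollout with illustrative version numbers.
rng = random.Random(0)
before = CacheVersions(current=1)
rolling = CacheVersions(current=1, rolling_forward_to=2)
after = CacheVersions(current=2)

print(declared_version(before, 0, rng))     # step 1: 1
print(declared_version(rolling, 0, rng))    # step 2, FF at 0%: 1
print(declared_version(rolling, 100, rng))  # step 3, FF at 100%: 2
print(declared_version(after, 100, rng))    # step 4, FF ignored: 2
```

Because the FF is only consulted while a roll-forward is declared, its value before step 2 and after step 4 has no effect; steps 1 and 5 simply make sure the next rollout starts from 0%.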

Markdown cache updates currently happen in the following circumstances:

  • The cache column is empty for whatever reason (e.g. not yet generated).
  • The source column for the Markdown has changed.
  • The cached_markdown_version column contains a value less than the current declared cache version.

Note that while I use "cache column" and "source column" in the singular, a record can have multiple Markdown columns, so these conditions may apply to several columns at once. There is only one cached_markdown_version per record, however, so bumping it forces a refresh of every cached column in that record.
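The refresh conditions above can be sketched in Python for a record with several Markdown columns sharing one cached_markdown_version. This is a simplified model under stated assumptions: `MarkdownRecord` and `columns_to_refresh` are hypothetical names, changed sources are passed in explicitly rather than detected through dirty tracking, and the version is treated as a single integer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class MarkdownRecord:
    # Hypothetical record: source column name -> cached HTML (None if empty).
    cached_html: Dict[str, Optional[str]]
    cached_markdown_version: int
    changed_sources: Set[str] = field(default_factory=set)


def columns_to_refresh(record: MarkdownRecord, declared_version: int) -> Set[str]:
    """Return the source columns whose cache should be re-rendered."""
    if record.cached_markdown_version < declared_version:
        # One version per record: a stale version refreshes every column.
        return set(record.cached_html)
    # Otherwise only empty caches and changed sources need work.
    return {
        column
        for column, html in record.cached_html.items()
        if html is None or column in record.changed_sources
    }


record = MarkdownRecord(
    cached_html={"title": "<p>Title</p>", "description": None},
    cached_markdown_version=1,
)
print(columns_to_refresh(record, declared_version=1))          # {'description'}
print(sorted(columns_to_refresh(record, declared_version=2)))  # ['description', 'title']
```

The second call shows why a version bump is expensive: a single stale version number causes every Markdown column in the record to be re-rendered, not just the empty one.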

This MR changes the behaviour of the cached_markdown_version check. While a "rolling forward" version is declared, the "declared cache version" depends on an FF read. It's OK that a single row may sometimes be checked against the current version and sometimes against the later one: a record whose cache version is already newer than the version a check reads is not considered stale, so its version never regresses.

We always write the latest cache version regardless of the cache version check result, as the new write is by definition current.
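A small Python simulation of this check-and-write behaviour during a roll-forward. It assumes that "latest cache version" here means the version being rolled forward to; `check_and_refresh`, `CURRENT`, and `ROLLING_FORWARD_TO` are hypothetical names and the numbers are illustrative.

```python
import random

CURRENT = 1             # hypothetical current cache version
ROLLING_FORWARD_TO = 2  # hypothetical version being rolled forward to


def check_and_refresh(record_version: int, ff_percentage: float,
                      rng: random.Random, source_changed: bool = False) -> int:
    """Run one cache check and return the record's version afterwards."""
    # Each check reads the FF independently.
    if rng.random() * 100 < ff_percentage:
        declared = ROLLING_FORWARD_TO
    else:
        declared = CURRENT
    if source_changed or record_version < declared:
        # Any re-render writes the latest version, whichever version
        # this particular check happened to read.
        return ROLLING_FORWARD_TO
    # A record already at ROLLING_FORWARD_TO is never stale against
    # CURRENT, so it is left alone and its version never regresses.
    return record_version


rng = random.Random(7)
version = CURRENT
history = []
for _ in range(20):
    version = check_and_refresh(version, 50, rng)
    history.append(version)
print(history)
```

However the FF reads fall, the recorded version only ever moves forward, and once a row is written at the new version later checks at either version leave it untouched.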

On using a percentage_of_time FF

Please read the "Banzai pipeline and parsing" docs, which this MR updates extensively, regarding the choice of FF type. Using percentage_of_time is by design, and does not trip the concerns that led to its being marked as deprecated (Percentage-based Feature Flags should return th... (#425202 - closed); 2023-09-14: Issues and comments not loading cor... (gitlab-com/gl-infra/production#16366 - closed)). We explicitly do not want the flipper to return the same value for multiple calls within a single request.
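To illustrate the distinction being drawn, here is a simplified Python model of two percentage gates. It is not Flipper's implementation: the CRC32 bucketing in `actor_gate` is only a stand-in for actor hashing, and `actor_gate`, `time_gate`, and the flag name are hypothetical. An actor-based gate gives the same answer for a given actor on every call, while a time-based gate re-rolls on each call, which is the property this MR wants.

```python
import random
import zlib


def actor_gate(feature: str, actor_id: str, percentage: float) -> bool:
    """Deterministic: the same actor always lands in the same bucket."""
    bucket = zlib.crc32(f"{feature}:{actor_id}".encode()) % 100
    return bucket < percentage


def time_gate(percentage: float, rng: random.Random) -> bool:
    """Re-rolled on every call, even for the same record in one request."""
    return rng.random() * 100 < percentage


rng = random.Random(3)
same_record_checks = [time_gate(50, rng) for _ in range(10)]
actor_checks = [actor_gate("example_flag", "record-1", 50) for _ in range(10)]
print(same_record_checks)  # may differ from call to call
print(actor_checks)        # identical on every call
```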

References

Screenshots or screen recordings

Before After

How to set up and validate locally

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Asherah Connor
