Bumping CACHE_COMMONMARK_VERSION is risky
The received wisdom is that bumping Gitlab::MarkdownCache::CACHE_COMMONMARK_VERSION is risky:
Changing either the
Gitlab::MarkdownCache::CACHE_COMMONMARK_VERSIONor the application setting causes all cached Markdown fields to be re-rendered. For large installations, this puts heavy strain on the database, as every row with cached Markdown needs to be updated. So, avoid changingGitlab::MarkdownCache::CACHE_COMMONMARK_VERSIONif the change to the renderer output is a new feature or a minor bug fix. It should only be bumped in extreme circumstances. For more information, see issue 330313.
That issue reads:
During the incident gitlab-com/gl-infra/production#4481 (comment 568197343), we realized that changing markdown version numbers puts a lot of additional strain on the GitLab.com database.
This is because every row with cached markdown on it needs to be updated.
Unfortunately some of these tables, such as
namespaceare critical to the performance of GitLab.com. The updates lead to additional dead tuples, which slows things down.
It seemed looking into a long-term fix was essentially punted on when the guidance "this line should not be changed" was introduced: #330313 (comment 588997600)
@digitalmoksha (Markdown DRI at that time) wrote:
I would just like to remind everyone who may stumble on this issue that there are times when this line must be changed.
- fixing a security issue or critical bug. If we fix an XSS problem or some other security issue that effects how our cached html gets rendered, we have to bump the cache version.
- if rendering an older version of the cached html will in some way break the UI for the customer, then either the cache needs to get bumped or the UI needs to be tolerant of the older format.
This reveals a bit of a problem: we need to be able to bump it in the case there's XSS stored in rendered fields (or some other critical rendering bug), but at the same time, we expect we might cause an incident if doing so. This creates a potentially dangerous counterincentive to not bump.
Per the second item, it also complicates feature development:
However, as has been flagged, the CSS changes as-is aren't backwards compatible with old, cached renders: the anchor icon cannot be seen at all
i.e. we have to anticipate indefinitely stale renders and (at the very least) write CSS to account for every version of cached HTML we may ever produce!