GitLab-managed app version-related edge cases
Currently, GitLab-managed apps are versioned by their corresponding Helm charts. These are generally third-party charts, with various customizations applied via a values.yaml that lives in the GitLab codebase.
Until recently, we supported neither app upgrades nor uninstallation. Now, however, we support uninstallation for each app, and are expanding upgrade functionality.
Over time, we have noticed architectural omissions that can lead to problems with a growing userbase, especially as the number of concurrent versions (also known as fragmentation) increases.
Sub-versions in values.yaml can change without chart version changing
Some charts, like ingress and prometheus, allow users to override versions of sub-components of the chart. This fragments our userbase in an unexpected way that can cause subtle and hard-to-fix problems in the long term.
When this can be a problem:
- user installs chart version N with sub-component version X
- X is changed to X+1, then X+2, etc. in our code
- many months later, user has a problem inherent to version X. In the meantime, the team responsible for the feature could have fully turned over, and nobody is left that knows that X was ever present.
Another example:
- user installs chart version N with sub-component version X
- X is bumped to X+1 with an incompatible storage version. This is justified by the argument that only new users will receive it
- later, we bump N to N+1 and present users with an update. We only perform QA with sub-component X+1 because that's the only thing present in the code
- the users who had version X and choose to update will suffer in one way or another
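The two scenarios above both come down to the installed sub-component version not being recorded anywhere. As a rough illustration only, the sketch below shows one way an installation could be fingerprinted by chart version plus the sub-component versions in effect at install time. The names `InstalledRelease`, `record_install`, and `drift` are hypothetical and do not exist in the GitLab codebase.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class InstalledRelease:
    """Hypothetical record persisted when an app is installed."""
    chart_version: str
    # Sub-component versions taken from values.yaml at install time,
    # stored as sorted (name, version) pairs so the record is immutable.
    sub_versions: Tuple[Tuple[str, str], ...]


def record_install(chart_version: str, overrides: Mapping[str, str]) -> InstalledRelease:
    """Capture the chart version together with the sub-component overrides."""
    return InstalledRelease(chart_version, tuple(sorted(overrides.items())))


def drift(
    installed: InstalledRelease, current_overrides: Mapping[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return sub-components whose installed version differs from the code.

    Each value is an (installed_version, current_version) pair;
    installed_version is None if the component was not recorded at install.
    """
    installed_map = dict(installed.sub_versions)
    return {
        name: (installed_map.get(name), version)
        for name, version in current_overrides.items()
        if installed_map.get(name) != version
    }
```

With such a record, an installation made at chart version N with sub-component X stays distinguishable from one made at N with X+1, even though the chart version is identical.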
Upgrade code typically only accounts for current and next known version
When this can be a problem:
- user installs chart version N
- GitLab is upgraded several times, chart version is now N+10
- special steps are required to get from N to N+1
- upgrade logic assumes user will go from N+9 to N+10
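To make the scenario above concrete, here is a minimal sketch of an upgrade planner that walks every intermediate version instead of assuming the user starts from the previous one. It is an illustration under simplifying assumptions: versions are plain integers (real chart versions are semver strings), and `plan_upgrade` and the `special_steps` registry are hypothetical names.

```python
from typing import Dict, List


def plan_upgrade(installed: int, target: int, special_steps: Dict[int, str]) -> List[str]:
    """Build an ordered list of actions to get from `installed` to `target`.

    `special_steps` maps a version to the extra step that must run
    before upgrading away from that version.
    """
    if target < installed:
        raise ValueError("downgrades are not handled in this sketch")

    plan: List[str] = []
    for version in range(installed, target):
        step = special_steps.get(version)
        if step is not None:
            # A step registered for an old version is still applied,
            # even when the user is many versions behind.
            plan.append(step)
        plan.append(f"upgrade {version} -> {version + 1}")
    return plan
```

A user on version 9 going to 10 gets a single action, while a user on version 0 going to 10 gets all ten upgrades plus any special step registered along the way. Logic that only knows the 9 to 10 transition would silently skip the step registered for version 0.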
Uninstallation code does not account for old versions
When this can be a problem:
- user installs version N
- GitLab is upgraded several times, chart version is now N+10, and uninstallation logic has changed
- uninstallation button is still present, but the uninstallation logic no longer knows how to handle version N
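One way to picture the gap described above is a registry of uninstallation handlers keyed by installed version, with the uninstallation button shown only when a handler exists. This is a sketch under assumptions, not the current implementation: the handler functions, the `UNINSTALLERS` registry, and the returned strings are all hypothetical placeholders.

```python
from typing import Callable, Dict


def uninstall_old(release: str) -> str:
    # Placeholder for whatever cleanup the old chart version needed.
    return f"old-style cleanup for {release}"


def uninstall_current(release: str) -> str:
    # Placeholder for the cleanup used by the current chart version.
    return f"current cleanup for {release}"


# Hypothetical registry: installed chart version -> uninstallation handler.
UNINSTALLERS: Dict[str, Callable[[str], str]] = {
    "N": uninstall_old,
    "N+10": uninstall_current,
}


def can_uninstall(installed_version: str) -> bool:
    """Whether the uninstallation button should be offered at all."""
    return installed_version in UNINSTALLERS


def uninstall(release: str, installed_version: str) -> str:
    """Dispatch on the version that was installed, not the latest one."""
    handler = UNINSTALLERS.get(installed_version)
    if handler is None:
        raise LookupError(f"no uninstallation logic for version {installed_version}")
    return handler(release)
```

The point of the sketch is that removing the entry for N from the registry makes the failure explicit (the button disappears or the call raises) rather than running logic written for N+10 against an installation made at N.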
Sidekiq and Web can have different information
During upgrades, canary releases, etc., Web and Sidekiq can be running different versions of the code, causing odd behavior. See #14706 (closed) for a good example.