GitLab-managed app version-related edge cases

Currently, GitLab-managed apps are versioned by their corresponding Helm charts. These are generally third-party charts, with various customizations applied via a values.yaml that lives in the GitLab codebase.

Until recently, we supported neither app upgrades nor uninstallation. Now, however, we support uninstallation for each app, and are expanding upgrade functionality.

Over time, we have noticed architectural omissions that can lead to problems with a growing userbase, especially as the number of concurrent versions (also known as fragmentation) increases.

Sub-versions in values.yaml can change without the chart version changing

Some charts, like ingress and prometheus, allow users to override versions of sub-components of the chart. This fragments our userbase in an unexpected way that can cause subtle and hard-to-fix problems in the long term.

When this can be a problem:

  1. user installs chart version N with sub-component version X
  2. X is changed to X+1, then X+2, etc. in our code
  3. many months later, user has a problem inherent to version X. In the meantime, the team responsible for the feature could have fully turned over, and nobody is left who knows that X was ever present.

Another example:

  1. user installs chart version N with sub-component version X
  2. X is bumped to X+1 with an incompatible storage version. This is justified by the argument that only new users will receive it
  3. later, we bump N to N+1 and present users with an update. We only perform QA with sub-component X+1 because that's the only thing present in the code
  4. the users who had version X and choose to update will suffer in one way or another
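
One possible way to make this fragmentation visible is to record the sub-component versions that were actually resolved at install time, next to the chart version, and to compare them against what QA covered before offering an update. The following is a minimal sketch under that assumption. `InstalledApp`, `untested_subcomponents`, the `server` sub-component name and the version strings are all hypothetical and do not refer to existing GitLab code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InstalledApp:
    """Hypothetical record of what was actually deployed for one app."""
    name: str
    chart_version: str
    # Sub-component versions resolved from values.yaml at install time.
    subcomponents: dict = field(default_factory=dict)


def untested_subcomponents(app, tested):
    """Return sub-components whose installed version was not covered by QA.

    `tested` maps a sub-component name to the set of versions the update
    was verified against (an assumption of this sketch).
    """
    return {
        name: version
        for name, version in app.subcomponents.items()
        if version not in tested.get(name, set())
    }


# A user installed chart version N with sub-component version X.
app = InstalledApp("prometheus", "N", {"server": "X"})

# QA only ran against X+1, so this installation is flagged before updating.
flagged = untested_subcomponents(app, {"server": {"X+1"}})
print(flagged)
```

With such a record, the update to N+1 could be withheld or handled specially for installations that still carry version X, instead of assuming everyone has X+1.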

Upgrade code typically only accounts for current and next known version

When this can be a problem:

  1. user installs chart version N
  2. GitLab is upgraded several times, chart version is now N+10
  3. special steps are required to get from N to N+1
  4. upgrade logic assumes user will go from N+9 to N+10
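
A common way to avoid this assumption is to keep one upgrade step per version and to chain them from whatever version is installed, rather than only coding the latest hop. The sketch below illustrates the idea with integer versions (N is 1, N+10 is 11). The names `upgrade_path`, `run_upgrade`, `STEPS` and the step functions are hypothetical, not existing GitLab code.

```python
def upgrade_path(installed, target, steps):
    """Return the ordered step functions that take `installed` to `target`.

    `steps` maps a source version to the function that upgrades it to the
    next version. A missing step raises instead of being silently skipped.
    """
    if target < installed:
        raise ValueError("downgrades are not handled in this sketch")
    path = []
    for version in range(installed, target):
        if version not in steps:
            raise LookupError(f"no upgrade step registered for version {version}")
        path.append(steps[version])
    return path


def special_migration(state):
    # Stands in for the special steps required to get from N to N+1.
    return {**state, "migrated": True}


def plain_upgrade(state):
    # Stands in for an upgrade that needs no extra work.
    return state


# One step per source version, 1 (N) through 10 (N+9).
STEPS = {version: plain_upgrade for version in range(1, 11)}
STEPS[1] = special_migration


def run_upgrade(state, installed, target, steps=STEPS):
    """Apply every intermediate step in order and return the final state."""
    for step in upgrade_path(installed, target, steps):
        state = step(state)
    return state


# A user on N still passes through the N -> N+1 special step.
print(run_upgrade({}, installed=1, target=11))
# A user on N+9 only runs the final hop.
print(run_upgrade({}, installed=10, target=11))
```

The trade-off is that every historical step has to be kept and tested for as long as installations on that version may exist.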

Uninstallation code does not account for old versions

When this can be a problem:

  1. user installs version N
  2. GitLab is upgraded several times, chart version is now N+10, and uninstallation logic has changed
  3. uninstallation button is still present, but the uninstallation logic no longer knows how to handle version N
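
One way to handle this could be to register uninstallation logic per installed version and to refuse (or hide the button) when no handler is known, rather than running the newest logic against an old install. This is a minimal sketch of that idea. The registry, the decorator and the returned step descriptions are hypothetical placeholders, not real commands or existing GitLab code.

```python
# Hypothetical registry: installed version -> uninstallation handler.
UNINSTALLERS = {}


def uninstaller(*versions):
    """Register the decorated function as the handler for `versions`."""
    def register(func):
        for version in versions:
            UNINSTALLERS[version] = func
        return func
    return register


@uninstaller("N")
def uninstall_legacy(app_name):
    # Placeholder steps for the layout that version N created.
    return [f"remove legacy resources of {app_name}", f"delete release {app_name}"]


@uninstaller("N+9", "N+10")
def uninstall_current(app_name):
    # Placeholder steps for the current layout.
    return [f"delete release {app_name}"]


def can_uninstall(installed_version):
    """Whether the uninstallation button should be offered at all."""
    return installed_version in UNINSTALLERS


def uninstall(app_name, installed_version):
    """Run the handler matching the version that was actually installed."""
    if not can_uninstall(installed_version):
        raise LookupError(f"no uninstaller known for version {installed_version}")
    return UNINSTALLERS[installed_version](app_name)


print(uninstall("ingress", "N"))
print(can_uninstall("N+3"))
```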

Sidekiq and Web can have different information

During upgrades, canary releases, etc., Web and Sidekiq can run different versions of the code, causing odd behavior. See #14706 (closed) for a good example.
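
A generic mitigation for this kind of skew is to stamp each job with a payload version when it is enqueued and to have the worker decline payloads it does not understand. The sketch below shows the idea in isolation. It is not how GitLab or Sidekiq handle this, and `enqueue`, `perform`, `SUPPORTED_PAYLOAD_VERSIONS` and the `retry_later` result are hypothetical names.

```python
# Payload versions this (hypothetical) worker build knows how to process.
SUPPORTED_PAYLOAD_VERSIONS = {1, 2}


def enqueue(app_name, action, payload_version=2):
    """Web side: build a job and record which payload format it uses."""
    return {"app": app_name, "action": action, "payload_version": payload_version}


def perform(job):
    """Worker side: only act on payload formats this build understands."""
    version = job.get("payload_version")
    if version not in SUPPORTED_PAYLOAD_VERSIONS:
        # A newer Web node enqueued this job; defer it instead of guessing.
        return "retry_later"
    return f"{job['action']}:{job['app']}"


# Same version on both sides: the job runs.
print(perform(enqueue("prometheus", "upgrade")))
# Web is ahead of the worker: the job is deferred.
print(perform(enqueue("prometheus", "upgrade", payload_version=3)))
```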

Edited Nov 11, 2019 by Hordur Freyr Yngvason