Centrally enforcing upgrade windows for all Runway services

Problem

TL;DR: As more users onboard to Runway, we risk having to support a wider range of Runway versions, which is a growing source of maintenance toil for the Runway team.

We've had a few previous discussions about service project upgrades.

Additionally, manual service project upgrades are time-consuming and difficult to debug: #403 (comment 2215367631).

The current model of pull-based upgrades is a risk that needs to be mitigated by moving towards an automated, push-based upgrade process for Runway. We could likely benefit from learning how Dedicated handles the similar scenario of upgrading tenants with Switchboard, even if the initial iteration of our tooling would be a much simpler wrapper around renovate-bot.

Proposal

Use the Runway provisioner project to insert a semantic version checking job into the .gitlab-ci.yml of deployment projects, as in https://gitlab.com/gitlab-com/gl-infra/platform/runway/provisioner/-/merge_requests/457. This lets us push out minimum version requirements for various groups of Runway services.
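To make the idea of a semantic version check concrete, here is a minimal sketch of the comparison such a job could perform. It is not the implementation from the merge request above: the function names are hypothetical, and it assumes plain MAJOR.MINOR.PATCH versions with an optional leading "v" (pre-release and build suffixes are rejected rather than compared).

```python
import re

# Assumption: versions look like "v3.50.1" or "3.50.1"; no pre-release suffixes.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> tuple:
    """Parse 'vMAJOR.MINOR.PATCH' into a tuple of ints."""
    match = SEMVER.match(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(current: str, minimum: str) -> bool:
    """Return True when `current` is at least `minimum`.

    Tuples of ints compare numerically field by field, so 3.10.0 is
    correctly treated as newer than 3.9.5 (a string comparison would not).
    """
    return parse_version(current) >= parse_version(minimum)


def check_exit_code(current: str, minimum: str) -> int:
    """Hypothetical job entry point: 0 when up to date, 1 when outdated."""
    if meets_minimum(current, minimum):
        print(f"Runway version {current} satisfies minimum {minimum}")
        return 0
    print(f"Runway version {current} is older than minimum {minimum}")
    return 1
```

A CI job could call check_exit_code with the version pinned by the deployment project and the minimum pushed by the provisioner, then exit with the returned code so that an outdated version surfaces as a failed (or warning) job.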

Key improvements

  1. Check the Runway version in use with a CI job in the deployment pipeline (https://gitlab.com/gitlab-com/gl-infra/platform/runway/provisioner/-/merge_requests/477)
  2. Provide paved onboarding onto Renovate for new Runway users (#69 (comment 2230111723))
  3. Control upgrade cadence through renovate-runway.json on runwayctl (runwayctl!734 (merged))

Considerations

How aggressive should we be with enforcement of updates? Some suggestions from https://gitlab.com/gitlab-com/gl-infra/platform/runway/provisioner/-/merge_requests/457#note_2221272389:

  1. See warning and ignore
  2. See warning and manually upgrade
  3. See warning and opt-in to auto-upgrades
  4. Do not see warning and we upgrade

We could adjust the degree of interruption to users by setting allow_failure for the semver checking job. In general, we have the following archetypes of services:

  • GA services receiving production traffic
  • Beta/Experimental services
  • Test services belonging to Runway platform devs
  • Stale services that are untouched for months
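One way to picture the allow_failure knob is as a per-archetype policy applied when the job is generated. The sketch below is illustrative only: the policy values, the archetype names, and the job's stage and script command are hypothetical placeholders, since the actual enforcement level per archetype is exactly what is still under discussion. Only allow_failure itself is a real GitLab CI keyword.

```python
from enum import Enum


class Archetype(Enum):
    GA = "ga"                # GA services receiving production traffic
    BETA = "beta"            # Beta/Experimental services
    TEST = "test"            # Test services belonging to Runway platform devs
    STALE = "stale"          # Stale services untouched for months


# Hypothetical policy, not a decision: True means an outdated version only
# produces a pipeline warning, False means it fails the pipeline.
ALLOW_FAILURE = {
    Archetype.GA: True,
    Archetype.BETA: True,
    Archetype.TEST: False,
    Archetype.STALE: False,
}


def build_version_check_job(archetype: Archetype, minimum_version: str) -> dict:
    """Build the CI job definition as a plain dict, ready to serialize to YAML."""
    return {
        "stage": "validate",  # hypothetical stage name
        # Hypothetical command; stands in for whatever the real job runs.
        "script": [f"check-runway-version --minimum {minimum_version}"],
        "allow_failure": ALLOW_FAILURE[archetype],
    }
```

Keeping the policy in a single table means that tightening enforcement for one group of services is a one-line change on the provisioner side rather than an edit in every deployment project.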

Closing summary

We can now control the upgrade cadence for renovate-ci-managed Runway service projects. These projects need to extend our config preset.

Onboarding is made easier with Renovate-ci. The onboarding instructions have been updated in the onboarding documentation under "Keeping up with Runway releases using Renovate". For example, runway docs was automatically onboarded with a merge request that created a renovate.json file.
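As a rough illustration of what extending a config preset means for a service project, the sketch below renders a minimal renovate.json whose only job is to extend a shared preset. The preset reference passed in is a hypothetical placeholder, not the real location of renovate-runway.json; the onboarding documentation has the actual value. The "extends" key and the schema URL are standard Renovate configuration.

```python
import json


def render_renovate_config(preset: str) -> str:
    """Return the text of a renovate.json that extends a single shared preset."""
    config = {
        "$schema": "https://docs.renovatebot.com/renovate-schema.json",
        # All cadence and grouping rules live in the shared preset, so
        # service projects carry no upgrade policy of their own.
        "extends": [preset],
    }
    return json.dumps(config, indent=2) + "\n"


# Hypothetical preset reference, for illustration only.
EXAMPLE_PRESET = "gitlab>example-group/runwayctl:renovate-runway"
EXAMPLE_CONFIG = render_renovate_config(EXAMPLE_PRESET)
```

Because the service project only points at the preset, changing the upgrade cadence centrally does not require a follow-up merge request in each onboarded project.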

We also have a way to centrally "push" minimum recommended versions through the provisioner. The schedule for bumping the minimum recommended version can be determined separately.

Edited by Sylvester Chin