Upgrade Google COS image to the newest LTS version

For guidance on the overall deprecations, removals and breaking changes workflow, please visit Breaking changes, deprecations, and removing features

Deprecation Summary

We're currently based on COS 85 LTS. This LTS version reached end of support at the end of September 2022. We should switch to the next LTS version; at this moment that is COS 105 LTS.

According to the documentation, there should be no differences in the general concepts and architecture of the image, so the switch should be fairly easy.

Blog Post

Breaking Change

We're updating the operating system used by the ephemeral VMs on which jobs for SaaS Linux runners are executed. Among the many updates, one causes a known compatibility problem: the update of Docker Engine from 19.03.15 to the much newer version 23.0.5.

Affected runners

SaaS Linux runners with the following tags are using the old ephemeral VM OS version and will be affected by the update:

  • saas-linux-small-amd64
  • saas-linux-medium-amd64
  • saas-linux-large-amd64
  • saas-linux-xlarge-amd64
  • saas-linux-2xlarge-amd64

SaaS Linux runners with the following tags are using the already updated OS version and therefore will not be affected by the update:

  • saas-linux-medium-amd64-gpu-standard

SaaS macOS and SaaS Windows runners are not affected by this change.

How to fix it?

Docker-in-Docker usage

When using Docker-in-Docker based jobs, a host Docker Engine in version 20.10 or newer causes problems when older versions of the docker:dind container are used. Starting a docker:dind container in version 19.03.15 on a host that uses Docker Engine 20.10 or newer fails with an error:

cgroups: cgroup mountpoint does not exist: unknown

The solution: update your jobs to use docker:dind in version 20.10 or newer, for example the most recent 20.10.x version:

job:
  services:
  - docker:20.10.24-dind
  image: docker:20.10.24
  script:
  - ...

As in all cases, we highly encourage you to test and use the newest possible version and to pin it explicitly in the job definition. That way, your jobs will not start failing randomly when updated images are published.

Breaking change: once our update is in place, all jobs that use docker:dind images older than 20.10.0 will start failing because the DinD service will not start due to the cgroups error.

Kaniko usage

Another popular class of jobs that may be affected by our planned change is jobs that use Kaniko to build container images.

When a Kaniko version older than 1.11.0 is used, it is unable to detect the container runtime properly and fails with an error:

kaniko should only be run inside of a container, run with the --force flag if you are sure you want to continue

We've found that version 1.11.0 introduces changes in container runtime detection that work properly with Docker Engine 23.0.5, the version we will update to on the ephemeral VMs.

The solution: update your jobs to use gcr.io/kaniko-project/executor in version 1.11.0 or newer, for example:

job:
  image: gcr.io/kaniko-project/executor:v1.11.0
  script:
  - ...

As in all cases, we highly encourage you to test and use the newest possible version and to pin it explicitly in the job definition. That way, your jobs will not start failing randomly when updated images are published.

Breaking change: once our update is in place, all jobs that use gcr.io/kaniko-project/executor images older than 1.11.0 will start failing with the container detection error.
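To help find jobs pinned below the two minimum versions named in this issue (docker:dind 20.10.0 and Kaniko 1.11.0), a small script can scan a CI configuration file for affected image references. The following is a minimal sketch, not an official tool: the function names and regular expressions are hypothetical, it only inspects tags with an explicit numeric major.minor version, and it ignores floating tags and images referenced through variables or includes.

```python
import re

# Minimum versions taken from this issue; everything else is illustrative.
MIN_DIND = (20, 10, 0)
MIN_KANIKO = (1, 11, 0)

# Matches e.g. "docker:19.03.15-dind" or "docker:20.10-dind".
DIND_RE = re.compile(r"\bdocker:(\d+\.\d+(?:\.\d+)?)-dind\b")
# Matches e.g. "gcr.io/kaniko-project/executor:v1.9.1" (a "-debug" suffix is ignored).
KANIKO_RE = re.compile(r"gcr\.io/kaniko-project/executor:v?(\d+\.\d+(?:\.\d+)?)")


def parse_version(text):
    """Turn "19.03.15" into (19, 3, 15), padding a missing patch part with 0."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def find_affected_images(ci_yaml_text):
    """Return (line_number, image_reference) pairs pinned below the minimums."""
    affected = []
    for number, line in enumerate(ci_yaml_text.splitlines(), start=1):
        for pattern, minimum in ((DIND_RE, MIN_DIND), (KANIKO_RE, MIN_KANIKO)):
            for match in pattern.finditer(line):
                if parse_version(match.group(1)) < minimum:
                    affected.append((number, match.group(0)))
    return affected
```

Passing the text of a .gitlab-ci.yml file to find_affected_images returns the line numbers and image references that would need updating. Because the check is purely textual, treat an empty result as a hint rather than proof that a project is unaffected.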

Affected Topology

Affected Tier

  • Free
  • Premium
  • Ultimate

Checklists

Labels

  • This issue is labeled deprecation, and with the relevant ~devops::, ~group::, and ~Category: labels.
  • This issue is labeled breaking change if the removal of the deprecated item will be a breaking change.

Timeline

Please add links to the relevant merge requests.

  • As soon as possible, but no later than the third milestone preceding the major release (for example, given the following release schedule: 14.8, 14.9, 14.10, 15.0; 14.8 is the third milestone preceding the major release):
  • On or before the major milestone: A removal entry has been created so the removal will appear on the removals by milestones page and be announced in the release post.
  • On the major milestone:

Mentions

  • Your stage's stable counterparts have been @mentioned on this issue. @jfarmiloe @timtams @cfoster3
  • Your GPM has been @mentioned so that they are aware of planned deprecations.

Deprecation Milestone

%16.8

Planned Removal Milestone

%17.0

TODO Checklist

Outdated

Links

Edited by Gabriel Engel