Upgrade Google COS image to the newest LTS version
For guidance on the overall deprecations, removals and breaking changes workflow, please visit Breaking changes, deprecations, and removing features
Deprecation Summary
Our image is currently based on COS 85 LTS. This LTS version reached end of support at the end of September 2022. We should switch to a newer LTS version; at this moment the newest one is COS 105 LTS.
According to the documentation, there should be no differences in the general concepts and architecture of the image, so the switch should be fairly easy.
Breaking Change
We're updating the operating system used by the ephemeral VMs on which jobs for SaaS Linux runners are executed. Among the many updates, one causes a known compatibility problem: the update of Docker Engine from 19.03.15 to the much newer version 23.0.5.
Affected runners
SaaS Linux runners with the following tags are using the old ephemeral VM OS version and will be affected by the update:
- saas-linux-small-amd64
- saas-linux-medium-amd64
- saas-linux-large-amd64
- saas-linux-xlarge-amd64
- saas-linux-2xlarge-amd64
SaaS Linux runners with the following tags are using the already updated OS version and therefore will not be affected by the update:
- saas-linux-medium-amd64-gpu-standard
SaaS macOS and SaaS Windows runners are not affected by this change at all.
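To check whether a job is affected, look at the runner tags it requests in its definition. The snippet below is a minimal sketch, assuming a hypothetical job named example-job; only the tag value comes from the list above.

```yaml
# Hypothetical job: the name "example-job" is an illustration only.
example-job:
  tags:
    # Any tag from the "affected" list above means this job runs on
    # ephemeral VMs that will receive the OS and Docker Engine update.
    - saas-linux-medium-amd64
  script:
    - ...
```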
How to fix it?
Docker-in-Docker usage
For Docker-in-Docker based jobs, a host running Docker Engine 20.10 or newer causes problems when older versions of the docker:dind container are used. When trying to start a docker:dind container in version 19.03.15 on a host that uses Docker Engine 20.10 or newer, we get an error:
cgroups: cgroup mountpoint does not exist: unknown
The solution: update your jobs to use docker:dind in version 20.10 or newer, for example the most recent 20.10.x version:
```yaml
job:
  services:
    - docker:20.10.24-dind
  image: docker:20.10.24
  script:
    - ...
```
As always, we strongly encourage you to test and use the newest possible version and to pin it explicitly in the job definition. That way your jobs will not start failing unexpectedly when updated images are published.
Breaking change: once our update is in place, all jobs that use docker:dind images older than 20.10.0 will start failing because the DinD service will not start due to the cgroups error.
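One way to confirm that an updated job talks to a working DinD service is to print the client and daemon versions before doing any real work. This is a sketch under assumptions: the job name and the image name my-image are hypothetical, and the DOCKER_TLS_CERTDIR value assumes the common TLS-enabled DinD setup rather than anything specific to this issue.

```yaml
# Hypothetical job; image and service are pinned to the same 20.10.x release.
dind-check:
  image: docker:20.10.24
  services:
    - docker:20.10.24-dind
  variables:
    # Assumption: TLS is enabled and certificates are shared via /certs.
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    # Prints both client and server versions; fails if the daemon
    # did not start (for example because of the cgroups error).
    - docker version
    # Hypothetical build step; assumes a Dockerfile in the repository root.
    - docker build -t my-image .
```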
Kaniko usage
Another popular class of jobs that may be affected by our planned change is jobs that use Kaniko to build container images.
When a Kaniko version older than 1.11.0 is used, it is unable to detect the container runtime properly and fails with an error:
kaniko should only be run inside of a container, run with the --force flag if you are sure you want to continue
We've found that version 1.11.0 introduces changes in container runtime detection that work properly with Docker Engine 23.0.5, the version we will update to on the ephemeral VMs.
The solution: update your jobs to use gcr.io/kaniko-project/executor in version 1.11.0 or newer, for example:
```yaml
job:
  image: gcr.io/kaniko-project/executor:v1.11.0
  script:
    - ...
```
As always, we strongly encourage you to test and use the newest possible version and to pin it explicitly in the job definition. That way your jobs will not start failing unexpectedly when updated images are published.
Breaking change: once our update is in place, all jobs that use gcr.io/kaniko-project/executor images older than 1.11.0 will start failing with the container detection error.
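A fuller Kaniko job might look like the sketch below. It rests on several assumptions not stated in this issue: the job name is hypothetical, the debug variant of the image is used because it ships a shell for the job script, the entrypoint is overridden for the same reason, and a Dockerfile is assumed to exist in the repository root. The --no-push flag keeps the example to a build-only check, so no registry credentials are needed.

```yaml
# Hypothetical job; pins Kaniko to 1.11.0, the first version reported
# above to detect the container runtime correctly.
kaniko-check:
  image:
    # Assumption: the "-debug" tag variant is used because it includes a shell.
    name: gcr.io/kaniko-project/executor:v1.11.0-debug
    entrypoint: [""]
  script:
    # Builds the image without pushing it anywhere.
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --no-push
```

If this job passes on a runner with the updated OS, the container detection error should no longer apply to that Kaniko version.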
Affected Topology
Affected Tier
- Free
- Premium
- Ultimate
Checklists
Labels
- This issue is labeled deprecation, and with the relevant ~devops::, ~group::, and ~Category: labels.
- This issue is labeled breaking change if the removal of the deprecated item will be a breaking change.
Timeline
Please add links to the relevant merge requests.
- As soon as possible, but no later than the third milestone preceding the major release (for example, given the release schedule 14.8, 14.9, 14.10, 15.0, the third milestone preceding the major release is 14.8):
  - A deprecation announcement entry has been created so the deprecation will appear in release posts and on the general deprecation page.
  - Documentation has been updated to mark the feature as deprecated.
- On or before the major milestone: A removal entry has been created so the removal will appear on the removals by milestones page and be announced in the release post.
- On the major milestone:
  - The deprecated item has been removed.
  - If the removal of the deprecated item is a breaking change, the merge request is labeled breaking change.
Mentions
- Your stage's stable counterparts have been @mentioned on this issue. @jfarmiloe @timtams @cfoster3
- Your GPM has been @mentioned so that they are aware of planned deprecations.
Deprecation Milestone
Planned Removal Milestone
TODO Checklist
- update docker-machine on our SaaS Linux runners 👉 https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/3466+s
- update image building and update image to newest Google COS LTS (milestone 105 at the time of writing)
  - create packer network in gitlab-ci-155816 GCP project 👉 https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/5781
  - get the https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/linux-cos/ pipeline working
- test upgrade on private runners
  - tag new version of https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/linux-cos/
  - announce incoming update in #development and #quality. Inform that DinD configuration needs to be updated to use at least version 20.10.24 of docker:* container images (I'll draft the announcement message below)
  - wait 1 week
  - update private runners to use gitlab-ci-155816/global/images/runners-cos-stable-v20230612-01 👉 https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/3638
  - update shared-gitlab-org runners to use gitlab-ci-155816/global/images/runners-cos-stable-v20230612-01 👉 https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/3639
- wait 2 weeks and check if there are other reported issues
- upgrade other SaaS Linux runners
  - announce incoming update in a blog post; describe actions users need to take 👉 BLOG_POST_MR
  - wait 1 month
  - update other SaaS Linux shards to use the newest version of the image 👉 CHEF_REPO_MR
- Update binfmt initialization
  - update the cookbook 👉 Draft: Update binfmt to newest version (gitlab-cookbooks/cookbook-wrapper-gitlab-runner!50 - closed) • Tomasz Maczukin
  - upload cookbook to chef server 👉 NON_PRODUCTION_MR PRODUCTION_MR
Outdated
- Setup dedicated service account for packer:
- Finish the MR for improved testing during our packer builds 👉 https://dev.gitlab.org/cookbooks/packer-runner-machines/-/merge_requests/43.
- merge gitlab-com/gl-infra/production#5814 (closed) (REVERTED)
- Update to COS 89 LTS (intermediate step)
  - build new image 👉 https://dev.gitlab.org/cookbooks/packer-runner-machines/-/merge_requests/47
  - update private to use beta version again 👉 https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/1080
  - rollout new image to shared and shared-gitlab-org 👉 CHEF_REPO_MR
- Update to COS 93 LTS (when it'll get the updated Linux Kernel; probably at the beginning of March 2022)
  - build new image and make private using it automatically
  - update private to use beta version again
  - tag new version of the image 👉 PACKER_REPO_TAG
  - rollout new image to shared and shared-gitlab-org 👉 CHEF_REPO_MR