Roll out the Google Container-Optimized OS image as a replacement for CoreOS on our Runners
A successor of gitlab-org/ci-cd/docker-machine#14 (closed)
Now that we've added all the needed product changes, we can replace the image of our autoscaled VMs, which is based on the deprecated CoreOS (no longer available in GCP), with an image based on Google's Container-Optimized OS (Google COS from here on).
The image build definition change is being prepared at https://dev.gitlab.org/cookbooks/packer-runner-machines/-/merge_requests/40.
As per the thread at gitlab-org/ci-cd/docker-machine#14 (comment 450364955), we already use this image on our private-runners-manager-X.gitlab.com runners.
This issue guides the incremental rollout across the rest of our fleet.
gitlab-shared-runners-manager-X.gitlab.com
- deploy Google COS:
  - change the SSH username from core to cos in the MachineOptions section
  - drop the engine-opt=mtu=1460 setting (this is already hardcoded by Google in the Container-Optimized OS image, in Docker's daemon.json file)
  - drop the IPv6-related settings (these are already hardcoded by us in the Container-Optimized OS image, in Docker's daemon.json file)
  - replace /dummy-sys-class-dmi-id:/sys/class/dmi/id:ro with /tmp/dummy-sys-class-dmi-id:/sys/class/dmi/id:ro (/tmp is not a problem: most of our Runners use the VM only once, and even when a VM is reused we never restart it; the data must persist only across subsequent job executions, not across VM restarts)
  - change the google-machine-image value to gitlab-ci-155816/global/images/runners-cos-stable-swtich-to-google-cos
  - add the google-metadata=cos-update-strategy=update_disabled MachineOption
  - add the google-metadata-from-file=user-data=/etc/gitlab-runner/cloud-config.conf MachineOption
- decide if we're ready to do the full switch
  - make the decision
- prepare for merging
  - tag gitlab-ci-155816/global/images/runners-cos-stable-swtich-to-google-cos as gitlab-ci-155816/global/images/runners-cos-stable-beta. Done with GCP's console terminal:
    gcloud compute images create runners-cos-stable-beta --source-image=runners-cos-stable-swtich-to-google-cos
  - update with the stable image name gitlab-ci-155816/global/images/runners-cos-stable-beta ➡ https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/5030
    - private-runners-manager-X
    - gitlab-shared-runners-manager-X
  - merge the change in the packer repository
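The option changes in the checklist above can be sketched as a small transformation over a manager's MachineOptions list. This is only an illustration, not the actual chef-repo change: the function names are hypothetical, `google-username` is assumed to be the option that carries the SSH username, and the `engine-opt=ipv6` / `engine-opt=fixed-cidr-v6` prefixes are an assumed shape for the "IPv6-related settings", which the issue does not spell out.

```python
OLD_VOLUME = "/dummy-sys-class-dmi-id:/sys/class/dmi/id:ro"
NEW_VOLUME = "/tmp/dummy-sys-class-dmi-id:/sys/class/dmi/id:ro"

# Assumption: the IPv6-related settings are Docker engine options with these prefixes.
ASSUMED_IPV6_PREFIXES = ("engine-opt=ipv6", "engine-opt=fixed-cidr-v6")


def migrate_machine_options(options, image, user_data_path):
    """Return a new MachineOptions list adjusted from CoreOS to Google COS."""
    migrated = []
    for opt in options:
        if opt == "google-username=core":
            # Assumption: the SSH username is set through google-username.
            migrated.append("google-username=cos")
        elif opt == "engine-opt=mtu=1460":
            # Already hardcoded in the COS image's daemon.json, so drop it.
            continue
        elif opt.startswith(ASSUMED_IPV6_PREFIXES):
            # Already hardcoded in our COS image's daemon.json, so drop it.
            continue
        elif opt.startswith("google-machine-image="):
            migrated.append(f"google-machine-image={image}")
        elif opt.startswith(("google-metadata=cos-update-strategy=",
                             "google-metadata-from-file=user-data=")):
            # Dropped here and re-added below, so running this twice is harmless.
            continue
        else:
            migrated.append(opt)
    migrated.append("google-metadata=cos-update-strategy=update_disabled")
    migrated.append(f"google-metadata-from-file=user-data={user_data_path}")
    return migrated


def migrate_volumes(volumes):
    """Swap the dummy DMI volume for its /tmp-based replacement."""
    return [NEW_VOLUME if v == OLD_VOLUME else v for v in volumes]
```

Calling `migrate_machine_options(old_options, "gitlab-ci-155816/global/images/runners-cos-stable-beta", "/etc/gitlab-runner/cloud-config.conf")` would produce the list to paste into the manager's configuration; options the checklist does not mention pass through unchanged.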
shared-runners-manager-X.gitlab.com
- deploy Google COS:
  - prepare
    - change the SSH username from core to cos in the MachineOptions section
    - drop the engine-opt=mtu=1460 setting (this is already hardcoded by Google in the Container-Optimized OS image, in Docker's daemon.json file)
    - drop the IPv6-related settings (these are already hardcoded by us in the Container-Optimized OS image, in Docker's daemon.json file)
    - replace /dummy-sys-class-dmi-id:/sys/class/dmi/id:ro with /tmp/dummy-sys-class-dmi-id:/sys/class/dmi/id:ro
    - change the google-machine-image value to gitlab-ci-155816/global/images/runners-cos-stable-beta
    - add the google-metadata=cos-update-strategy=update_disabled MachineOption
    - add the google-metadata-from-file=user-data=/etc/gitlab-runner/cloud-config.v2.conf MachineOption
  - Introduce the change through production#5184 (closed)
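Before introducing such a change, the prepared configuration could be checked against the checklist. The sketch below is a hypothetical helper, not part of our tooling; it assumes the SSH username is set through a `google-username` option and skips the IPv6-related settings because the issue does not name them.

```python
def find_cos_rollout_problems(options, volumes, expected_image):
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    # Assumption: google-username is the option that carries the SSH username.
    if "google-username=core" in options:
        problems.append("SSH username is still core, expected cos")
    if "engine-opt=mtu=1460" in options:
        problems.append("engine-opt=mtu=1460 should be dropped")
    if f"google-machine-image={expected_image}" not in options:
        problems.append(f"google-machine-image is not {expected_image}")
    if "google-metadata=cos-update-strategy=update_disabled" not in options:
        problems.append("cos-update-strategy=update_disabled metadata is missing")
    if not any(o.startswith("google-metadata-from-file=user-data=") for o in options):
        problems.append("user-data metadata file is missing")
    if "/dummy-sys-class-dmi-id:/sys/class/dmi/id:ro" in volumes:
        problems.append("dummy DMI volume should be mounted from /tmp")
    return problems
```

An empty result only means the listed checks passed; it says nothing about options the checklist does not cover.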
gitlab-docker-shared-runners-manager-X.gitlab.com
- deploy Google COS. We'll do this at the very end, as we promised the gitlab-org/gitlab development team. There is some difference in how the GitLab Q&A pipeline is executed after switching from CoreOS to Google COS, and the team wanted more time to look into it.
  - prepare
    - change the SSH username from core to cos in the MachineOptions section
    - drop the engine-opt=mtu=1460 setting (this is already hardcoded by Google in the Container-Optimized OS image, in Docker's daemon.json file)
    - drop the IPv6-related settings (these are already hardcoded by us in the Container-Optimized OS image, in Docker's daemon.json file)
    - replace /dummy-sys-class-dmi-id:/sys/class/dmi/id:ro with /tmp/dummy-sys-class-dmi-id:/sys/class/dmi/id:ro
    - change the google-machine-image value to gitlab-ci-155816/global/images/runners-cos-stable-beta
    - add the google-metadata=cos-update-strategy=update_disabled MachineOption
    - add the google-metadata-from-file=user-data=/etc/gitlab-runner/cloud-config.v2.conf MachineOption
  - Introduce the change through production#5258 (closed)
Regressions/Incidents caused during the rollout
- gsrm
- srm
- SWAP file missing 👉 production#5184 (comment 634094649)
- Elasticsearch service fails to start 👉 production#5254 (closed)
- SWAP file missing