GitLab Runner removes docker machine CoreOS hosts during reboot after upgrade
Summary
CoreOS upgrades itself automatically. Our provider (OVH) ships an old CoreOS image, so every host that gitlab-runner creates automatically pulls an update. Once the update is installed, CoreOS reboots. During this reboot, GitLab Runner removes the host with the reason "machine is unavailable". As a result no job ever finishes, and the runner keeps creating and removing hosts.
Steps to reproduce
- Configure GitLab Runner with docker machine, using an out-of-date CoreOS image
- Start a job that runs longer than it takes for the auto-upgrade to be triggered, downloaded, installed and followed by a reboot (about 5 - 10 minutes)
- Observe GitLab Runner detecting the host as unavailable and removing it
Actual behavior
- GitLab Runner removes the host while CoreOS is rebooting.
Expected behavior
- GitLab Runner should either:
  - wait longer before marking a machine unavailable
  - or disable auto upgrades
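To illustrate the first option, here is a minimal sketch of a grace period in Python. It is not GitLab Runner's actual code: `MachineHealth`, `grace_period` and `should_remove` are hypothetical names, and the check is assumed to be called periodically with the result of a reachability probe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineHealth:
    """Hypothetical helper: tolerate a machine being unreachable for a while."""

    grace_period: float                       # seconds of unavailability to tolerate
    unreachable_since: Optional[float] = None  # time of the first failed check, if any

    def should_remove(self, reachable: bool, now: float) -> bool:
        """Return True only after the machine stayed unreachable for the whole grace period."""
        if reachable:
            # The machine answered, so forget any earlier failure.
            self.unreachable_since = None
            return False
        if self.unreachable_since is None:
            # First failed check: start the clock, do not remove yet.
            self.unreachable_since = now
            return False
        return now - self.unreachable_since >= self.grace_period
```

With something like `MachineHealth(grace_period=300)`, a host that disappears for a reboot and answers again within five minutes would be kept, while a host that stays down longer would still be removed. The 300 seconds is only an example value.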
Relevant logs and/or screenshots
GitLab Runner:
Apr 23 21:05:34 gunterstein gitlab-runner[13569]: WARN[1724] Requesting machine removal created=15m37.874765654s name=runner-a40bbaca-runner-1524509396-4294d226 now="2018-04-23 21:05:34.253741317 +0200 CEST" reason="machine is unavailable" used=17.653991827s usedCount=1
Apr 23 21:05:34 gunterstein gitlab-runner[13569]: WARN[1724] Stopping machine created=15m37.907040621s name=runner-a40bbaca-runner-1524509396-4294d226 reason="machine is unavailable" used=32.055576ms usedCount=1
Apr 23 21:05:34 gunterstein gitlab-runner[13569]: INFO[1724] Stopping "runner-a40bbaca-runner-1524509396-4294d226"... name=runner-a40bbaca-runner-1524509396-4294d226 operation=stop
Apr 23 21:05:58 gunterstein gitlab-runner[13569]: INFO[1748] Machine "runner-a40bbaca-runner-1524509396-4294d226" was stopped. name=runner-a40bbaca-runner-1524509396-4294d226 operation=stop
Apr 23 21:05:58 gunterstein gitlab-runner[13569]: WARN[1748] Removing machine created=16m2.265874049s name=runner-a40bbaca-runner-1524509396-4294d226 reason="machine is unavailable" used=24.390888666s usedCount=1
Apr 23 21:05:58 gunterstein gitlab-runner[13569]: INFO[1748] About to remove runner-a40bbaca-runner-1524509396-4294d226 name=runner-a40bbaca-runner-1524509396-4294d226 operation=remove
Apr 23 21:05:58 gunterstein gitlab-runner[13569]: INFO[1748] WARNING: This action will delete both local reference and remote instance. name=runner-a40bbaca-runner-1524509396-4294d226 operation=remove
Apr 23 21:05:58 gunterstein gitlab-runner[13569]: INFO[1748] (runner-a40bbaca-runner-1524509396-4294d226) Deleting OpenStack instance... name=runner-a40bbaca-runner-1524509396-4294d226 operation=remove
Apr 23 21:05:59 gunterstein gitlab-runner[13569]: INFO[1749] Successfully removed runner-a40bbaca-runner-1524509396-4294d226 name=runner-a40bbaca-runner-1524509396-4294d226 operation=remove
Apr 23 21:05:59 gunterstein gitlab-runner[13569]: INFO[1749] Machine removed created=16m3.023672044s name=runner-a40bbaca-runner-1524509396-4294d226 now="2018-04-23 21:05:59.402655176 +0200 CEST" reason="machine is unavailable" retries=0 used=25.148686766s usedCount=1
CoreOS (partial):
Apr 23 18:58:43 runner-a40bbaca-runner-1524509396-4294d226 update_engine[752]: I0423 18:58:43.995472 752 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.core-os.net/v1/update/
Apr 23 18:59:02 runner-a40bbaca-runner-1524509396-4294d226 update_engine[752]: I0423 18:59:02.320557 752 install_plan.cc:53] InstallPlan: , new_update, url: https://update.release.core-os.net/amd64-usr/1688.5.3/update.gz, payload size: 359881084, payload hash: vzIrXJjRRfjruMCXBjlQFxK/amQ9CSY79rLTkqEgAXM=, partition_path: /dev/vda4, kernel_path: /boot/coreos/vmlinuz-b, pcr_policy_path: /var/lib/update_engine/pcrs-b.zip, old_partition_path: /dev/vda3, old_kernel_path: /boot/coreos/vmlinuz-a
Apr 23 19:00:33 runner-a40bbaca-runner-1524509396-4294d226 update_engine[752]: I0423 19:00:33.457816 752 update_attempter.cc:290] Processing Done.
Apr 23 19:00:33 runner-a40bbaca-runner-1524509396-4294d226 update_engine[752]: I0423 19:00:33.459444 752 update_attempter.cc:316] Update successfully applied, waiting to reboot.
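The two logs can be lined up with a few lines of Python. This is only a cross-check under two assumptions: the CoreOS journal is assumed to log in UTC, and the number 1524509396 in the machine name is assumed to be the Unix timestamp of its creation. The variable names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # the runner log prints "+0200 CEST"

# Assumption: the number embedded in the machine name is a Unix timestamp.
created = datetime.fromtimestamp(1524509396, tz=timezone.utc)
# Assumption: the CoreOS journal timestamps are UTC.
# "Update successfully applied, waiting to reboot." line.
update_applied = datetime(2018, 4, 23, 19, 0, 33, tzinfo=timezone.utc)
# "Requesting machine removal" line from the runner log (CEST).
removal_requested = datetime(2018, 4, 23, 21, 5, 34, tzinfo=CEST)

print("created -> update applied:", update_applied - created)
print("update applied -> removal:", removal_requested - update_applied)
print("created -> removal:       ", removal_requested - created)
```

Under these assumptions the update was applied roughly 10.5 minutes after the machine was created, and the removal was requested about 5 minutes later. The total of about 15m38s agrees with the `created=15m37.87s` value in the runner log.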
Environment description
Custom installation. GitLab 10.6.3.
docker info:
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 7
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 3.13.0-144-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.33GiB
Name: --redacted--
ID: HM2I:ODKX:5YTH:BE2O:YLRP:Q4NF:C56Q:DWN5:73XR:KM3H:LDTY:OKYG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
docker-machine version 0.12.2, build 9371605
Used GitLab Runner version
Version: 10.7.0
Git revision: 7c273476
Git branch: 10-7-stable
GO version: go1.8.7
Built: 2018-04-22T13:43:35+00:00
OS/Arch: linux/amd64