Windows server 1803 out of disk space
Overview
All pipelines are failing with errors similar to https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/355978265 because the 1803 instance is out of disk space. This is causing a master:broken.
Runbook
- Run `Get-PSDrive`:

      Name      Used (GB)  Free (GB)  Provider     Root                CurrentLocation
      ----      ---------  ---------  --------     ----                ---------------
      Alias                           Alias
      C             26.46       3.05  FileSystem   C:\                 Program Files\Docker
      Cert                            Certificate  \
      D              8.28      23.72  FileSystem   D:\                 docker
      E                               FileSystem   E:\
      Env                             Environment
      Function                        Function
      HKCU                            Registry     HKEY_CURRENT_USER
      HKLM                            Registry     HKEY_LOCAL_MACHINE
      Variable                        Variable
      WSMan                           WSMan

- Delete the `C:\GitLab-Runner\builds` directory. The issue still persisted.
- Run `docker system prune -a --volumes`. 0B were cleaned, which is expected, since we delete the image every time.
- Move the build directory to `D:\builds` (`D:\` is a temporary drive). The issue improved, but a job still failed later on.
- Find the biggest files inside of `C:\` with `Get-ChildItem c:\ -r | sort -descending -property length | select -first 10 name, Length`:

      Name                                         Length
      ----                                         ------
      log.txt                                      229590648
      a8569eeaccd9075f_blobs.bin                   161881159
      0d7d44cf94f69cc014ee0add3e731e090c961901     78671232
      ServiceFabric.cab                            73505937
      CbsPersist_20191115041139.log                73223532
      dockerd.exe                                  70873160
      docker.exe                                   64147016
      67506f53943563db79c5afdf1342287fc0ee6a2f     62799744
      01c621e45b5601931212d59a691b9c9ad8efb031     62360232
      bbc831983b8f15e6f206fede201956e86555d662     61662384

- To find the location of `log.txt`, I ran the same command with `DirectoryName` added: `Get-ChildItem c:\ -r | sort -descending -property length | select -first 10 name, Length, DirectoryName`. This showed it was the `go build` & `go test` cache, so I removed it. This improved the situation, but builds were still failing.
- Move the directory for Docker images to `D:\docker`, following https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-docker/configure-docker-daemon, by setting `{ "data-root": "d:\\docker" }` in the Docker daemon configuration.
- Pipelines are stable again.
- I ran `Remove-Item C:\Windows\System32\config\systemprofile\AppData\Local\go-build` since we had a full disk again in https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/375749624. `go clean -cache` wasn't deleting everything.
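
The diagnostic and cleanup steps above could be wrapped into a few reusable PowerShell functions, roughly as in the sketch below. The function names (`Get-DriveFreeSpace`, `Get-LargestFile`, `Remove-GoBuildCache`) are hypothetical and not part of any existing tooling. The default cache path is the one from this runbook and is assumed to be where the Runner's service account keeps its Go build cache.

```powershell
# Sketch only: all function names here are hypothetical helpers.

function Get-DriveFreeSpace {
    # Report used/free space in GB for file system drives only
    # (skips Alias, Registry, Env and the other non-disk providers).
    Get-PSDrive -PSProvider FileSystem |
        Select-Object -Property Name,
            @{ Name = 'UsedGB'; Expression = { [math]::Round($_.Used / 1GB, 2) } },
            @{ Name = 'FreeGB'; Expression = { [math]::Round($_.Free / 1GB, 2) } }
}

function Get-LargestFile {
    param(
        [string]$Path = 'C:\',
        [int]$Count = 10
    )
    # -File skips directories, -Force includes hidden files,
    # and paths that cannot be read are silently ignored.
    Get-ChildItem -Path $Path -Recurse -File -Force -ErrorAction SilentlyContinue |
        Sort-Object -Property Length -Descending |
        Select-Object -First $Count -Property Name, Length, DirectoryName
}

function Remove-GoBuildCache {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Assumed location, taken from the runbook entry above.
        [string]$CachePath = 'C:\Windows\System32\config\systemprofile\AppData\Local\go-build'
    )
    # ShouldProcess makes -WhatIf available, so the removal can be previewed first.
    if ((Test-Path -LiteralPath $CachePath) -and
        $PSCmdlet.ShouldProcess($CachePath, 'Remove Go build cache')) {
        Remove-Item -LiteralPath $CachePath -Recurse -Force
    }
}
```

A cautious way to use it is to run `Get-DriveFreeSpace` and `Get-LargestFile -Path 'C:\' -Count 10` first, then `Remove-GoBuildCache -WhatIf` to preview the deletion before running it for real.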
Proposal
We should use the autoscaler to build Docker images for 1803. We can't use the shared Runners because those images are built on Windows version 1903, and we need 1803 since Docker containers require the same OS version for the host and the guest. We need to build a custom image based on 1803 and register a new private Runner for this project. The same should be done for the 1809 image.
These images should have the same kind of software that the custom executor requires, preferably similar to what we have in the https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/windows-containers image.