Windows server 1803 out of disk space
Overview
All pipelines are failing with errors similar to https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/355978265 because the 1803 instance is out of disk space. This is causing a master:broken state.
Runbook
- Run `Get-PSDrive`:

      Name           Used (GB)     Free (GB) Provider      Root                 CurrentLocation
      ----           ---------     --------- --------      ----                 ---------------
      Alias                                  Alias
      C                  26.46          3.05 FileSystem    C:\                  Program Files\Docker
      Cert                                   Certificate   \
      D                   8.28         23.72 FileSystem    D:\                  docker
      E                                      FileSystem    E:\
      Env                                    Environment
      Function                               Function
      HKCU                                   Registry      HKEY_CURRENT_USER
      HKLM                                   Registry      HKEY_LOCAL_MACHINE
      Variable                               Variable
      WSMan                                  WSMan
- Deleted the `C:\GitLab-Runner\builds` directory; the issue still persisted.
- Ran `docker system prune -a --volumes`. 0 B were cleaned, which is expected, since we delete the image every time.
- Moved the build directory to `D:\builds` (`D:\` is a temporary drive). The issue improved, but the job still failed later on.
- Found the biggest files inside of `C:\` with `Get-ChildItem c:\ -r | sort -descending -property length | select -first 10 name, Length`:

      Name                                         Length
      ----                                         ------
      log.txt                                   229590648
      a8569eeaccd9075f_blobs.bin                161881159
      0d7d44cf94f69cc014ee0add3e731e090c961901   78671232
      ServiceFabric.cab                          73505937
      CbsPersist_20191115041139.log              73223532
      dockerd.exe                                70873160
      docker.exe                                 64147016
      67506f53943563db79c5afdf1342287fc0ee6a2f   62799744
      01c621e45b5601931212d59a691b9c9ad8efb031   62360232
      bbc831983b8f15e6f206fede201956e86555d662   61662384
- To find the location of `log.txt`, I ran the same command with `DirectoryName` added: `Get-ChildItem c:\ -r | sort -descending -property length | select -first 10 name, Length, DirectoryName`. This showed it was the `go build` and `go test` cache, so I removed it. This improved the situation, but builds still failed.
- Moved the directory for Docker images to `D:\docker`, following https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-docker/configure-docker-daemon, with the configuration `{ "data-root": "d:\\docker" }`.
- Pipelines are stable again
- I ran `Remove-Item C:\Windows\System32\config\systemprofile\AppData\Local\go-build` since we had a full disk again in https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/375749624. `go clean -cache` wasn't deleting everything.
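
The triage steps in this runbook (check free space, list the largest files with their directories, point Docker at another drive) could be scripted roughly as follows. This is a minimal cross-platform Python sketch, not what was actually run on the instance; the function names `free_gib`, `largest_files`, and `set_data_root` are made up for illustration, and it assumes Python is available on the runner host.

```python
import heapq
import json
import os
import shutil
from pathlib import Path


def free_gib(path):
    """Return the free space on the volume holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def largest_files(root, count=10):
    """Return the `count` largest files under `root` as (size, full_path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # skip files that are unreadable or disappeared mid-walk
    return heapq.nlargest(count, sizes)


def set_data_root(daemon_json, data_root):
    """Set "data-root" in a Docker daemon.json file, keeping any other keys."""
    path = Path(daemon_json)
    config = json.loads(path.read_text()) if path.exists() else {}
    config["data-root"] = data_root
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, `largest_files("C:\\")` returns the full path of each file, which covers both `Get-ChildItem` steps in one pass. `set_data_root` only edits the file; the Docker service would still need a restart for the new location to take effect.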
Proposal
We should use the autoscaler to build Docker images for 1803. We can't use the shared Runners because those images are built on Windows version 1903, and we need 1803 since Docker containers require the same OS version for the host and the guest. We need to build a custom image based on 1803 and register a new Runner private to this project. The same should be done for the 1809 image.
These images should have the same kind of software that the custom executor requires, preferably similar to what we have in the https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/windows-containers image.
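
The OS version constraint behind this proposal can be illustrated with a small Python sketch. It assumes process-isolated containers, where the host and image builds must match, and the runner names are hypothetical. The release-to-build mapping reflects the published Windows build numbers for these releases.

```python
# Release IDs and their Windows build numbers (as reported by `ver`).
BUILD_BY_RELEASE = {"1803": 17134, "1809": 17763, "1903": 18362}


def compatible_runners(image_release, runner_builds):
    """Return the runners whose host build matches the image's release.

    `runner_builds` maps a runner name to its host build number.
    """
    wanted = BUILD_BY_RELEASE[image_release]
    return sorted(name for name, build in runner_builds.items() if build == wanted)


# Hypothetical runner names, for illustration only.
runners = {"shared-windows": 18362, "private-1803": 17134, "private-1809": 17763}
print(compatible_runners("1803", runners))  # ['private-1803']
```

With only 1903 hosts available, the result for `"1803"` would be empty, which is why a dedicated Runner per release is needed.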