Windows Server 1803 out of disk space

Overview

All pipelines are failing with errors similar to https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/355978265 because the 1803 instance is out of disk space. This is causing a master:broken.

Runbook

  • Ran Get-PSDrive to check free space per drive:
    Name           Used (GB)     Free (GB) Provider      Root                  CurrentLocation
    ----           ---------     --------- --------      ----                  ---------------
    Alias                                  Alias
    C                  26.46          3.05 FileSystem    C:\              Program Files\Docker
    Cert                                   Certificate   \
    D                   8.28         23.72 FileSystem    D:\                            docker
    E                                      FileSystem    E:\
    Env                                    Environment
    Function                               Function
    HKCU                                   Registry      HKEY_CURRENT_USER
    HKLM                                   Registry      HKEY_LOCAL_MACHINE
    Variable                               Variable
    WSMan                                  WSMan
  • Deleted the C:\GitLab-Runner\builds directory; the issue still persisted.
  • Ran docker system prune -a --volumes. 0B were reclaimed, which is expected since we delete the image every time.
  • Moved the build directory to D:\builds (D:\ is a temporary drive). The issue improved, but the job still failed later on.
  • Found the biggest files inside C:\ with Get-ChildItem c:\ -r| sort -descending -property length | select -first 10 name, Length
    Name                                        Length
    ----                                        ------
    log.txt                                  229590648
    a8569eeaccd9075f_blobs.bin               161881159
    0d7d44cf94f69cc014ee0add3e731e090c961901  78671232
    ServiceFabric.cab                         73505937
    CbsPersist_20191115041139.log             73223532
    dockerd.exe                               70873160
    docker.exe                                64147016
    67506f53943563db79c5afdf1342287fc0ee6a2f  62799744
    01c621e45b5601931212d59a691b9c9ad8efb031  62360232
    bbc831983b8f15e6f206fede201956e86555d662  61662384
  • To find the location of log.txt, I ran the same command with DirectoryName added: Get-ChildItem c:\ -r| sort -descending -property length | select -first 10 name, Length, DirectoryName. This showed that the large files belonged to the go build and go test cache, so I removed it. This improved the situation, but builds were still failing.
  • Moved the directory for Docker images to D:\docker, following https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-docker/configure-docker-daemon
    {
      "data-root": "d:\\docker"
    }
  • Pipelines are stable again
  • I ran Remove-Item C:\Windows\System32\config\systemprofile\AppData\Local\go-build because the disk filled up again in https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/375749624. go clean -cache wasn't deleting everything.
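
The investigation steps above can be collected into two small helpers. This is a minimal sketch, not the script that was run on the instance. The function names Get-LargestFiles and Get-DirectorySize are hypothetical and were chosen for illustration. The first one repeats the Get-ChildItem pipeline from the runbook with DirectoryName included. The second one sums sizes per subdirectory, which may help spot caches such as go-build that consist of many smaller files.

```powershell
# Hypothetical helpers; the names and defaults are assumptions, not part of the runbook.

# List the largest files under a path, together with the directory each one lives in.
function Get-LargestFiles {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Count = 10
    )
    Get-ChildItem -LiteralPath $Path -Recurse -File -Force -ErrorAction SilentlyContinue |
        Sort-Object -Property Length -Descending |
        Select-Object -First $Count -Property Name, Length, DirectoryName
}

# Sum file sizes for each immediate subdirectory of a path, largest first.
function Get-DirectorySize {
    param(
        [Parameter(Mandatory)] [string] $Path
    )
    Get-ChildItem -LiteralPath $Path -Directory -Force -ErrorAction SilentlyContinue |
        ForEach-Object {
            $files = Get-ChildItem -LiteralPath $_.FullName -Recurse -File -Force -ErrorAction SilentlyContinue
            $sum = ($files | Measure-Object -Property Length -Sum).Sum
            [pscustomobject]@{
                Directory = $_.FullName
                SizeMB    = [math]::Round(($sum / 1MB), 2)
            }
        } |
        Sort-Object -Property SizeMB -Descending
}
```

For example, Get-LargestFiles -Path C:\ | Format-Table -AutoSize reproduces the listing above with the directory column, and Get-DirectorySize -Path C:\Windows\System32\config\systemprofile\AppData\Local shows how much space each cache directory under that profile takes. Running Get-PSDrive -PSProvider FileSystem before and after a cleanup shows how much space was recovered.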

Proposal

We should use the autoscaler to build Docker images for 1803. We can't use the shared Runners because their images are built on Windows version 1903, and we need 1803 because Docker containers require the host and the guest to run the same OS version. We need to build a custom image based on 1803 and register a new private Runner for this project. The same should be done for the 1809 image.

These images should have the same kind of software that the custom executor requires, preferably similar to what we have in the https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/windows-containers image.
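
The version constraint behind this proposal can be checked on a host before a new Runner is registered. The following is a rough sketch under stated assumptions: Test-ImageMatchesHost is a hypothetical helper, the image name in the usage comment is a placeholder, and the check assumes that docker image inspect reports an OsVersion field for Windows images. The build numbers in the table are the well-known ones for these Windows releases.

```powershell
# Build numbers for the Windows releases mentioned in this issue.
$ReleaseBuilds = @{ '1803' = 17134; '1809' = 17763; '1903' = 18362 }

# Hypothetical helper: returns $true when the image OS version
# (for example "10.0.17134.1130") has the same build number as the host.
function Test-ImageMatchesHost {
    param(
        [Parameter(Mandatory)] [string] $ImageOsVersion,
        # Defaults to the build number of the machine running the check.
        [int] $HostBuild = [System.Environment]::OSVersion.Version.Build
    )
    ([version]$ImageOsVersion).Build -eq $HostBuild
}

# Usage sketch (the image name is a placeholder, not a real image):
#   $osVersion = docker image inspect --format '{{.OsVersion}}' example/helper-image:1803
#   Test-ImageMatchesHost -ImageOsVersion $osVersion
```

Passing -HostBuild $ReleaseBuilds['1903'] with an 1803 image version returns $false, which is the mismatch that rules out the shared Runners here.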

Merge Requests