Specially crafted docker images can exhaust resources on managers
## Summary
For the Docker executor, when we run a job's main container, we copy the container's output directly to the trace log implementation. This implementation has a limit on how much data is stored.
!2534 (merged) introduced a `user` package that executes `id` inside of the user's provided container. This is used to fetch the container's `uid` and `gid`, and allows us to `chown` files created by our helper image, whose files will otherwise be created with `uid` 0.
Unfortunately, no such limit is placed on the output of `id`, so a specially crafted Docker image that replaces the `id` binary can return an excessive amount of data, filling and exhausting a limitless buffer on the Runner manager.
This problem only exists when `FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR` is enabled; however, it can be enabled at the job level. A simple protection is to ensure `FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR` cannot be enabled until the problem is resolved.
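For illustration, a hypothetical `.gitlab-ci.yml` fragment showing why a fleet-wide default alone is insufficient: Runner feature flags can be toggled per job via variables.

```yaml
# Hypothetical job definition: the feature flag is enabled for this
# job only, overriding whatever the Runner fleet's default is.
some-job:
  image: alpine:latest
  variables:
    FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR: "true"
  script:
    - echo "runs with the umask behaviour disabled"
```

This is why the mitigation must block the flag from being honoured at all, not merely leave it off by default.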
A similar problem exists for the output of a Docker service that fails its 30s health check. However, whilst there's no explicit limit here either (and there probably should be), this path reads the container's log file via a different API endpoint, which appears to have an internal limit of 1MB because we don't `follow` the output.
## Solution
- Disable `FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR` on the Runner fleet to protect GitLab-hosted Runner Managers: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/805/
- Explicitly set sane limits on any command/data returned from containers that is not directly fed into the trace log.
Closes #28630 (closed)