DoS by repeatedly sending unauthenticated requests for diff-files of a commit or merge request
HackerOne report #2650086 by a92847865
on 2024-08-11, assigned to @kmorrison1:
Report | Attachments | How To Reproduce
Report
NOTE! Thanks for submitting a report! Please note that initial triage is handled by HackerOne staff. They are identified with a
HackerOne triage
badge and will escalate to the GitLab team any. Please replace all the (parenthesized) sections below with the pertinent details. Remember, the more detail you provide, the easier it is for us to triage and respond quickly, so be sure to take your time filling out the report!
Summary
Diff_files of a commit or merge request can be requested using below endpoints:
- http(s)://[host-name]/[group]/[project]/-/commit/[:sha]/diff_files?expanded=1
- http(s)://[host-name]/[group]/[project]/merge_requests/[:id]/diffs_batch.json
Generating diff_files is CPU-time consuming task, especially for syntax highlighting part. There is a "time-out" (1.5 sec) which should be a sufficient protection against uncontrolled resource consumption.
def highlight_rich(text, continue: true)
tag = lexer.tag
tokens = lexer.lex(text, continue: continue) # [@]reporter: lack of time-out here
Gitlab::RenderTimeout.timeout { [@]formatter.format(tokens, **context, tag: tag).html_safe }
.....
end
But that protection unfortunately is kinda useless, for two reasons:
- Firstly,
Gitlab::RenderTimeout.timeout
only applies for formatting step, not applies for tokenization step. - Secondly, for diff-files requests (of a commit or merge request), there's currently no "time-out" protection for the whole process of generating several (many) diff-files.
Attack scenario
- Attacker looks for big commits/merge-requests of a public project. For example:
https://gitlab.com/gitlab-org/gitlab/-/merge_requests/32372/diffs_batch.json
. It takes Gitlab.com server ~17-18 seconds to respond to a request for "diffs_batch.json" file of that merge_request. - Attacker repeatedly sends unauthenticated requests to keep all CPUs of Gitlab instance being busy --> unreachable for legitimate users.
- The number of malicious requests per second is low since it takes tens of seconds to handle a malicious request. For example: 1 RPS (5% peak load) for a "1k-users Gitlab EE instance" (peak load: 20 RPS for normal requests)
Steps to reproduce
(Step-by-step guide to reproduce the issue, including:)
- Register a "1k-users Gitlab" VM (8 vCPU, 16GB RAM). Install latest Gitlab EE.
- Create a public project "DoS" in public group "Test".
- Extract this
and create a commit to add 21 extracted files to project "DoS".
- Open Web Developer Tools, click to view above commit, replace necessary fields in below URL:
http(s)://[host-name]/[group]/[project]/-/commit/[:sha]/diff_files?expanded=1.
- Log-out of your account, paste above URL in a new tab (with necessarily replaced fields), 6. Open Web Developer Tools, click enter. Then get equivalent curl command by right click on relevant networking item in networking tab of Web Developer Tools.
- Use this shell script
to send malicious requests to Gitlab instance. Replace: [url] and curl command headers with data you collected from steps 4-6.
- SSH into "1k-users Gitlab" VM. You should see that all CPUs are at ~ 100% utilized rate.
- Open your browser to connect to Gitlab instance. Gitlab instance is unusable.
- Stop sending malicious requests. It takes some minutes for Gitlab instance to respond normally to legitimate users.
Impact
Attacker can make a Gitlab public instance like Gitlab.com unreachable to legitimate users by repeatedly sending unauthenticated requests for diff-files
Examples
(If the bug is project related, please create an example project and export it using the project export feature)
(If you are using an older version of GitLab, this will also help determine whether the bug has been fixed in a more recent version)
(If the bug can be reproduced on GitLab.com without violating the Rules of Engagement
as outlined in the program policy, please provide the full path to the project.)
What is the current bug behavior?
(What actually happens, include relevant screenshots, API results, or complete HTTP requests)
There's no "time-out" protection for below endpoints:
- http(s)://[host-name]/[group]/[project]/-/commit/[:sha]/diff_files?expanded=1
- http(s)://[host-name]/[group]/[project]/merge_requests/[:id]/diffs_batch.json
What is the expected correct behavior?
(What you should see instead, include relevant screenshots, API results, or complete HTTP requests)
Set "time-out" protection for below endpoints:
- http(s)://[host-name]/[group]/[project]/-/commit/[:sha]/diff_files?expanded=1
- http(s)://[host-name]/[group]/[project]/merge_requests/[:id]/diffs_batch.json
Relevant logs and/or screenshots
(Paste any relevant logs - please use code blocks (```) to format console output,
logs, and code as it's very hard to read otherwise.)
Output of checks
This bug happens on both Gitlab self-managed and GitLab.com
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:env:info
)
(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production
)
System information
System: Ubuntu 20.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 3.1.5p253
Gem Version: 3.5.11
Bundler Version:2.5.11
Rake Version: 13.0.6
Redis Version: 7.0.15
Sidekiq Version:7.1.6
Go Version: unknown
GitLab information
Version: 17.2.2-ee
Revision: 49db6c4d6ac
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 14.11
URL: http://34.68.164.218
HTTP Clone URL: http://34.68.164.218/some-group/some-project.git
SSH Clone URL: git@34.68.164.218:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 14.37.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 17.2.2
- default Git Version: 2.45.2
PoC video:
Impact
Attacker can make a Gitlab public instance like Gitlab.com unreachable to legitimate users by repeatedly sending unauthenticated requests for diff-files (which takes tens of seconds of CPU time to generate) of a commit or merge request on a public repository. The number of malicious requests for taking a "1k-users Gitlab EE" instance is low (~1RPS)
Attachments
Warning: Attachments received through HackerOne, please exercise caution!
How To Reproduce
Please add reproducibility information to this section:
DRI's summary, notes, and resources
Summary
- per this comment this issue will be the SSOT for addressing the groupsource code
diff_files
aspect
Resources
- A previous issue mentions our current handling of this was temporary and we may be able to use something different now.
Notes
- Per this comment, it has been confirmed locally that it takes "significant time to render
diff_files?expanded=1
endpoint."
Tracing code path from controller to highlight
-
Projects::CommitController#diff_files
- defines
@diffs
which are instantiated indefine_commit_vars
@diffs = commit.diffs(opts)
- they are passed to
render template: 'projects/commit/diff_files'
- defines
-
projects/commit/diff_files
processes the diffs into diff_files-
diff_files = conditionally_paginate_diff_files(diffs, paginate: true, page: params[:page], per: Projects::CommitController::COMMIT_DIFFS_PER_PAGE)
-
conditionally_paginate_diff_files
returnsdiffs.diff_files
-
diff.diff_files
appears to return a collection ofGitlab::Diff::File.new
-
so in
diff_files
we pass a collection onGitlab::Diff::File
s viarender partial: 'projects/diffs/file', collection: diff_files, as: :diff_file
-
in
app/views/projects/diffs/_file.html.haml
we= render 'projects/diffs/content', diff_file: diff_file
-
in
= render 'projects/diffs/content', diff_file: diff_file
all paths end up renderingprojects/diffs/viewer
- before rendering though, part of it's logic is figuring out the viewer, which gets used in the next partial
-
projects/diffs/viewer
rendersviewer.partial_path, viewer: viewer
-
I think in the case we are working on
app/views/projects/diffs/viewers/_text.html.haml
is used. -
It renders
= render "projects/diffs/text_file", diff_file: diff_file, total_lines: total_lines
-
app/views/projects/diffs/_text_file.html.haml
does this:- diff_file.highlighted_diff_lines.each do |line| - line_code = diff_file.line_code(line)
-
that's defined in
lib/gitlab/diff/file.rb
def highlighted_diff_lines @highlighted_diff_lines ||= Gitlab::Diff::Highlight.new(self, repository: self.repository).highlight end
-
highlight
is defined inlib/gitlab/diff/highlight.rb
def highlight populate_marker_ranges @diff_lines.map do |diff_line| diff_line = diff_line.dup # ignore highlighting for "match" lines next diff_line if diff_line.meta? rich_line = apply_syntax_highlight(diff_line) # This is the key line rich_line = apply_marker_ranges_highlight(diff_line, rich_line) diff_line.rich_text = rich_line diff_line end end
apply_syntax_highlight
is also defined inlib/gitlab/diff/highlight.rb
def apply_syntax_highlight(diff_line) highlight_line(diff_line) || ERB::Util.html_escape(diff_line.text) end
def highlight_line(diff_line) return unless diff_file && diff_file.diff_refs return diff_line_highlighting(diff_line, plain: true) if blobs_too_large? if Feature.enabled?(:diff_line_syntax_highlighting, project) diff_line_highlighting(diff_line) else blob_highlighting(diff_line) end end
def diff_line_highlighting(diff_line, plain: false) rich_line = syntax_highlighter(diff_line).highlight( # Notice we're calling highlight diff_line.text(prefix: false), plain: plain, context: { line_number: diff_line.line }, used_on: :diff ) # Only update text if line is found. This will prevent # issues with submodules given the line only exists in diff content. if rich_line line_prefix = diff_line.text =~ PREFIX_REGEXP ? Regexp.last_match(1) : ' ' rich_line.prepend(line_prefix).concat("\n") end end
-
So what was the
syntax_highlighter.highlight
? Turns out that'sGitlab::Highlight
, which is where our investigation started.def syntax_highlighter(diff_line) path = diff_line.removed? ? diff_file.old_path : diff_file.new_path @syntax_highlighter ||= {} @syntax_highlighter[path] ||= Gitlab::Highlight.new( # Back to where we started investigating path, @raw_lines, language: repository&.gitattribute(path, 'gitlab-language') ) end
The only slightly fuzzy part for me is the last couple steps of the partials. There's a lot of conditional logic in the templates controlling things, but I think this is right.
-
-