Regular Expression Denial of Service in Elastic search results
Summary
The code in https://gitlab.com/gitlab-org/gitlab/-/blob/26962cded7d44900f7aab0d93b7095e3d518e1bb/ee/lib/gitlab/elastic/search_results.rb#L121 uses a regular expression to strip off the extension from file paths returned in search results.
def self.parse_search_result(result, project)
ref = result["_source"]["blob"]["commit_sha"]
path = result["_source"]["blob"]["path"]
extname = File.extname(path)
basename = path.sub(/#{extname}$/, '')
This basically allows a user to control the regular expression being ran and create a regex with catastrophic backtracking on purpose.
Steps to reproduce
It goes without saying that we should not test this on prod.
- If it isn't done already, enable Elasticsearch integration
- Create a project
- Create a file named
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab.(a+)+$
with contentSEARCH_ME_REGEX_DOS_ISSUE
- Search for
SEARCH_ME_REGEX_DOS_ISSUE
- Request times out and fails with 500 error, process uses 100% CPU
Performance on my self-hosted GitLab was degraded and spamming a few of those requests could probably do some damage to uptime.
Here's a mocked example of what happens
# Faked result
path = '/group/project/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab.(a+)+$'
# Code from https://gitlab.com/gitlab-org/gitlab/-/blob/26962cded7d44900f7aab0d93b7095e3d518e1bb/ee/lib/gitlab/elastic/search_results.rb#L120-121
extname = File.extname(path)
basename = path.sub(/#{extname}$/, '')
This code never returns.
What is the current bug behavior?
DoS caused by catastrophic backtracking of the regular expression
What is the expected correct behavior?
Extension being stripped without performance issues
Output of checks
Found in the master branch, probably happens on GitLab.com but I didn't test for obvious reasons.
Possible fixes
diff --git a/ee/lib/gitlab/elastic/search_results.rb b/ee/lib/gitlab/elastic/search_results.rb
index dc8da0e2e05..cb0c0e0fd11 100644
--- a/ee/lib/gitlab/elastic/search_results.rb
+++ b/ee/lib/gitlab/elastic/search_results.rb
@@ -117,8 +117,7 @@ module Gitlab
def self.parse_search_result(result, project)
ref = result["_source"]["blob"]["commit_sha"]
path = result["_source"]["blob"]["path"]
- extname = File.extname(path)
- basename = path.sub(/#{extname}$/, '')
+ basename = File.join(File.dirname(path), File.basename(path, File.extname(path)))
content = result["_source"]["blob"]["content"]
project_id = result['_source']['project_id'].to_i
total_lines = content.lines.size
If we must use the regex solution for whatever reason, then the regex should be constructed like so /#{Regexp.escape(extname)}$/
.