Skip to content
GitLab
Next
    • GitLab: the DevOps platform
    • Explore GitLab
    • Install GitLab
    • How GitLab compares
    • Get started
    • GitLab docs
    • GitLab Learn
  • Pricing
  • Talk to an expert
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
    Projects Groups Snippets
  • Sign up now
  • Login
  • Sign in / Register
  • GitLab GitLab
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 46,781
    • Issues 46,781
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
    • Requirements
  • Merge requests 1,539
    • Merge requests 1,539
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Artifacts
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Container Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Metrics
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • GitLab.orgGitLab.org
  • GitLabGitLab
  • Issues
  • #257497
Closed
Open
Issue created Sep 27, 2020 by Dominic Couture@dcoutureDeveloper

Regular Expression Denial of Service in Elastic search results

Summary

The code in https://gitlab.com/gitlab-org/gitlab/-/blob/26962cded7d44900f7aab0d93b7095e3d518e1bb/ee/lib/gitlab/elastic/search_results.rb#L121 uses a regular expression to strip off the extension from file paths returned in search results.

      def self.parse_search_result(result, project)
        ref = result["_source"]["blob"]["commit_sha"]
        path = result["_source"]["blob"]["path"]
        extname = File.extname(path)
        basename = path.sub(/#{extname}$/, '')

This basically allows a user to control the regular expression being ran and create a regex with catastrophic backtracking on purpose.

Steps to reproduce

It goes without saying that we should not test this on prod.

  1. If it isn't done already, enable Elasticsearch integration
  2. Create a project
  3. Create a file named aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab.(a+)+$ with content SEARCH_ME_REGEX_DOS_ISSUE
  4. Search for SEARCH_ME_REGEX_DOS_ISSUE
  5. Request times out and fails with 500 error, process uses 100% CPU

Performance on my self-hosted GitLab was degraded and spamming a few of those requests could probably do some damage to uptime.

Here's a mocked example of what happens

# Faked result
path = '/group/project/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab.(a+)+$'

# Code from https://gitlab.com/gitlab-org/gitlab/-/blob/26962cded7d44900f7aab0d93b7095e3d518e1bb/ee/lib/gitlab/elastic/search_results.rb#L120-121
extname = File.extname(path)
basename = path.sub(/#{extname}$/, '')

This code never returns.

What is the current bug behavior?

DoS caused by catastrophic backtracking of the regular expression

What is the expected correct behavior?

Extension being stripped without performance issues

Output of checks

Found in the master branch, probably happens on GitLab.com but I didn't test for obvious reasons.

Possible fixes

diff --git a/ee/lib/gitlab/elastic/search_results.rb b/ee/lib/gitlab/elastic/search_results.rb
index dc8da0e2e05..cb0c0e0fd11 100644
--- a/ee/lib/gitlab/elastic/search_results.rb
+++ b/ee/lib/gitlab/elastic/search_results.rb
@@ -117,8 +117,7 @@ module Gitlab
       def self.parse_search_result(result, project)
         ref = result["_source"]["blob"]["commit_sha"]
         path = result["_source"]["blob"]["path"]
-        extname = File.extname(path)
-        basename = path.sub(/#{extname}$/, '')
+        basename = File.join(File.dirname(path), File.basename(path, File.extname(path)))
         content = result["_source"]["blob"]["content"]
         project_id = result['_source']['project_id'].to_i
         total_lines = content.lines.size

If we must use the regex solution for whatever reason, then the regex should be constructed like so /#{Regexp.escape(extname)}$/.

Edited Sep 30, 2020 by Dominic Couture
Assignee
Assign to
Time tracking