Injection: GitLab Code Navigation relies solely on input validation of HTML by untrusted sources

Summary

Code Navigation is a feature that gives us "jump to definition" in files. It is gated by the :code_navigation feature flag. WIP docs are here: !26069 (merged)

While reviewing gitlab-workhorse!492 (diffs, comment 344333731) with @igor.drozdov , I noticed that we're doing input validation of values, escaping them, and outputting them into raw HTML that is then shown directly from the file by the frontend.

This merge request simply internalises code to workhorse that is already running within CI jobs - which is to say, the input validator is itself untrusted. At the moment, CI jobs are generating HTML fragments and packaging them into JSON files in the artifact.zip file we upload to GitLab. With the feature flag on, those fragments are being extracted from the artifact.zip and injected directly into the frontend, leading to XSS, SSRF, etc, vulnerabilities.

Steps to reproduce

Fork a project with code intelligence enabled
Modify .gitlab-ci.yml to generate malicious code-navigation data in the artifact.zip file
Create a merge request
Send the link to a project member

The frontend gets a link to a .json file inside the artifact.zip from the backend: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/code_navigation_path.rb#L19 ; the frontend GETs that file, and interpolates the content of some of the items directly into the page, e.g. here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/code_navigation/components/popover.vue#L86

UNTRUSTED CONTENT

Example Project

Not actually tested, this is dead reckoning at the moment

What is the current bug behavior?

Untrusted HTML, JS, etc, is rendered/executed.

What is the expected correct behavior?

Escaped malicious content, or better yet, nothing at all.

Output of checks

This bug happens on GitLab.com

Possible fixes

The workhorse MR at least moves the processing step to code we control and can (in theory) trust. However, I think we rely entirely on input validation at the moment, which is a worry. There's always a chance people will get in artifact.zip files that haven't been through this step, or they

Instead, we should be able to apply some form of output validation. If we keep the architecture where the artifact.zip contains raw HTML, this is fairly difficult. We could run it through the Banzai pipeline, perhaps?

Alternatively, we could rework this feature so it provides some sort of metadata that is converted to HTML in either the frontend or the backend, so we don't need to inject raw HTML directly into the page.

/cc @igor.drozdov @danielgruesso @gitlab-com/gl-security/appsec

Edited Dec 07, 2020 by 🤖 GitLab Bot 🤖