Invalid characters in git repo filenames and contents cause problems in projects, wikis, and snippets
Summary
(UPDATE NOTE: This issue has evolved to represent a larger scope involving various issues related to invalid and/or non-displayable characters in file names and contents. See the Possible fixes
section below for more details)
For git repo files with a file name containing an invalid character in it causes failures in the GitLab Web UI.
"Invalid character" means one which cannot be successfully encoded to UTF-8 via EncodingHelper.encode!
in Gitlab::GitalyClient::CommitService#ls_files
.
The invalid character in the filename can be created by using ANSI-C Quoting with an 8-bit octal character code to inject an invalid, non-printable ASCII character into the filename.
The character used in the example is the following, but other invalid characters may also exhibit the same behavior:
decimal: 144 octal: 220 hex: 90 binary: 10010000
The file can only be created on Linux filesystems (tested on Ubuntu in VirtualBox). On MacOS, you get the error Illegal byte sequence
when attempting to create the file.
This bug affects multiple areas of the product which have data backed by git repositories which can be cloned, including wikis, snippets, and projects.
It also causes subsequent clones/checkouts of the repo on a MacOS system, because MacOS will not even allow the file to be created, thus the git command fails with an illegal byte sequence
error.
This is the behavior across different features which are exposed to this bug (which allow local cloning of the underlying repo):
- wikis: Broken, 500 error page
- snippets: Broken, you just get a spinner and the snippet does not load. (Note: this has since been addressed by an MR to handle the symptom of files being nil: !54552 (merged))
- projects: Broken (in various places if you navigate to the file, e.g. the commit in the history, displaying the file info in the main project overview, etc). (An example: https://gitlab.com/jay_mccure/bad_chars)
- Any attempt on MacOS to clone a branch of any of these repos containing the invalid filename in the HEAD of the branch.
Original Security Issue
This was originally reported as a denial of service security issue (confidential issue).
However, it was reclassified as a normal bug, because only authenticated, authorized project members with push access to the repo can commit and push these invalid files.
If such an authenticated, authorized project member had malicious intent, they could just as easily delete some or all the files in the repo, or likely do any number of other much more destructive/disruptive actions.
Therefore, there does not seem to be anything about this bug which could be exploited by unauthenticated or unauthorized/guest users in order commit a denial of service.
Who is affected by this bug?
Since these filenames are by definition invalid, non-printable characters, there seems to be no common reason to use them.
The most likely use case seems to be by security researchers or penetration/vulnerability testers, who might want to commit such invalid filenames to repos as examples or test data.
Root Cause
- We allow files with invalid filenames to be committed to Gitlab repos (project, snippet, or wiki repos).
- "invalid filenames" means any filename which cannot be successfully encoded to
UTF-8
viaGitlab::EncodingHelper#encode
. For example, a file created viaecho 'bad' > $'\220'
on Linux, which results innil
being returned fromGitlab::EncodingHelper#encode
.
- "invalid filenames" means any filename which cannot be successfully encoded to
- These invalid filenames which are encoded to
nil
break all usages ofGitlab::GitalyClient::CommitService#ls_files
, because it returns an array containing nil entries. - This in turn causes various errors in other areas of the GitLab web app, when they try to process the file list containing these unexpected
nil
entries, and causes clones of the repo to fail on MacOS due toIllegal byte sequence
. - There are potentially other security exploits which can leverage this behavior, which we've worked around in various ways, but it would be good to address the root cause in a holistic way.
Workarounds
All of these can be mitigated by deleting the file from HEAD of the branch in the repo as describe here, but there may still be lingering effects (e.g. navigating to the original commit containing the bad filename even after the file has been deleted).
Alternately, these files can be hidden in the UI if they are not valid/relevant to be displayed, as was done in #349382 (closed).
Steps to reproduce
Primary Bug Behavior:
- Use a Linux OS. You can install Ubuntu Linux under VirtualBox on a mac to reproduce this if you don't have access to a Linux system.
- Clone the git repo of a project, wiki, or snippet locally
- Run
echo 'bad character' > $'\220'
on a linux system (this is an invalid command on MacOS, which does not allow such a file to be created). - Push to the repository
-
/-/wiki/*
now responds with 500 error (or other errors described above for projects or snippets).
Possible fixes
After discussion (see this thread), it is clear that it is not possible to prevent these filenames from getting created in repos. For example, GitHub supports them, and git itself supports them, and GitLab supports importing GitHub (and other standalone) repos.
Therefore, the proper fix seems to be something similar to GitHub's approach, where the invalid characters in the filename are gracefully handled and replaced with some other character whenever they must be processed or displayed in the web UI.
However, this is a larger issue that requires a holistic approach.
Currently, it's unclear at what point in the GitLab codebase this needs to be handled. More investigation is needed.
This has been discussed with the Gitaly team in their office hours, and the consensus was that this should be fixed on the Ruby/Rails consumer/presentation side, or perhaps in the API layer somewhere, but not in Gitaly itself, which should support these types of filenames the same way native git does.
That would be a large effort, because there's currently no clearly abstracted common place to do this, there are unknowns (e.g., which set of characters needs to get replaced or not?), and the care that would need to be taken to ensure there's no regressions, the potential for introducing performance bottlenecks, etc.
It also is not strictly limited to file names - there are other issues involved with displaying problematic characters as file contents as well. Encoding::UndefinedConversionError in highlight_cache.rb
And there could be a range of problematic characters. Some may be completely invalid, as described in this issue, but there may be others that are valid but not a correct encoding.
A fix would also involve product involvement to determine the impact on various areas of the product. E.g. the handling may differ depending on the use case. Some features may want to just hide the invalid files. Others may want to substitute the characters but still display them. Still others may need to preserve the original file contents and a mapping to the "cleaned" version for subsequent processing (e.g. in diff views, or in WYSIWYG markdown rendering).
See this thread for some code examples of potential parts of this fix: #349382 (comment 825355197)
Related Issues and MRs
- Original security issue: https://gitlab.com/gitlab-org/gitlab/-/issues/299133
- MR to address bug in snippets (by handling symptom of files being nil): !54552 (merged)
- Security issue (confidential): #349382 (closed)
- Encoding::UndefinedConversionError in highlight_cache.rb