Arbitrary read of files owned by the "git" user via malicious tar.gz file upload using GitLab export functionality

⚠ Please read the process on how to fix security issues before starting to work on the issue. Vulnerabilities must be fixed in a security mirror.

HackerOne report #2032730 by ubercomp on 2023-06-20, assigned to GitLab Team:

Report | Attachments | How To Reproduce

Report

Summary

During project imports from a GitLab export, an arbitrary tarball is extracted with the GNU tar command (invoked from command_line_util.rb), and the contents of the tarball are reflected back to the user through the imported project. By uploading a tarball containing hard links, it's possible to read arbitrary files owned by the git user on the same partition as the one where the "tar" command is run.

Additional details

This type of issue, related to using tar on an arbitrary tarball and reflecting the contents back to the user, has a history at GitLab, and the implementation is hardened. Specifically, it removes symbolic links after extracting, and even matches hidden files and changes permissions beforehand, thwarting most attacks. However, GitLab still extracts tarballs containing soft and hard links, and it's possible to devise an attack thanks to an unexpected behavior which I discovered in GNU tar as part of my research on GitLab: when extracting a tarball to an empty directory, one of the extracted files can end up being a hard link to a file outside that directory.

Fix suggestion: The attack makes use of both soft and hard links, so inspecting the tarball with "tar tvvf" and refusing to process it further if it contains soft or hard links is an effective mitigation for this attack [note 1], as is ensuring that the tarball is extracted on a partition that doesn't contain any sensitive files, as hard links are not allowed to cross device boundaries.
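The partition-based mitigation can be checked mechanically. The following is a minimal Python sketch, not GitLab code: the function name on_same_device is hypothetical, and it relies only on the fact that a hard link cannot be created between paths whose device ids differ. Matching device ids do not prove a link is possible, so the check is only useful in the negative direction.

```python
import os


def on_same_device(path_a, path_b):
    """Return True when both paths report the same device id (st_dev).

    If this returns False, a hard link between the two locations is
    impossible. If it returns True, a hard link *may* be possible.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For instance, an operator could compare the import working directory against /var/opt/gitlab and treat a True result as a sign that the extraction directory is not isolated from sensitive files.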

I have privately reported the issue with tar to the GNU tar maintainers without mentioning any affected parties, and they are working on a patch for it (which can be seen in the tar commit log, in commits related to the apply_delayed_links function). However, I expect that it will be a while before all the distributions have pulled the patch from upstream and, since I found this issue because I was working on GitLab, I decided to report it even though an official patch is not yet available, as there are mitigations that avoid exploitability.

[note 1] However, note that tar is a complex binary format, and even doing something as innocuous as processing an archive with "tar tvvf" could technically expose GitLab to future issues. The long-term solution would be to only run "tar" on arbitrary tarballs inside a sandbox, e.g. using isolate or another sandbox of your choice.

Steps to reproduce

Proof of concept: leaking /var/opt/gitlab/.profile

Start a new omnibus Gitlab Instance (or use one if you already have one handy):

docker run --detach --name gitlab -it gitlab/gitlab-ce

Log in to that instance and import a GitLab export by going to http://$IP/projects/new#import_project, clicking "GitLab export" and uploading the attached exploit.tar.gz file, which will leak the contents of /var/opt/gitlab/.profile through the imported project.
Go to the Snippets of the project which was just imported, click on the first snippet, and then click on the "leak" link. See that its contents are the same as those of /var/opt/gitlab/.profile

Detecting an exploit tarball

Running "tar tvvf" on the exploit tarball reveals the following structure:

###  redacted for brevity  
....  
drwxrwxr-x rjs/rjs           0 2023-06-20 16:52 empty_dir/  
lrwxrwxrwx rjs/rjs           0 2023-06-20 16:52 sym -> empty_dir  
lrwxrwxrwx rjs/rjs           0 2023-06-20 16:52 sym/.gitconfig -> /anything  
hrw-rw-r-- rjs/rjs           0 2023-06-20 16:52 ./uploads/b7a485e3ba9f2f0d975e046cb53cfe69/leak.jpg link to sym/.gitconfig  
lrwxrwxrwx rjs/rjs           0 2023-06-20 16:52 sym -> /var/opt/gitlab

See that there are three soft links (lines beginning with "l") and a hard link (the line beginning with "h"), and that the lengths are all zero. A tarball with an attack like the one I'm describing here will always have those, and a tarball without one will not. Running

tar -tvvf "$tarball_path"

and refusing to process the tarball if any line of the output matches '^(h|l)' should be an acceptable short-term fix, and is worth keeping as defense in depth.
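The same check can be sketched without parsing the text output of tar. The example below is an illustration only, assuming Python's standard tarfile module is an acceptable stand-in for "tar -tvvf"; the function names find_link_members and build_demo_archive are hypothetical, and the demo archive is a harmless stand-in that only mimics the entry types seen in the listing above. The caveat from note 1 still applies: any parser run on untrusted archives is itself attack surface.

```python
import io
import tarfile


def find_link_members(fileobj):
    """Return (name, kind, target) for every symlink or hard link entry."""
    links = []
    # Mode "r:*" lets tarfile detect gzip/bzip2/xz compression on its own.
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if member.issym():
                links.append((member.name, "symlink", member.linkname))
            elif member.islnk():
                links.append((member.name, "hardlink", member.linkname))
    return links


def build_demo_archive():
    """Build a small in-memory .tar.gz with one file, one symlink, one hard link."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        data = b"hello"
        regular = tarfile.TarInfo("project.json")
        regular.size = len(data)
        archive.addfile(regular, io.BytesIO(data))

        sym = tarfile.TarInfo("sym")
        sym.type = tarfile.SYMTYPE
        sym.linkname = "empty_dir"
        archive.addfile(sym)

        hard = tarfile.TarInfo("uploads/leak.jpg")
        hard.type = tarfile.LNKTYPE
        hard.linkname = "sym/.gitconfig"
        archive.addfile(hard)
    buf.seek(0)
    return buf
```

An import step built on this idea would refuse the upload whenever find_link_members returns a non-empty list, before anything is extracted to disk.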

Leaking a different file

I have attached the pristine.tar and attack.sh files. pristine.tar is a regular export from GitLab from which I removed the entry for the uploaded file; that entry is added back later by the attack. Run "bash attack.sh" to generate the attack part (attack.tar), concatenate pristine.tar with attack.tar, and then gzip the result.

bash attack.sh FILE_YOU_WANT_TO_LEAK  PATH_RELATIVE_TO_PRISTINE

For example, to leak /var/opt/gitlab/.gitconfig, you would run

bash attack.sh /var/opt/gitlab/.gitconfig ./uploads/b7a485e3ba9f2f0d975e046cb53cfe69/leak.jpg # this generates attack.tar  
cp pristine.tar exploit.tar  
tar --concatenate -f exploit.tar attack.tar  
gzip exploit.tar # this generates exploit.tar.gz => UPLOAD THIS FILE

What is the expected correct behavior?

Arbitrary files should not be linked. Instead, the import process should result in an error, because the tar file contains soft and hard links.

Results of GitLab environment info

System information
System:
Current User: git
Using RVM: no
Ruby Version: 3.0.6p216
Gem Version: 3.4.13
Bundler Version:2.4.13
Rake Version: 13.0.6
Redis Version: 6.2.11
Sidekiq Version:6.5.7
Go Version: unknown

GitLab information
Version: 16.0.5
Revision: 6e840c5468f
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 13.11
URL: http://172.17.0.2
HTTP Clone URL: http://172.17.0.2/some-group/some-project.git
SSH Clone URL: git@172.17.0.2:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 14.20.0
Repository storages:

default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell

Impact

It's possible to read any files owned or writable by the "git" user or the "git" group on the same partition as the one where the "tar" command runs. In particular, on a single-machine deployment, it's possible to read:

repositories => /var/opt/gitlab/git-data/repositories/[@]hashed
gitlab-rails files => /var/opt/gitlab/gitlab-rails including uploads, lfs storage, cache, ci_secure_files, encrypted_settings, external-diffs, cached exports, etc.
/var/gitlab-exporter/gitlab-exporter.yml => this is the only "interesting" file on my instance that is owned by git. However, YMMV, and, ideally you should run "find / -type f -user git" on an actually used instance to evaluate impact.

My assessment of the impact is that, while this attack does not compromise the secrets files or escalate easily to RCE on most instances, it compromises the confidentiality of everything the instance is there to protect. Hence I classified it as critical, which is consistent with previous reports of arbitrary file disclosures.

Note on leaking repositories

Since repositories are stored in predictable locations derived from the project id, all one needs to know to be able to leak a specific repository is its project id, which is numeric. For instance, GitLab's project id on gitlab.com is 278964, which means that its storage is at /var/opt/gitlab/git-data/repositories/[@]hashed/a6/80/a68072e80f075e89bc74a300101a9e71e8363bdb542182580162553462480a52.git/, since sha256("278964") is a68072e80f075e89bc74a300101a9e71e8363bdb542182580162553462480a52. The prefix will be /var/opt/gitlab/git-data/repositories/[@]hashed, the subdirectories will be the first two and the subsequent two characters of the hash, and the root of the repository will be a directory whose name is the whole hash concatenated with ".git".
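The path derivation described above can be written out as a short Python sketch. The function name hashed_repository_path and its root parameter are hypothetical, and the sketch assumes that "[@]hashed" in the text denotes the literal directory name @hashed and that the project id is hashed as its decimal string.

```python
import hashlib


def hashed_repository_path(project_id, root="/var/opt/gitlab/git-data/repositories"):
    """Derive the on-disk repository path from a numeric project id."""
    # SHA-256 of the decimal project id, as lowercase hex.
    digest = hashlib.sha256(str(project_id).encode("ascii")).hexdigest()
    # Layout: <root>/@hashed/<first two chars>/<next two chars>/<full hash>.git
    return f"{root}/@hashed/{digest[:2]}/{digest[2:4]}/{digest}.git"
```

Calling hashed_repository_path(278964) should reproduce the location given in the example above, provided those assumptions hold.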

Attachments

Warning: Attachments received through HackerOne, please exercise caution!

How To Reproduce

Please add reproducibility information to this section:

Fix

Check for hardlinks anywhere .symlink? is checked. E.g., see https://gitlab.com/gitlab-org/security/gitlab/-/merge_requests/3352#note_1445005658
- Ideally create a single method that does both
- Ideally create a RuboCop cop for wherever .symlink? is called that will alert devs to use the new (sym | hard)link? method
Update secure coding guidelines
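To illustrate the "single method that does both" idea, here is a minimal sketch in Python rather than Ruby; it is not the GitLab implementation, and the names is_sym_or_hard_link and demo are hypothetical. It assumes that a regular file with a link count above one should be treated as a hard link, and it deliberately ignores directories, whose link count is normally greater than one.

```python
import os
import stat
import tempfile


def is_sym_or_hard_link(path):
    """Return True for symlinks and for regular files with more than one link."""
    info = os.lstat(path)  # lstat inspects the link itself, not its target
    if stat.S_ISLNK(info.st_mode):
        return True
    return stat.S_ISREG(info.st_mode) and info.st_nlink > 1


def demo():
    """Check a plain file, then a hard link and a symlink pointing at it."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.txt")
        with open(original, "w") as handle:
            handle.write("data")
        plain_result = is_sym_or_hard_link(original)  # evaluated before linking

        hard = os.path.join(workdir, "hard.txt")
        os.link(original, hard)
        soft = os.path.join(workdir, "soft.txt")
        os.symlink(original, soft)
        return plain_result, is_sym_or_hard_link(hard), is_sym_or_hard_link(soft)
```

Note that once a hard link exists, the original file also reports a link count of two, so a check like this flags both names; for freshly extracted import data that is arguably the desired behavior.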

Edited Jul 20, 2023 by Luke Duncalfe