Hashed storage negatively affects repository archives
Summary
Noted while writing tests related to https://gitlab.com/gitlab-org/gitlab-ce/issues/45689
When hashed storage is enabled, repository.name contains the hashed form of the project ID (a 64-character hex digest in the example below) rather than the project name. Consider these gitlab-workhorse git-archive params:
Legacy:
{
"RepoPath"=>"/home/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/tmp/tests/repositories/namespace1/project1.git",
"ArchivePrefix"=>"project1-master-b83d6e391c22777fca1ed3012fce84f633d7fed0",
"ArchivePath"=>"/home/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/shared/cache/archive/project1.git/project1-master-b83d6e391c22777fca1ed3012fce84f633d7fed0.tar.gz",
"CommitId"=>"b83d6e391c22777fca1ed3012fce84f633d7fed0",
"GitalyServer"=>{"address"=>"unix:tmp/tests/gitaly/gitaly.socket", "token"=>"secret"},
"GitalyRepository"=>{"storage_name"=>"default", "relative_path"=>"namespace1/project1.git", "git_object_directory"=>"", "git_alternate_object_directories"=>[], "gl_repository"=>"project-1"},
"DisableCache"=>true
}
Hashed:
{
"RepoPath"=>"/home/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/tmp/tests/repositories/@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git",
"ArchivePrefix"=>"6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b-master-b83d6e391c22777fca1ed3012fce84f633d7fed0",
"ArchivePath"=>"/home/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/shared/cache/archive/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b-master-b83d6e391c22777fca1ed3012fce84f633d7fed0.tar.gz",
"CommitId"=>"b83d6e391c22777fca1ed3012fce84f633d7fed0",
"GitalyServer"=>{"address"=>"unix:tmp/tests/gitaly/gitaly.socket", "token"=>"secret"},
"GitalyRepository"=>{"storage_name"=>"default", "relative_path"=>"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git", "git_object_directory"=>"", "git_alternate_object_directories"=>[], "gl_repository"=>"project-1"},
"DisableCache"=>true
}
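For context, the hashed name in this example appears to be the SHA-256 hex digest of the project ID (gl_repository is project-1), with the first two pairs of hex characters used as directory levels. A minimal Ruby sketch of that layout follows; hashed_disk_path is a hypothetical helper written for illustration, not GitLab's actual code:

```ruby
require "digest"

# Hypothetical helper, for illustration only: builds a disk path shaped like
# the "relative_path" value in the hashed params above (minus the .git suffix).
def hashed_disk_path(project_id)
  digest = Digest::SHA256.hexdigest(project_id.to_s)
  "@hashed/#{digest[0, 2]}/#{digest[2, 2]}/#{digest}"
end

disk_path = hashed_disk_path(1)

# The last path component is what ends up standing in for the project name.
repo_name = File.basename(disk_path)

puts disk_path
puts repo_name.length
```

The last path component is the value that ends up in ArchivePrefix and ArchivePath in place of project1.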
Steps to reproduce
- Create a hashed storage project
- Navigate to Project -> Repository -> Tags
- Download an archive of the project, observe the filename
What is the current bug behavior?
Since we use repository.name to generate ArchivePath, and workhorse uses file.Base(ArchivePath) to decide which filename the client gets, the hashed form of the project ID leaks to the user as an archive named 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b-<ref>-<sha>.tar.gz
- https://gitlab.com/gitlab-org/gitlab-ce/blob/master/lib/gitlab/git/repository.rb#L405
- https://gitlab.com/gitlab-org/gitlab-workhorse/blob/master/internal/git/archive.go#L67
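The leak can be reproduced in isolation with a small Ruby sketch. It assumes the archive path has the shape seen in the params above; archive_path_for and the shortened cache directory are hypothetical, and File.basename stands in for the base-name call on the workhorse side:

```ruby
# Hypothetical helper: mirrors the shape of the ArchivePath values shown in
# the params above. It is not the code in lib/gitlab/git/repository.rb.
def archive_path_for(cache_root, repo_name, ref, sha, format = "tar.gz")
  prefix = "#{repo_name}-#{ref}-#{sha}"
  File.join(cache_root, "#{repo_name}.git", "#{prefix}.#{format}")
end

cache_root  = "/shared/cache/archive" # shortened, illustrative path
sha         = "b83d6e391c22777fca1ed3012fce84f633d7fed0"
hashed_name = "6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b"

legacy_path = archive_path_for(cache_root, "project1", "master", sha)
hashed_path = archive_path_for(cache_root, hashed_name, "master", sha)

# The client-facing filename is the last component of the archive path.
legacy_filename = File.basename(legacy_path)
hashed_filename = File.basename(hashed_path)

puts legacy_filename
puts hashed_filename
```

With a legacy repository the filename starts with project1; with hashed storage it starts with the 64-character digest.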
What is the expected correct behavior?
We should see a filename more like project1-<ref>-<sha>.tar.gz
Possible fixes
We need to ensure we're using "project name" rather than "repository name" everywhere the value might leak to users or be otherwise exposed.
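One possible shape for such a fix, sketched in Ruby under stated assumptions: the cache directory stays keyed by the repository's on-disk name, while the prefix and filename are built from the project name. ArchiveNaming and its arguments are hypothetical and do not exist in GitLab; this is not a proposed patch.

```ruby
# Hypothetical sketch: ArchiveNaming is not part of GitLab.
module ArchiveNaming
  # User-visible prefix, built from the project name only.
  def self.prefix(project_name, ref, sha)
    "#{project_name}-#{ref}-#{sha}"
  end

  # Cache location: the directory uses the on-disk (possibly hashed) name,
  # the final component uses the project-name-based prefix.
  def self.path(cache_root, disk_name, project_name, ref, sha, format = "tar.gz")
    File.join(cache_root, "#{disk_name}.git", "#{prefix(project_name, ref, sha)}.#{format}")
  end
end

cache_root = "/shared/cache/archive" # shortened, illustrative path
disk_name  = "6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b"
sha        = "b83d6e391c22777fca1ed3012fce84f633d7fed0"

archive_path = ArchiveNaming.path(cache_root, disk_name, "project1", "master", sha)

puts File.dirname(archive_path)
puts File.basename(archive_path)
```

Because the filename is taken from the last path component, only that component needs to change for the download name to read project1-<ref>-<sha>.tar.gz.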
/cc @jarv @jramsay @stanhu we should delay migrating projects to hashed storage on GitLab.com until we've addressed this specific issue and audited the codebase to ensure it doesn't crop up in any other places.