Construct git archive cache filename using a cryptographic hash to prevent collisions
When a user requests an archive of a repository, the request is, in most cases, handed off to Workhorse. Workhorse uses a simple filesystem cache, with the path provided by the Rails app. The cache filename is determined by Gitlab::Git::Repository#archive_metadata (called by Gitlab::Workhorse#send_git_archive). #archive_metadata embeds the various parameters into the cache filename to ensure that any parameter change that alters the produced archive also alters the cache filename.
However, this strategy makes it unnecessarily complicated to add parameters to #send_git_archive. Specifically, solving #223577 (closed) requires support for the prefix, exclude, and elide_path parameters of Gitaly's RepositoryService.GetArchiveRequest, and encoding those parameters into the cache filename is ugly.
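To illustrate why packing more parameters into a readable filename gets awkward, here is a minimal sketch of a hypothetical naive scheme that joins parameters with a separator. The `naive_cache_name` helper and its inputs are assumptions made up for this illustration and are not GitLab's actual code; the point is only that two different parameter sets can map to the same name unless every value is carefully escaped.

```ruby
# Hypothetical naive scheme (NOT GitLab's actual implementation):
# build the cache filename by joining parameter values with "-".
def naive_cache_name(prefix:, path:, format: "zip")
  [prefix, path].compact.join("-") + ".#{format}"
end

# Two different requests...
a = naive_cache_name(prefix: "gitlab-test-master", path: "docs")
b = naive_cache_name(prefix: "gitlab-test", path: "master-docs")

# ...produce the same filename, because the separator also occurs in the values.
puts a
puts b
puts a == b
```

Avoiding this by hand means choosing an escaping rule for every new parameter, which is the kind of complexity this proposal tries to sidestep.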
I propose to instead determine the cache filename by hashing a JSON map of the relevant parameters:
def archive_metadata(ref, storage_path, project_path, format = "tar.gz", append_sha:, path: nil)
  ref ||= root_ref
  commit = Gitlab::Git::Commit.find(self, ref)
  return {} if commit.nil?

  prefix = archive_prefix(ref, commit.id, project_path, append_sha: append_sha, path: path)
  filename = Digest::SHA2.hexdigest({
    # all Gitaly parameters, with few exceptions, must be included here to
    # ensure that any parameter change causes a change to Workhorse's
    # cache filename
    commit_id: commit.id,
    prefix: prefix,
    path: path
  }.to_json)

  {
    'ArchivePrefix' => prefix,
    'ArchivePath' => archive_file_path(storage_path, commit.id, filename, format),
    'CommitId' => commit.id,
    'GitalyRepository' => gitaly_repository.to_h
  }
end
# old path: .../project-1/ddd0f15ae83993f5cb66a927a28673882e99100b/@v2/gitlab-test-master.zip
# new path: .../project-1/ddd0f15ae83993f5cb66a927a28673882e99100b/@v2/1ecfb60d2359d78f28bee3095fcd130e8c41b7a03e7d0a7d7e82b850fbf897fb.zip
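The hashing step can be tried in isolation. The following is a minimal standalone sketch, assuming a hypothetical `cache_filename` helper (not part of the GitLab codebase) that takes the same three values as the proposal above; the prefix and path values are made up for illustration. It shows that identical parameters give the same filename and that changing any one parameter gives a different one.

```ruby
require "digest/sha2"
require "json"

# Hypothetical helper mirroring the hashing step of the proposal:
# serialize the parameters to JSON, then take the SHA-256 hex digest.
def cache_filename(commit_id:, prefix:, path: nil)
  Digest::SHA2.hexdigest({
    commit_id: commit_id,
    prefix: prefix,
    path: path
  }.to_json)
end

commit_id = "ddd0f15ae83993f5cb66a927a28673882e99100b"

base      = cache_filename(commit_id: commit_id, prefix: "gitlab-test-master")
same      = cache_filename(commit_id: commit_id, prefix: "gitlab-test-master")
with_path = cache_filename(commit_id: commit_id, prefix: "gitlab-test-master", path: "docs")

puts base.length        # number of hex characters in the filename
puts base == same       # same parameters, same filename
puts base == with_path  # different path, different filename
```

Because Ruby hashes keep insertion order, the JSON string (and so the digest) stays stable as long as the keys are always listed in the same order. Adding a new parameter then only requires adding a key to the map.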