Reverse lookups of hashed repository storage paths
"Hashed storage" is an ongoing migration in GitLab where we move from storing Git repositories in directories with a mutable name (namespace/project.git) to an immutable name (ab/cd/randomstring.git).
I had a look at this the other day, and I think something is missing from how we do hashed storage now: there appears to be no easy way to do a reverse lookup, i.e. to find the GitLab project that a given hashed storage path belongs to. We compute the hashed storage path from the project's SQL ID using a one-way hash function, so the path cannot be turned back into an ID.
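To make the problem concrete, here is a minimal sketch assuming the layout hashed storage uses: the directory name is the SHA-256 hex digest of the decimal project ID, nested under @hashed/ by its first two byte pairs. The function names are hypothetical and this is not GitLab's code. Because the hash is one-way, the only reverse lookup available without a stored mapping is to rehash every candidate ID until one matches.

```python
import hashlib
from typing import Optional

# Assumption: mirrors the hashed storage layout, i.e.
# @hashed/<first 2 hex chars>/<next 2 hex chars>/<full sha256 of the ID>
HASHED_PREFIX = "@hashed"


def hashed_disk_path(project_id: int) -> str:
    """Return the relative repository path (without .git) for a project ID."""
    digest = hashlib.sha256(str(project_id).encode("ascii")).hexdigest()
    return f"{HASHED_PREFIX}/{digest[0:2]}/{digest[2:4]}/{digest}"


def reverse_lookup_by_scan(disk_path: str, max_project_id: int) -> Optional[int]:
    """Find the project ID for a hashed path the hard way.

    SHA-256 cannot be inverted, so we recompute the path for every
    candidate ID from 1 to max_project_id and compare. This costs
    O(number of projects) hash computations per lookup.
    """
    for project_id in range(1, max_project_id + 1):
        if hashed_disk_path(project_id) == disk_path:
            return project_id
    return None


if __name__ == "__main__":
    path = hashed_disk_path(42)
    print(path)
    print(reverse_lookup_by_scan(path, 1000))  # prints 42
```

This only works if you know an upper bound on project IDs, and it gets slower as the instance grows.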
I think we should store the hashed storage path for each project in SQL so that reverse lookups become simple. If we don't do this, it is really going to bite us at some point.
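A minimal sketch of what storing the path buys us, assuming a simplified, hypothetical schema in which project_repositories holds exactly one disk_path per project (the real GitLab tables have more columns). With a unique index on disk_path, the reverse lookup becomes a single indexed query instead of a hash scan.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
# UNIQUE makes SQLite create an index on each column, so lookups
# by disk_path (and by project_id) do not scan the table.
SCHEMA = """
CREATE TABLE projects (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE project_repositories (
    project_id INTEGER NOT NULL UNIQUE REFERENCES projects(id),
    disk_path  TEXT    NOT NULL UNIQUE
);
"""


def find_project_by_disk_path(conn: sqlite3.Connection, disk_path: str):
    """Return (id, name) of the project stored at disk_path, or None."""
    return conn.execute(
        "SELECT p.id, p.name FROM project_repositories r "
        "JOIN projects p ON p.id = r.project_id "
        "WHERE r.disk_path = ?",
        (disk_path,),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO projects (id, name) VALUES (42, 'example')")
    # Placeholder path in the ab/cd/randomstring shape, not a real hash.
    conn.execute(
        "INSERT INTO project_repositories (project_id, disk_path) VALUES (?, ?)",
        (42, "@hashed/ab/cd/abcd1234"),
    )
    print(find_project_by_disk_path(conn, "@hashed/ab/cd/abcd1234"))  # prints (42, 'example')
```

The uniqueness constraint on disk_path also guards against two projects ever claiming the same directory.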
Implementation steps
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23143 When a project is migrated to hashed storage, or a project is created while hashed storage is enabled, store the path in the database
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23482 Write migration to backfill project_repositories for projects on hashed storage
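The backfill step could look roughly like the sketch below. It assumes, hypothetically, a projects.storage_version column where a value of 1 or more marks a project as being on hashed storage, and the same simplified project_repositories(project_id, disk_path) table as before; this is not the code from the merge request. It walks projects in ID order in small batches and only inserts rows that are missing, so it is safe to re-run.

```python
import hashlib
import sqlite3


def hashed_disk_path(project_id: int) -> str:
    # Assumed layout: @hashed/<2 hex>/<2 hex>/<sha256 of the decimal ID>
    digest = hashlib.sha256(str(project_id).encode("ascii")).hexdigest()
    return f"@hashed/{digest[0:2]}/{digest[2:4]}/{digest}"


def backfill_hashed_repositories(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Insert missing project_repositories rows for hashed-storage projects.

    Assumes projects(id, storage_version) where storage_version >= 1 means
    hashed storage (NULL or 0 means legacy). Each batch is committed in its
    own transaction to keep transactions short. Returns rows inserted.
    """
    inserted = 0
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT p.id FROM projects p "
            "LEFT JOIN project_repositories r ON r.project_id = p.id "
            "WHERE p.id > ? AND p.storage_version >= 1 AND r.project_id IS NULL "
            "ORDER BY p.id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return inserted
        with conn:  # commits this batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO project_repositories (project_id, disk_path) VALUES (?, ?)",
                [(pid, hashed_disk_path(pid)) for (pid,) in rows],
            )
        inserted += len(rows)
        last_id = rows[-1][0]
```

Keyset pagination on id (rather than OFFSET) keeps each batch query cheap on a large projects table.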
These items were moved to the related issue:
- [ ] Ensure the disk_path is also written and updated for projects on legacy storage
- [ ] Write migration to backfill project_repositories for projects on legacy storage
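On legacy storage the path is the mutable namespace/project name, so a stored disk_path goes stale whenever a project is renamed. A minimal sketch of keeping it in sync, assuming hypothetical projects(id, path) and project_repositories(project_id, disk_path) tables; moving the repository on disk is a separate step not shown here.

```python
import sqlite3


def legacy_disk_path(namespace_full_path: str, project_path: str) -> str:
    """Legacy layout: the human-readable path, e.g. 'group/project'."""
    return f"{namespace_full_path}/{project_path}"


def rename_legacy_project(
    conn: sqlite3.Connection, project_id: int, namespace_full_path: str, new_path: str
) -> str:
    """Rename a legacy-storage project and update its stored disk_path.

    Both UPDATEs run in one transaction, so a reverse lookup never sees
    a project path and a disk_path that disagree.
    """
    new_disk_path = legacy_disk_path(namespace_full_path, new_path)
    with conn:
        conn.execute("UPDATE projects SET path = ? WHERE id = ?", (new_path, project_id))
        conn.execute(
            "UPDATE project_repositories SET disk_path = ? WHERE project_id = ?",
            (new_disk_path, project_id),
        )
    return new_disk_path
```

Hashed-storage projects never need this update, which is the main reason the stored path is cheaper to maintain for them.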