
Rename "repository storages" to "shards" and make them first-class database citizens

Description

GitLab can store repository data in several locations on the filesystem. We currently call these locations "repository storages", which is an unwieldy name.

The definitions are read from the configuration file, so GitLab needs a restart whenever one changes, even just to add a new storage to relieve capacity problems. In Geo, the configuration also has to be synced manually between primary and secondary: https://gitlab.com/gitlab-org/gitlab-ee/issues/3243

Proposal

First, rename "repository storages" to "shards" everywhere. We already use this term in discussion and it's reasonably well-understood within the domain.

Next, remove the settings from gitlab.yml and introduce a shards database table to replace them. An initial data migration can fill the table with the contents of gitlab.yml, and those keys are ignored thereafter.
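The initial data migration could look roughly like the sketch below. It is written in Python against an in-memory SQLite database purely for illustration; the real migration would be a Rails migration. The `CONFIG_STORAGES` dict is a hypothetical stand-in for the parsed storages section of gitlab.yml, and the `shards` column names (`name`, `path`) are assumptions taken from the identifiers used in this proposal.

```python
import sqlite3

# Hypothetical stand-in for the storages section parsed from gitlab.yml.
CONFIG_STORAGES = {
    "default": {"path": "/var/opt/gitlab/git-data/repositories"},
    "nfs1": {"path": "/mnt/nfs1/repositories"},
}


def migrate_storages_to_shards(conn, storages):
    """Create the shards table and copy each configured storage into it."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS shards (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            path TEXT NOT NULL
        )
        """
    )
    for name, settings in storages.items():
        # INSERT OR IGNORE keeps the migration idempotent if it is re-run.
        conn.execute(
            "INSERT OR IGNORE INTO shards (name, path) VALUES (?, ?)",
            (name, settings["path"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate_storages_to_shards(conn, CONFIG_STORAGES)
rows = conn.execute("SELECT name, path FROM shards ORDER BY name").fetchall()
```

The unique constraint on `name` is a design choice in this sketch: it lets the string columns be mapped to ids unambiguously in the following step.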

Finally, modify application_settings.repository_storages and projects.repository_storage to be foreign keys on shards.id rather than strings that correspond to shards.name. This step would need to create shards marked as invalid for any name those columns reference that is missing from the config file.
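A minimal sketch of the foreign-key backfill for projects, again in Python with SQLite for illustration only. The `shard_id` column and the `valid` flag are hypothetical names introduced here; the proposal does not specify how an invalid shard is represented. application_settings.repository_storages is not covered by this sketch.

```python
import sqlite3


def backfill_shard_ids(conn):
    """Set projects.shard_id, creating invalid placeholder shards as needed."""
    # Storage names referenced by projects that have no matching shards row.
    missing = conn.execute(
        """
        SELECT DISTINCT p.repository_storage
        FROM projects p
        LEFT JOIN shards s ON s.name = p.repository_storage
        WHERE s.id IS NULL
        """
    ).fetchall()
    for (name,) in missing:
        # Placeholder shard: no path, flagged invalid (hypothetical column).
        conn.execute(
            "INSERT INTO shards (name, path, valid) VALUES (?, NULL, 0)",
            (name,),
        )
    # Every referenced name now has a shards row, so the lookup always matches.
    conn.execute(
        """
        UPDATE projects
        SET shard_id = (
            SELECT id FROM shards WHERE shards.name = projects.repository_storage
        )
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shards (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL UNIQUE,
        path  TEXT,
        valid INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE projects (
        id                 INTEGER PRIMARY KEY,
        repository_storage TEXT NOT NULL,
        shard_id           INTEGER REFERENCES shards(id)
    );
    INSERT INTO shards (name, path)
        VALUES ('default', '/var/opt/gitlab/git-data/repositories');
    INSERT INTO projects (repository_storage)
        VALUES ('default'), ('retired'), ('retired');
    """
)
backfill_shard_ids(conn)
result = conn.execute(
    """
    SELECT p.repository_storage, s.name, s.valid
    FROM projects p JOIN shards s ON s.id = p.shard_id
    ORDER BY p.id
    """
).fetchall()
```

Once every row has a `shard_id`, the old string column could be dropped in a later migration.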

On top of this, we can add CRUD and monitoring functionality to the admin panel (/admin/shards) and the API (/api/v4/shards).

This increases the power of the admin panel in a way that may be uncomfortable: if an admin account is compromised, the attacker could add arbitrary directories as shards in an attempt to access the filesystem. To mitigate this, we could introduce a new gitlab.yml setting: shards_prefix: . If it is set and shards.path doesn't begin with the prefix, the administrator is not allowed to add the shard.
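One caveat with a plain string prefix comparison is that `/data/shards-evil` begins with `/data/shards`, and `/data/shards/../../etc` does too. A hedged sketch of a stricter check, in Python for illustration (the function name `shard_path_allowed` is hypothetical), compares normalized path components instead:

```python
import posixpath


def shard_path_allowed(shard_path, shards_prefix):
    """Return True if shard_path lies inside shards_prefix.

    An unset or empty prefix means the restriction is disabled.
    """
    if not shards_prefix:
        return True
    # normpath collapses "." and ".." segments lexically.
    # It does not resolve symlinks.
    prefix = posixpath.normpath(shards_prefix)
    candidate = posixpath.normpath(shard_path)
    # Reject relative paths outright; commonpath cannot mix the two kinds.
    if not posixpath.isabs(prefix) or not posixpath.isabs(candidate):
        return False
    # Compare whole path components rather than raw string prefixes.
    return posixpath.commonpath([prefix, candidate]) == prefix
```

For example, `shard_path_allowed("/data/shards/nfs1", "/data/shards")` is accepted, while `/data/shards-evil` and `/data/shards/../../etc` are rejected. Because the check is purely lexical, a symlink inside the prefix could still point elsewhere; resolving the real path on the storage host would be needed to close that gap.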

/cc @DouweM @mydigitalself

Links / references

Documentation blurb

Overview

What is it? Why should someone use this feature? What is the underlying (business) problem? How do you use this feature?

Use cases

Who is this for? Provide one or more use cases.

Feature checklist

Make sure these are completed before closing the issue, with a link to the relevant commit.