Split brain might lead to data loss

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Summary

When the database is in a situation where two nodes think they are the primary server, it could happen that a user creates a repository on one primary database and gets an ID for the project X. A second user can create a project while connected to the second primary database and also get the ID X. Given GitLab now uses hashed storage, which is based on the project_id column, the repository path is not always unique. But when a backup is inconsistent, for example the database is more recent than the Git repository snapshot, these situations can happen too.

The project repository is an obvious case where this would happen; but there are others. For example the PoolRepository model has the same naming scheme and also depends on a unique ID. Other assets I can think of that depend on uniqueness of keys are:

  1. Project Logos
  2. Notes attachments
  3. CI Artifacts

Although most probably there are more.

Possible fixes

Add a random nonce to the project ID before hashing the path, and storing the path in the database. Or don't use a ID as base of the hashed storage path, but an UUID.

/cc @jacobvosmaer-gitlab @nick.thomas

Edited by 🤖 GitLab Bot 🤖