Encountering `Projects::UpdateRepositoryStorageService::SameFilesystemError` when omitting the `destination_storage_name` parameter

Summary

A repository move fails with Projects::UpdateRepositoryStorageService::SameFilesystemError when the destination_storage_name parameter is omitted from a POST request to the projects/<project_id>/repository_storage_moves API.

Source: Sentry.gitlab.net

Steps to reproduce

In the staging.gitlab.com environment, send a POST request to the projects/<project_id>/repository_storage_moves API to move a repository, without specifying the destination_storage_name parameter. A new error will show up in https://sentry.gitlab.net/gitlab/staginggitlabcom/issues. Because the selection routine is random, it may take two or three attempts before the error occurs.
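
For reference, the request can be sketched in Ruby roughly as follows. The /api/v4 prefix and the PRIVATE-TOKEN header are the usual GitLab REST conventions; the project ID, the token value, and the `build_move_request` helper name are placeholders made up for this illustration. The helper only builds the request, and the payload is left empty so that no destination_storage_name is sent.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical helper: builds (but does not send) the POST request that
# schedules a repository storage move for a project.
def build_move_request(host:, project_id:, token:, destination: nil)
  uri = URI("https://#{host}/api/v4/projects/#{project_id}/repository_storage_moves")
  request = Net::HTTP::Post.new(uri)
  request['PRIVATE-TOKEN'] = token
  request['Content-Type'] = 'application/json'

  # Leaving destination_storage_name out lets the server pick the shard,
  # which is the case that triggers the error described in this issue.
  payload = {}
  payload[:destination_storage_name] = destination if destination
  request.body = JSON.generate(payload)

  [uri, request]
end

# Placeholder project ID and token.
uri, request = build_move_request(host: 'staging.gitlab.com', project_id: 1, token: 'REDACTED')

# To actually send it:
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```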

Example Project

Any project will suffice.

What is the current bug behavior?

When invoking the API to move a repository to another shard, if a destination is not specified, the UpdateRepositoryStorageService or ProjectUpdateRepositoryStorageWorker will select a destination shard automatically (based on assigned weights).

In the staging environment, the selection algorithm has repeatedly picked the source shard itself as the destination. This can happen whenever the repository is being moved from a shard whose weight is greater than 0, because that shard is then also eligible as a destination.
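
The behavior can be illustrated outside of GitLab with a small simulation. This is only a sketch: the shard names and weights are invented, and `pick_storage` is a hypothetical stand-in that reuses the weighted `max_by` expression quoted under "Possible fixes", not the actual GitLab method.

```ruby
# Weighted random pick over a Hash of storage name => weight (> 0).
# Stand-in for the real selection routine; takes an explicit RNG so the
# run is repeatable.
def pick_storage(weights, rng: Random.new)
  weights.max_by { |_, weight| rng.rand ** (1.0 / weight) }.first
end

# Hypothetical shards: the source shard has a non-zero weight, so it is
# also a candidate destination.
weights = { 'shard-a' => 0.5, 'shard-b' => 0.5 }
source = 'shard-a'

rng = Random.new(42)
picks = Array.new(1_000) { pick_storage(weights, rng: rng) }
same_as_source = picks.count(source)

puts "source shard picked as destination #{same_as_source} times out of #{picks.size}"
```

Every run in which the pick equals the source corresponds to a move that would raise SameFilesystemError.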

What is the expected correct behavior?

The destination selection algorithm should simply exclude the pre-migration project.repository_storage from the shard pool from which a destination selection is made.

Relevant logs and/or screenshots

https://sentry.gitlab.net/gitlab/staginggitlabcom/issues/2407090/?query=is%3Aunresolved

Output of checks

This bug happens on GitLab.com.

Possible fixes

There might be a gotcha here, since UpdateRepositoryStorageService#same_filesystem? compares the Gitlab::GitalyClient.filesystem_id of the given storage shard parameters rather than their names. Care should be taken to ensure that the selection algorithm also uses the filesystem_id of each destination candidate when comparing it against the source shard.

A simple filter ought to be sufficient. Something like the following: `candidates_sans_source = candidate_destination_shards.reject { |destination_candidate_shard| Gitlab::GitalyClient.filesystem_id(destination_candidate_shard) == Gitlab::GitalyClient.filesystem_id(repository_storage_move.project.repository_storage) }`
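
The filter idea can be tried outside of Rails with a stubbed lookup. In this sketch, the `FILESYSTEM_IDS` table and the local `filesystem_id` method are made-up stand-ins for Gitlab::GitalyClient.filesystem_id, and the shard names are invented. It uses Ruby's `reject`, and it includes a second storage name that points at the same filesystem to show why comparing names alone would not be enough.

```ruby
# Made-up mapping of storage name => filesystem ID, standing in for what
# Gitlab::GitalyClient.filesystem_id would return.
FILESYSTEM_IDS = {
  'shard-a'       => 'fs-1',
  'shard-a-alias' => 'fs-1', # different name, same underlying filesystem
  'shard-b'       => 'fs-2'
}.freeze

# Stub for the Gitaly lookup.
def filesystem_id(storage)
  FILESYSTEM_IDS.fetch(storage)
end

# Drops every candidate that lives on the same filesystem as the source.
def candidates_sans_source(candidates, source)
  source_fs = filesystem_id(source)
  candidates.reject { |candidate| filesystem_id(candidate) == source_fs }
end

result = candidates_sans_source(%w[shard-a shard-a-alias shard-b], 'shard-a')
puts result.inspect
```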

The implementation found in ApplicationSettingImplementation#pick_repository_storage, `normalized_repository_storage_weights.max_by { |_, weight| rand**(1.0 / weight) }.first`, appears to present a challenge to the approach of filtering by filesystem_id, since the pick_repository_storage routine ignores the application setting key and looks only at the weight. The filter may instead have to be applied to the normalized_repository_storage_weights mapping itself. That could be complicated, because the filter described in the proposal above relies on methods (like Gitlab::GitalyClient#filesystem_id) and on a repository_storage_move.project.repository_storage parameter, neither of which is available to ApplicationSettingImplementation#pick_repository_storage.
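
One possible way around this, offered only as a sketch, is to have the caller (which does know the source shard and can query Gitaly) work out which storage names to exclude and hand that list to the picker. The `pick_storage_excluding` method and its signature are hypothetical and are not the existing GitLab API; the weights are invented.

```ruby
# Hypothetical variant of the weighted pick. The caller passes the storage
# names to exclude, so the picker itself needs no Gitaly access.
def pick_storage_excluding(weights, excluded, rng: Random.new)
  # Filter the name => weight mapping before the weighted pick.
  eligible = weights.reject { |name, weight| excluded.include?(name) || weight <= 0 }
  return nil if eligible.empty? # nothing left to move to

  eligible.max_by { |_, weight| rng.rand ** (1.0 / weight) }.first
end

weights = { 'shard-a' => 0.6, 'shard-b' => 0.3, 'shard-c' => 0.1 }

# In practice this list would hold every storage name whose filesystem_id
# matches that of the source shard.
excluded = ['shard-a']

picks = Array.new(200) { pick_storage_excluding(weights, excluded) }
puts picks.tally.inspect
```

Returning nil when no eligible shard remains leaves it to the caller to decide whether that is an error or a no-op.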