All PoolRepositories have source_project_id: NULL
2019-04-18:
I think we can break the solution into two parts:
- Fix application code to ensure future PoolRepository records are correct: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27464
- Add a best-effort data migration that tries to fix PoolRepositories that were already created, by using the fork network root as the source. This may fail, in which case the source project of the pool stays null. #1653 (closed)
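To make the "best-effort" part concrete, here is a rough plain-Ruby sketch of the backfill logic. It is not the actual migration: `Pool`, `Member`, `fork_network_root_id` and `backfill_source_projects` are hypothetical stand-ins, and the rule "only fix a pool when its members agree on exactly one fork network root" is an assumption about what "may fail" could mean.

```ruby
# Hypothetical stand-ins for the real tables; not the GitLab schema or migration.
Pool = Struct.new(:id, :source_project_id)
Member = Struct.new(:id, :pool_id, :fork_network_root_id)

# Fills in source_project_id for pools where it is nil, using the fork
# network root of the pool's member projects. Returns the ids of pools
# that could not be fixed; those keep source_project_id == nil.
def backfill_source_projects(pools, members)
  unfixed = []
  pools.each do |pool|
    # Leave pools that already have a source project untouched.
    next unless pool.source_project_id.nil?

    root_ids = members.select { |member| member.pool_id == pool.id }
                      .map(&:fork_network_root_id)
                      .compact
                      .uniq

    if root_ids.size == 1
      pool.source_project_id = root_ids.first
    else
      # Assumption: no root, or conflicting roots, means we give up.
      unfixed << pool.id
    end
  end
  unfixed
end

pools = [Pool.new(1, nil), Pool.new(2, nil), Pool.new(3, 42)]
members = [
  Member.new(100, 1, 100), # the fork parent, root of its own network
  Member.new(101, 1, 100), # a fork of project 100
  Member.new(200, 2, nil)  # member with no known fork network root
]

unfixed = backfill_source_projects(pools, members)
puts "fixed pool 1 -> #{pools[0].source_project_id}, unfixed pools: #{unfixed.inspect}"
```

In this sketch pool 1 gets a source project, pool 2 stays null because no root can be found, and pool 3 is skipped because it already has one.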
2019-04-16:
We used the wrong ActiveRecord association in app/models/pool_repository.rb: has_one :source_project should have been belongs_to. As a consequence, source_project_id is NULL in SQL on all rows in pool_repositories as of GitLab 11.10.
The method PoolRepository#source_project was not using the source_project_id column at all. It was just returning the "first" member project of the pool, according to the default SELECT order.
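The difference between the two associations can be illustrated without Rails. The sketch below is a simplified plain-Ruby model, not ActiveRecord itself: `Project`, `PoolRepository` and the two lookup helpers are hypothetical stand-ins. It assumes that member projects point at their pool through a `pool_repository_id` column, which is where a has_one-style lookup would search.

```ruby
# Plain-Ruby model of the two lookups; no Rails required.
# These structs are hypothetical stand-ins, not the GitLab models.
Project = Struct.new(:id, :pool_repository_id)
PoolRepository = Struct.new(:id, :source_project_id)

# has_one-style lookup: the foreign key lives on the *other* table
# (projects.pool_repository_id), so source_project_id is never read.
# The result is whichever matching row happens to come first.
def has_one_source_project(pool, projects)
  projects.find { |project| project.pool_repository_id == pool.id }
end

# belongs_to-style lookup: the foreign key lives on the pool row itself
# (pool_repositories.source_project_id).
def belongs_to_source_project(pool, projects)
  projects.find { |project| project.id == pool.source_project_id }
end

# Project 3 is the real fork parent, but the fork (project 7) is returned first.
projects = [Project.new(7, 10), Project.new(3, 10)]
pool = PoolRepository.new(10, 3)

wrong = has_one_source_project(pool, projects)
right = belongs_to_source_project(pool, projects)
puts "has_one-style: project #{wrong.id}, belongs_to-style: project #{right.id}"

# With source_project_id left NULL, the has_one-style lookup still returns
# a project, which is how the empty column could go unnoticed.
null_pool = PoolRepository.new(10, nil)
still_found = has_one_source_project(null_pool, projects)
puts "NULL source_project_id still yields project #{still_found.id}"
```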
Anybody who has used the Git deduplication feature (which is off by default) so far will have invalid SQL data in their database.
2019-04-15:
There is a race condition where a PoolRepository SQL record may end up pointing to the wrong "source project". It looks like this only happens when the first fork of a project is made while object dedup is enabled, and when the fork parent is large enough to slow down certain (unknown) Sidekiq jobs to create the "right" timing.
It looks like this is because of the timing of when the ForkNetwork associated with the fork parent gets created. If the ForkNetwork creation is delayed, a fork may end up becoming the source project of the PoolRepository.
Reproduced via:
diff --git a/app/services/projects/fork_service.rb b/app/services/projects/fork_service.rb
index fc234bafc57..f1257455eee 100644
--- a/app/services/projects/fork_service.rb
+++ b/app/services/projects/fork_service.rb
@@ -71,7 +71,7 @@ module Projects
     end
 
     def fork_network
-      @fork_network ||= @project.fork_network || @project.build_root_of_fork_network
+      @fork_network ||= @project.fork_network || (sleep 10; @project.build_root_of_fork_network)
     end
 
     def build_fork_network_member(fork_to_project)
2019-04-12:
Observed during the dedupe demo on 2019-04-12: the object pool was created and it contained a temporary packfile of the expected size, but because it was only a temporary file it cannot be used for deduplication.
Because we never fetch new projects into the pool, there is nothing that would automatically recover it.
It is probably a race condition for large repositories.