All PoolRepositories have source_project_id: NULL

2019-04-18:

I think we can break the solution in two parts:

2019-04-16:

We used the wrong ActiveRecord association in app/models/pool_repository.rb: has_one :source_project should have been belongs_to. As a consequence, source_project_id is NULL in SQL on all rows in pool_repositories as of GitLab 11.10.

The method PoolRepository#source_project was not using the source_project_id column at all. It was just returning the "first" member project of the pool, according to the default SELECT order.
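To make the difference between the two associations concrete, here is a minimal plain-Ruby sketch (no ActiveRecord, no GitLab code). The structs `PoolRow` and `ProjectRow` and both lookup helpers are hypothetical stand-ins. The sketch assumes that member projects carry a `pool_repository_id` pointing at their pool, which is where a `has_one` lookup would search.

```ruby
# Toy stand-ins for the two tables; plain Structs, not the real GitLab models.
PoolRow    = Struct.new(:id, :source_project_id)
ProjectRow = Struct.new(:id, :pool_repository_id)

# has_one-style lookup: the foreign key is expected on the *other* table, so
# the pool's own source_project_id column is never consulted. The result is
# simply the first member row that points back at the pool.
def source_via_has_one(pool, projects)
  projects.find { |project| project.pool_repository_id == pool.id }
end

# belongs_to-style lookup: the foreign key lives on the pool row itself.
# Returns nil when source_project_id is NULL.
def source_via_belongs_to(pool, projects)
  projects.find { |project| project.id == pool.source_project_id }
end

# Two members of pool 1; the fork (id 7) happens to come first in row order.
projects    = [ProjectRow.new(7, 1), ProjectRow.new(3, 1)]
broken_pool = PoolRow.new(1, nil) # source_project_id left NULL
fixed_pool  = PoolRow.new(1, 3)   # source_project_id filled in

wrong   = source_via_has_one(broken_pool, projects)    # whichever member sorts first
missing = source_via_belongs_to(broken_pool, projects) # nothing to follow
right   = source_via_belongs_to(fixed_pool, projects)  # the recorded source project
```

With the `has_one`-style lookup the answer depends only on row order, which matches the "first member project" behaviour described above.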

Anybody who has used the Git object deduplication feature (which is off by default) so far will have invalid data in their pool_repositories table.


2019-04-15:

There is a race condition where a PoolRepository SQL record may end up pointing to the wrong "source project". It looks like this only happens when the first fork of a project is made while object dedup is enabled, and when the fork parent is large enough to slow down certain (not yet identified) Sidekiq jobs enough to create the "right" timing.

It looks like this is caused by the timing of the creation of the ForkNetwork associated with the fork parent. If the ForkNetwork creation is delayed, a fork may end up becoming the source project of the PoolRepository.

Reproduced via:

diff --git a/app/services/projects/fork_service.rb b/app/services/projects/fork_service.rb
index fc234bafc57..f1257455eee 100644
--- a/app/services/projects/fork_service.rb
+++ b/app/services/projects/fork_service.rb
@@ -71,7 +71,7 @@ module Projects
     end
 
     def fork_network
-      @fork_network ||= @project.fork_network || @project.build_root_of_fork_network
+      @fork_network ||= @project.fork_network || (sleep 10; @project.build_root_of_fork_network)
     end
 
     def build_fork_network_member(fork_to_project)
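The exact mechanism is not established in these notes. The following is only a toy model of how such an ordering problem could arise. It is not GitLab code: `ToyProject`, `pick_pool_source`, `run_fork` and the rule "use the parent's fork network root if it exists, otherwise the project at hand" are all assumptions made for illustration.

```ruby
# Toy model only: none of these names exist in GitLab.
ToyProject = Struct.new(:name, :fork_network)

# Hypothetical rule, assumed for illustration: the pool adopts the root of the
# parent's fork network if one exists, otherwise it falls back to the fork.
def pick_pool_source(parent, fork)
  parent.fork_network ? parent.fork_network[:root] : fork
end

# Runs the two steps in one of two orders and reports which project the pool
# ended up with. network_delayed: true mimics the injected `sleep 10`.
def run_fork(network_delayed:)
  parent = ToyProject.new("parent", nil)
  fork   = ToyProject.new("fork", nil)
  build_network = -> { parent.fork_network ||= { root: parent } }

  if network_delayed
    source = pick_pool_source(parent, fork) # the check runs first...
    build_network.call                      # ...and the network arrives too late
  else
    build_network.call
    source = pick_pool_source(parent, fork)
  end

  source.name
end
```

In this model `run_fork(network_delayed: false)` picks the parent, while `run_fork(network_delayed: true)` picks the fork, which is the symptom described above.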

2019-04-12:

Observed during the dedupe demo on 2019-04-12: the object pool was created and contained a temporary packfile of the expected size, but because it was only a temporary file it cannot be used for deduplication.

Because we never fetch new projects into the pool, there is nothing that would automatically recover it.

It is probably a race condition for large repositories.
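A rough way to spot a pool in this state is to look at its objects/pack directory. The sketch below is a hypothetical helper (`inspect_pool_packs` is not part of GitLab or Gitaly). It assumes Git's usual file naming: finished packs are `pack-*.pack` with a matching `.idx`, and unfinished ones are `tmp_pack_*`.

```ruby
# Classify the files in <pool_path>/objects/pack.
# Assumption: finished packs are "*.pack" files with a matching "*.idx" next
# to them, and temporary packs are named "tmp_pack_*".
def inspect_pool_packs(pool_path)
  pack_dir = File.join(pool_path, "objects", "pack")
  names = Dir.exist?(pack_dir) ? Dir.children(pack_dir) : []

  usable = names.select do |name|
    name.end_with?(".pack") &&
      !name.start_with?("tmp_") &&
      names.include?(name.sub(/\.pack\z/, ".idx"))
  end
  temporary = names.select { |name| name.start_with?("tmp_pack_") }

  { usable: usable.sort, temporary: temporary.sort }
end
```

A pool whose result has an empty `:usable` list and a non-empty `:temporary` list would match what was seen in the demo.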

Edited by Jacob Vosmaer