Geo::RepositorySyncWorker should attempt to sync all projects
What does this MR do?
If some projects fail to sync, Geo::RepositySyncWorker should not take the same set of projects over and over again. But try to sync all of them.
Problem analysis
First iteration
#find_project_ids_not_synced
returns a set of projects, e.g. A
, B
, and C
.
#find_project_ids_updated_recently
returns nothing, because the registry is empty.
So A
, B
, and C
are attempted to sync, but fail, and are added to the registry.
Second and following iterations
#find_project_ids_not_synced
returns the same set of projects because they are ordered by last_repository_updated_at
, so: A
, B
, C
.
#find_project_ids_updated_recently
returns A
, B
, and C
, because those are the only projects present in the registry.
These set are interleaved and #uniq
ed, resulting in A
, B
, and C
. This happens over and over again.
Proposed solution
diff --git a/app/workers/geo/repository_sync_worker.rb b/app/workers/geo/repository_sync_worker.rb
index 9aa0bec00f..f5a3c591cc 100644
--- a/app/workers/geo/repository_sync_worker.rb
+++ b/app/workers/geo/repository_sync_worker.rb
@@ -24,7 +24,7 @@ module Geo
def find_project_ids_not_synced
current_node.projects
- .where.not(id: Geo::ProjectRegistry.synced.pluck(:project_id))
+ .where.not(id: Geo::ProjectRegistry.pluck(:project_id))
.order(last_repository_updated_at: :desc)
.limit(db_retrieve_batch_size)
.pluck(:id)
So #find_project_ids_not_synced
will only return projects that are not in the registry.
#find_project_ids_updated_recently
already returns projects that are in the registry.
These are interleaved, and slowly all projects will get attempted to sync and added to the registry.
Pitfall
Due to the interleaving, every iteration projects that failed in a previous iteration are retried (those returned by #find_project_ids_updated_recently
). So this will half the capacity to sync projects that are not attempted already.
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary - [-] Documentation created/updated
- [-] API support added
- Tests
-
Added for this feature/bug -
All builds are passing
-
What are the relevant issue numbers?
Closes gitlab-org/gitlab-ee#3259
Merge request reports
Activity
mentioned in issue #3259 (closed)
- Resolved by Toon Claes
- Resolved by Toon Claes
- Resolved by Toon Claes
- Resolved by Toon Claes
added 115 commits
-
c4441805...3701599d - 113 commits from branch
master
- ee8c49ff - Geo::RepositySyncWorker should attempt to sync all projects
- 68d450d1 - Tune the spec to test the bare minimum
-
c4441805...3701599d - 113 commits from branch
added 1 commit
- 4090c39f - fixup! Geo::RepositySyncWorker should attempt to sync all projects
marked as a Work In Progress from 4090c39f
marked the checklist item Changelog entry added, if necessary as completed
marked the checklist item Documentation created/updated as completed
marked the checklist item Documentation created/updated as incomplete
@dbalexandre Can you have another look at the spec please?
assigned to @dbalexandre
Assigning this to %9.5 as discussed at https://gitlab.com/gitlab-org/gitlab-ee/issues/3259#note_39016954
added ~2278661b label
changed milestone to %9.5
- Resolved by Toon Claes
@to1ne thanks! I left one note.
assigned to @to1ne
added 1 commit
- 75a8059d - Explicitly test if every project id is being synced
@dbalexandre Please check again if the current spec implementation is more sturdy?
assigned to @dbalexandre
- Resolved by Toon Claes
assigned to @to1ne
- Resolved by Toon Claes
added 359 commits
-
75a8059d...71f09bb7 - 356 commits from branch
master
- 575b677f - Geo::RepositySyncWorker should attempt to sync all projects
- d3bf53e3 - Tune the spec to test the bare minimum
- df43dc0a - Explicitly test if every project id is being synced
Toggle commit list-
75a8059d...71f09bb7 - 356 commits from branch
@DouweM Can you take care of maintainer review?
assigned to @DouweM
@jarka Are you OK with ~"Pick into Stable" (see https://gitlab.com/gitlab-org/gitlab-ee/issues/3259#note_39016954)?
enabled an automatic merge when the pipeline for df43dc0a succeeds
mentioned in commit 89ca2d0e
removed ~2278661b label
changed milestone to %10.0
added devopssystems groupgeo labels
added Enterprise Edition label