Geo project/wiki sync shows no errors when Gitaly is down
Summary
During a Premium customer's Geo failover scenario (ZD - internal only), we noticed that after re-adding the initial node as a secondary, projects and wikis were not replicated (they were 100% queued with no sync attempts). Manually trying to sync them surfaced a Gitaly error, and once we fixed that, all projects got replicated. Until then, no error was shown anywhere, so one could have waited indefinitely for them to sync.
Steps to reproduce
Stop Gitaly on the Geo secondary, then create a new project on the primary. Notice that the project sync never gets scheduled and the UI says "Queued".
What is the current bug behavior?
Repositories don't get any sync attempt. The UI also confusingly says "Queued" when that's not the case, which can lead to investigating queuing problems (it did for us initially).
What is the expected correct behavior?
Repositories should show as failed, or there should at least be some kind of error in the Geo logs.
Possible fixes
I believe this happens because in logcursor, we only enqueue the job if the shard is healthy.
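To illustrate the suspected behavior and one possible fix, here is a minimal, self-contained sketch. It is not GitLab's actual code: the class ShardSyncScheduler, its in-memory enqueued and log_lines lists, and the shard names are all hypothetical stand-ins. It only shows the idea of enqueuing syncs for healthy shards while writing a log line for each skipped shard, so the skip is no longer silent.

```ruby
require 'json'

# Hypothetical stand-in for the scheduling step; not GitLab's real worker.
class ShardSyncScheduler
  attr_reader :enqueued, :log_lines

  # healthy_shards: names of shards whose Gitaly is reachable (assumed input).
  def initialize(healthy_shards)
    @healthy_shards = healthy_shards
    @enqueued = []   # stands in for jobs pushed to a queue
    @log_lines = []  # stands in for lines written to the Geo log
  end

  def schedule(shard_names)
    shard_names.each do |shard_name|
      if @healthy_shards.include?(shard_name)
        @enqueued << shard_name
      else
        # Without this branch the shard is skipped silently and its
        # repositories simply stay "Queued".
        log_info("skipped scheduling repo syncs on unhealthy shard", { shard_name: shard_name })
      end
    end
  end

  private

  # Assumed helper: records one structured JSON log line.
  def log_info(message, details = {})
    @log_lines << JSON.generate({ severity: "INFO", message: message }.merge(details))
  end
end

# Usage: "storage2" plays the role of a shard whose Gitaly is down.
scheduler = ShardSyncScheduler.new(["default"])
scheduler.schedule(["default", "storage2"])
puts scheduler.log_lines
```

With a line like this in the logs, someone investigating a stuck "Queued" state would have a pointer to the unhealthy shard instead of suspecting the queue itself.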
Activity
- Maintainer
Thanks @cat!
When a shard is unhealthy, the repository shard sync worker (https://gitlab.com/gitlab-org/gitlab/-/blob/695be8fc57b1b12f08d7c3def85349a9cbf82d64/ee/app/workers/geo/repository_shard_sync_worker.rb#L13) skips that shard. We should definitely add a geo.log line like
log_info("skipped scheduling repo syncs on unhealthy shard", { shard_name: shard_name })
there.
- Michael Kozono set weight to 0
- Michael Kozono added [deprecated] Accepting merge requests label
- 🤖 GitLab Bot 🤖 mentioned in issue gitlab-org/quality/triage-reports#2332 (closed)
- 🤖 GitLab Bot 🤖 mentioned in issue gitlab-org/quality/triage-reports#2398 (closed)
- 🤖 GitLab Bot 🤖 mentioned in issue gitlab-org/quality/triage-reports#2497 (closed)
- Michael Kozono mentioned in merge request !58526 (merged)
- Michael Kozono assigned to @mkozono
- Michael Kozono changed milestone to %13.11
- Michael Kozono added workflow in review label
- 🤖 GitLab Bot 🤖 removed [deprecated] Accepting merge requests label
- Maintainer
Closing since !58526 (merged) was merged
- Michael Kozono closed