Add replication test to confirm that jobs aren't left stalled in the replication queue
gitlab-org/gitaly#2801 (closed) reports an issue where replication wasn't working because the replication queue was growing but not being drained.
It looks like jobs were left in_progress and were never cleared, which might have occurred because Praefect was shut down without completing the jobs in progress. See gitlab-org/gitaly#2801 (comment 347126779)
This might be difficult to reproduce in an E2E test because our typical repo is small. The test might need to create many large commits simultaneously so that Praefect can be killed before the replication queue is cleared while jobs are still in progress.
The omnibus-gitlab docker image that corresponds to the version deployed when gitlab-org/gitaly#2801 (comment 347039586) was reported is dev.gitlab.org:5005/gitlab/omnibus-gitlab/gitlab-ee:13.1.202005210940-f49d3365c6f.8b540082540:
↳ docker run --rm dev.gitlab.org:5005/gitlab/omnibus-gitlab/gitlab-ee:13.1.202005210940-f49d3365c6f.8b540082540 bash -c '/opt/gitlab/embedded/bin/praefect --version'
Praefect, version 13.0.0-rc2-35-g322d655c
See also: gitlab-org/gitaly#2873 (closed)
Steps
- Push several commits to a repository.
- Stop the Praefect node while replication is still in progress.
- Restart the Praefect node.
- Push to a new project and confirm that it's replicated.
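
After the restart, the test needs some way to decide whether a job is stalled rather than merely still running. The following is a minimal sketch of one possible check, not how Praefect or the existing E2E tests do it. It assumes the test can already read the replication queue rows (for example from Praefect's database; fetching them is not shown) and that each row exposes an ID, a state, and a last-updated timestamp. `QueueJob`, `find_stalled_jobs`, and the 5-minute `max_age` are hypothetical names and values chosen for illustration; only the `in_progress` state comes from the report above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple


class QueueJob(NamedTuple):
    """One replication queue row, reduced to the fields this check needs.

    The shape is an assumption made for this sketch.
    """
    job_id: int
    state: str
    updated_at: datetime


def find_stalled_jobs(
    jobs: Iterable[QueueJob],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # hypothetical threshold
) -> List[int]:
    """Return sorted IDs of jobs that are in_progress and older than max_age."""
    return sorted(
        job.job_id
        for job in jobs
        if job.state == "in_progress" and now - job.updated_at > max_age
    )


# Example snapshot taken after Praefect has been restarted.
now = datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc)
jobs = [
    QueueJob(1, "in_progress", now - timedelta(minutes=30)),  # stalled
    QueueJob(2, "in_progress", now - timedelta(seconds=10)),  # still fresh
    QueueJob(3, "completed", now - timedelta(hours=1)),       # finished
]
print(find_stalled_jobs(jobs, now))  # → [1]
```

The test could poll the queue after the final push and fail if this list is non-empty once the timeout has passed. Using an age threshold instead of failing on any in_progress row avoids false positives from jobs that are legitimately still running.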