
cmd/praefect: Fix flake when testing removal of replication events

When deleting a repository, we also try to remove all of its in-flight replication events. The test that exercises this logic is flaky, though. It does not correctly synchronize the loop that removes replication events with the step that acknowledges a ready replication job. As a result, the removal logic may still be running while we acknowledge the job, and the job then gets removed without us expecting it. This causes a deadlock: the loop exits, but the test still expects the ticker to be called, so its call to Tick() blocks forever:

goroutine 2169 [chan send, 19 minutes]:
gitlab.com/gitlab-org/gitaly/v15/internal/helper.(*ManualTicker).Tick(...)
        /builds/gitlab-org/gitaly/internal/helper/ticker.go:60
gitlab.com/gitlab-org/gitaly/v15/cmd/praefect.TestRemoveRepository_removeReplicationEvents.func1()
        /builds/gitlab-org/gitaly/cmd/praefect/subcmd_remove_repository_test.go:367 +0x7e5
created by gitlab.com/gitlab-org/gitaly/v15/cmd/praefect.TestRemoveRepository_removeReplicationEvents
        /builds/gitlab-org/gitaly/cmd/praefect/subcmd_remove_repository_test.go:346 +0x10f6

goroutine 94 [semacquire, 19 minutes]:
sync.runtime_Semacquire(0xc001a85fc8?)
        /usr/local/go/src/runtime/sema.go:62 +0x25
sync.(*WaitGroup).Wait(0xc001a85fc0)
        /usr/local/go/src/sync/waitgroup.go:139 +0xa6
gitlab.com/gitlab-org/gitaly/v15/cmd/praefect.TestRemoveRepository_removeReplicationEvents(0xc003c7bba0)
        /builds/gitlab-org/gitaly/cmd/praefect/subcmd_remove_repository_test.go:373 +0x143a
testing.tRunner(0xc003c7bba0, 0x26ae808)
        /usr/local/go/src/testing/testing.go:1446 +0x217
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:1493 +0x75e
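To illustrate the failure mode, here is a minimal sketch. It assumes a ticker whose Tick() performs an unbuffered channel send, which is what the `chan send` state of goroutine 2169 suggests. The `manualTicker` type, `tryTick`, and the loop are hypothetical simplifications, not Gitaly's `helper.ManualTicker` or the real test. `tryTick` adds a timeout so the example reports the hang instead of reproducing it.

```go
package main

import (
	"fmt"
	"time"
)

// manualTicker is a hypothetical, simplified stand-in for a manually driven
// ticker. Its channel is unbuffered, so a tick is only delivered once the
// loop under test receives it.
type manualTicker struct {
	c chan time.Time
}

// tryTick delivers a tick like a blocking Tick would, but gives up after the
// timeout instead of hanging. It reports whether the tick was received.
func (t *manualTicker) tryTick(timeout time.Duration) bool {
	select {
	case t.c <- time.Now():
		return true
	case <-time.After(timeout):
		return false
	}
}

// simulate drives a hypothetical loop that stops as soon as it finds no more
// work, which here happens right after the first tick.
func simulate() (firstDelivered, secondDelivered bool) {
	ticker := &manualTicker{c: make(chan time.Time)}
	done := make(chan struct{})

	go func() {
		defer close(done)
		remaining := 1 // pretend one replication event is in flight
		for range ticker.c {
			remaining-- // pretend the event gets removed
			if remaining == 0 {
				return // nothing left, so the loop exits
			}
		}
	}()

	firstDelivered = ticker.tryTick(time.Second)
	<-done
	// No goroutine receives from the channel anymore. A plain blocking send
	// here would hang forever, just like goroutine 2169 in the trace.
	secondDelivered = ticker.tryTick(100 * time.Millisecond)
	return firstDelivered, secondDelivered
}

func main() {
	first, second := simulate()
	fmt.Println("first tick delivered:", first)   // first tick delivered: true
	fmt.Println("second tick delivered:", second) // second tick delivered: false
}
```

The second tick has no receiver because the loop has already returned. A test that unconditionally calls Tick() at that point cannot make progress, and its WaitGroup never completes.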

Fix this by explicitly synchronizing with the loop via the ticker's Reset() function.
