Mitigate against hanging specs caused by Rails load_interlock_aware_monitor deadlock
Due to https://github.com/rails/rails/issues/45994, we see hanging specs.
NOTE: This is now fixed upstream in https://github.com/rails/rails/pull/46661/commits/ea549392986bcaa5546a61404a015b135b39a1a1 but won't be released until Rails 7.1.
Until we upgrade to Rails 7.1, we need to ....
The symptoms are:
- Specs get stuck on CI and time out at 90 minutes (or whatever the timeout threshold is)
- Reproducible locally with:
    export GITLAB_TEST_EAGER_LOAD=true
    export CACHE_CLASSES=true
    bundle exec rspec -f d <spec>
- When kill -CONT <pid of hanging rspec> is run, we see references to:
    /Users/tkuah/.rbenv/versions/2.7.5/lib/ruby/gems/2.7.0/gems/activesupport-6.1.6.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `block in mon_enter'
    ...
    /Users/tkuah/code/gdk-ee/gitlab/spec/features/protected_branches_spec.rb:10:in `block (2 levels) in <top (required)>'
  - Where the hanging line is a FactoryBot creation line.
  - Some preceding line is a request that causes an async request like gitlab_enable_admin_mode_sign_in
  - The async request + the FactoryBot creation creates a deadlock
- Mitigation is always the same. We sprinkle wait_for_requests to ensure async requests never overlap with FactoryBot creation
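To illustrate why the mitigation works, here is a minimal stdlib-only model of the ordering it enforces. It is a sketch, not GitLab's actual wait_for_requests helper: PendingRequests and wait_until_idle are hypothetical names standing in for "the browser's in-flight requests" and "block until they have drained". The point is only that the record creation step never starts while an async request is still running.

```ruby
# Hypothetical stand-in for the set of in-flight async requests.
class PendingRequests
  def initialize
    @lock = Mutex.new
    @count = 0
  end

  # Run the given block on a background thread and count it as in flight
  # until it finishes (even if it raises).
  def track
    @lock.synchronize { @count += 1 }
    Thread.new do
      begin
        yield
      ensure
        @lock.synchronize { @count -= 1 }
      end
    end
  end

  def idle?
    @lock.synchronize { @count.zero? }
  end
end

# Hypothetical, simplified analogue of wait_for_requests:
# poll until nothing is in flight, or give up after `timeout` seconds.
def wait_until_idle(pending, timeout: 5, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until pending.idle?
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise 'requests still in flight'
    end
    sleep interval
  end
end

events = []
pending = PendingRequests.new

# Stands in for a step that triggers an async request,
# e.g. gitlab_enable_admin_mode_sign_in.
pending.track { sleep 0.05; events << :async_request_done }

# The mitigation: drain async requests before touching the database.
wait_until_idle(pending)

# Stands in for the FactoryBot creation line.
events << :factory_created
```

In a real feature spec the equivalent is a single wait_for_requests call placed between the step that triggers the async request and the FactoryBot creation line.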
Proposal
- Automatically detect such situations
- Fail early if a deadlock is detected
- Fix the actual deadlock (nice to have)
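One possible shape for the "detect" and "fail early" items is a watchdog around each example. The sketch below uses only the Ruby standard library. with_hang_watchdog and HangDetected are hypothetical names, not part of RSpec, Rails, or GitLab, and the check for load_interlock_aware_monitor is simply a match on the frame quoted in the symptoms above.

```ruby
# Hypothetical error raised when the wrapped block makes no progress in time.
class HangDetected < StandardError; end

# Substring of the backtrace frame seen in the hanging specs.
INTERLOCK_FRAME = 'load_interlock_aware_monitor'

# Run the block on a worker thread and fail fast if it does not finish
# within limit_seconds, instead of waiting for the CI job timeout.
def with_hang_watchdog(limit_seconds)
  worker = Thread.new do
    # Exceptions are re-raised to the caller via join/value below.
    Thread.current.report_on_exception = false
    yield
  end

  # Thread#join(limit) returns nil if the thread is still running at the limit.
  return worker.value if worker.join(limit_seconds)

  frames = worker.backtrace || []
  worker.kill

  suspected = frames.any? { |frame| frame.include?(INTERLOCK_FRAME) }
  reason = suspected ? 'suspected load interlock deadlock' : 'no progress'
  raise HangDetected, "#{reason} after #{limit_seconds}s\n#{frames.first(10).join("\n")}"
end
```

Usage would look like with_hang_watchdog(60) { example.run } inside an around hook, assuming the example body is safe to run on a separate thread. That assumption may not hold for specs relying on thread-local state such as database connections, so this is a starting point for discussion rather than a drop-in fix.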
Links
Edited by Thong Kuah