Mitigate hanging specs caused by the Rails load_interlock_aware_monitor deadlock

Due to https://github.com/rails/rails/issues/45994, we see hanging specs.

NOTE: This is now fixed upstream in https://github.com/rails/rails/pull/46661/commits/ea549392986bcaa5546a61404a015b135b39a1a1 but won't be released until Rails 7.1.

Until we upgrade to Rails 7.1, we need to ....

The symptoms are:

  • Specs get stuck on CI and time out at 90 minutes (or whatever the timeout threshold is)

  • Reproducible locally with:

    export GITLAB_TEST_EAGER_LOAD=true
    export CACHE_CLASSES=true
    bundle exec rspec -f d <spec>
  • When kill -CONT <pid of hanging rspec> is run, we see references to:

    /Users/tkuah/.rbenv/versions/2.7.5/lib/ruby/gems/2.7.0/gems/activesupport-6.1.6.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `block in mon_enter'
    ...
    /Users/tkuah/code/gdk-ee/gitlab/spec/features/protected_branches_spec.rb:10:in `block (2 levels) in <top (required)>'
    • The hanging line is a FactoryBot creation line.
    • Some preceding line triggers an async request, for example gitlab_enable_admin_mode_sign_in.
    • The async request and the FactoryBot creation together produce a deadlock.
  • The mitigation is always the same: we add wait_for_requests so that async requests never overlap with FactoryBot creation.
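
The ordering behind that mitigation can be sketched as a small helper. This is only an illustration: `sign_in_then_create` is a hypothetical name, the `:protected_branch` factory is an assumption based on the spec file named in the backtrace, and `gitlab_enable_admin_mode_sign_in`, `wait_for_requests`, and `create` are expected to come from the GitLab spec helpers and FactoryBot rather than being defined here.

```ruby
# Hypothetical helper showing the call order only. The three methods it
# calls are assumed to be provided by the surrounding spec environment.
def sign_in_then_create(admin, project)
  # Step 1: an action that triggers an async request in the browser session.
  gitlab_enable_admin_mode_sign_in(admin)

  # Step 2: block until in-flight requests have finished, so that none of
  # them overlaps with the record creation below.
  wait_for_requests

  # Step 3: only now create records (factory name is an assumption).
  create(:protected_branch, project: project)
end
```

In an actual spec the same three steps would normally sit inline in a `before` block or in the example body, with `wait_for_requests` placed directly before the first FactoryBot call that follows an async action.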

Proposal

  1. Automatically detect such situations
  2. Fail early if a deadlock is detected
  3. Fix the actual deadlock (nice to have)
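
Items 1 and 2 could perhaps be approached with a per-example watchdog. The following is a minimal sketch in plain Ruby, assuming that "no progress within a time budget" is an acceptable proxy for a deadlock. `SuspectedDeadlockError`, `interlock_frame?`, and `run_with_watchdog` are hypothetical names, not existing GitLab or Rails helpers.

```ruby
# Hypothetical names throughout: a sketch, not GitLab or Rails code.
class SuspectedDeadlockError < StandardError; end

# File name that shows up in the backtraces of the hanging specs.
INTERLOCK_FRAME = 'load_interlock_aware_monitor'

# True if any backtrace line points into the interlock-aware monitor.
def interlock_frame?(backtrace)
  Array(backtrace).any? { |line| line.include?(INTERLOCK_FRAME) }
end

# Runs the block. If it has not finished after limit_seconds, raises
# SuspectedDeadlockError in the calling thread, including the top of the
# backtrace that thread had while it was still blocked.
def run_with_watchdog(limit_seconds)
  target = Thread.current
  watchdog = Thread.new do
    sleep limit_seconds
    frames = target.backtrace || []
    hint = interlock_frame?(frames) ? " in #{INTERLOCK_FRAME}" : ''
    target.raise(SuspectedDeadlockError,
                 "no progress after #{limit_seconds}s#{hint}\n#{frames.first(10).join("\n")}")
  end
  yield
ensure
  # Stop the watchdog if the block finished (or failed) on its own.
  watchdog&.kill
  watchdog&.join
end
```

In an RSpec suite this might be wired in through an around hook, for example `config.around(:each) { |example| run_with_watchdog(limit) { example.run } }`, with a budget well below the CI timeout. A time limit only approximates deadlock detection: slow but healthy specs would trip it too, so the `interlock_frame?` hint is there to distinguish the two cases in the failure message.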

Links

Edited by Thong Kuah