Failure in qa-schedules-subgroup-cleanup job - TCP connection timeout

Problem

Rake cleanup jobs for all Staging QA runs are failing today with a connection timeout to staging.gitlab.com after roughly 250 subgroup-deletion API calls.

 total_sub_groups: 372
 total_sub_group_pages: 4
 ==== Current Page: 1 ====
 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 ==== Current Page: 2 ====
 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 ==== Current Page: 3 ====
 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 ERROR: Job failed: exit code 1
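The counters in the output above can be cross-checked with a little arithmetic. This is a minimal sketch that assumes the tool requests 100 subgroups per page. That value is the GitLab API's maximum `per_page`, but it is an assumption and was not read from `delete_subgroups.rb`. `total_pages` and `page_for_call` are hypothetical helpers:

```ruby
# Hypothetical sketch, not part of delete_subgroups.rb.
# ASSUMPTION: 100 subgroups per API page, consistent with
# 372 subgroups being reported as 4 pages.
PER_PAGE = 100

# Number of pages needed to list `total` items (integer ceiling division).
def total_pages(total, per_page = PER_PAGE)
  (total + per_page - 1) / per_page
end

# 1-based page on which the nth (1-based) delete call falls.
def page_for_call(n, per_page = PER_PAGE)
  (n - 1) / per_page + 1
end

puts total_pages(372)   # => 4
puts page_for_call(250) # => 3
```

Under that assumption, 372 subgroups need 4 pages, and a failure around the 250th call lands on page 3. That matches where the output stops.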

This may be due to Cloudflare, based on #199279 (comment 278340697). That is likely not the case, though, since no 524 response was returned: https://gitlab.slack.com/archives/CB3LSMEJV/p1580325752470200?thread_ts=1580317297.454700&cid=CB3LSMEJV

Impact

This is blocking the Release Deployer, per #199279 (comment 278365579). It was mitigated by gitlab-org/quality/pipeline-common!24 (merged).

Diagnosing steps

Example job: https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/870018

 $ bundle exec rake delete_subgroups GITLAB_ADDRESS=$GITLAB_ADDRESS
 rake aborted!
 RestClient::Exceptions::OpenTimeout: Timed out connecting to server
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:739:in `rescue in transmit'
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:642:in `transmit'
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/support/api.rb:42:in `delete'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:54:in `block in delete_subgroups'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:53:in `each'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:53:in `delete_subgroups'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:44:in `block in run'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:35:in `times'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:35:in `run'
 /builds/gitlab-org/quality/staging/gitlab/qa/Rakefile:12:in `block in <top (required)>'
 /usr/local/bundle/gems/rake-12.3.0/exe/rake:27:in `<top (required)>'
 /usr/local/bin/bundle:23:in `load'
 /usr/local/bin/bundle:23:in `<main>'
 Caused by:
 Errno::ETIMEDOUT: Failed to open TCP connection to staging.gitlab.com:443 (Operation timed out - connect(2) for "staging.gitlab.com" port 443)
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/support/api.rb:42:in `delete'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:54:in `block in delete_subgroups'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:53:in `each'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:53:in `delete_subgroups'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:44:in `block in run'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:35:in `times'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:35:in `run'
 /builds/gitlab-org/quality/staging/gitlab/qa/Rakefile:12:in `block in <top (required)>'
 /usr/local/bundle/gems/rake-12.3.0/exe/rake:27:in `<top (required)>'
 /usr/local/bin/bundle:23:in `load'
 /usr/local/bin/bundle:23:in `<main>'
 Caused by:
 Errno::ETIMEDOUT: Operation timed out - connect(2) for "staging.gitlab.com" port 443
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
 /usr/local/bundle/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/support/api.rb:42:in `delete'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:54:in `block in delete_subgroups'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:53:in `each'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:53:in `delete_subgroups'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:44:in `block in run'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:35:in `times'
 /builds/gitlab-org/quality/staging/gitlab/qa/qa/tools/delete_subgroups.rb:35:in `run'
 /builds/gitlab-org/quality/staging/gitlab/qa/Rakefile:12:in `block in <top (required)>'
 /usr/local/bundle/gems/rake-12.3.0/exe/rake:27:in `<top (required)>'
 /usr/local/bin/bundle:23:in `load'
 /usr/local/bin/bundle:23:in `<main>'
 Tasks: TOP => delete_subgroups
 (See full trace by running task with --trace)
 Running...
 total_sub_groups: 372
 total_sub_group_pages: 4
 ==== Current Page: 1 ====
 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 ==== Current Page: 2 ====
 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 ==== Current Page: 3 ====
 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 ERROR: Job failed: exit code 1
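The failure is a connect timeout partway through an otherwise long-running loop, so one possible mitigation is to retry individual calls with backoff. This is a minimal sketch that assumes a wrapper could be placed around the `delete` call in `qa/support/api.rb`. `with_retries` is a hypothetical helper, not existing code. In the real job, the rescued classes would presumably include `RestClient::Exceptions::OpenTimeout`, since that is the exception that propagates (with `Errno::ETIMEDOUT` as its cause). The example below uses only `Errno::ETIMEDOUT` so it runs without the rest-client gem:

```ruby
# Hypothetical helper, not part of qa/support/api.rb.
# Retries the block when it raises one of `errors`, waiting
# base_delay, 2*base_delay, 4*base_delay, ... between attempts.
# `sleeper` is injectable so the backoff can be observed without sleeping.
def with_retries(errors:, attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield
  rescue *errors => e
    raise if tries >= attempts # give up and re-raise the last error

    delay = base_delay * (2**(tries - 1))
    warn "#{e.class}: #{e.message}; retry #{tries}/#{attempts - 1} in #{delay}s"
    sleeper.call(delay)
    retry # re-runs the begin block, which increments `tries`
  end
end

# Usage: simulate two connect timeouts followed by a successful delete.
calls = 0
delays = []
result = with_retries(errors: [Errno::ETIMEDOUT], attempts: 3, base_delay: 0.5,
                      sleeper: ->(s) { delays << s }) do
  calls += 1
  raise Errno::ETIMEDOUT, "connect(2) for staging.gitlab.com port 443" if calls < 3
  :deleted
end
```

Retrying is reasonably safe for this particular error. An open timeout means the TCP connection was never established, so the DELETE request was never sent. Retries would not help, however, if staging.gitlab.com stays unreachable for longer than the total backoff window.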

The task runs fine locally, which makes this a bit confusing. This failure also does not directly affect subsequent QA test runs on Staging; the cleanup job exists mainly to keep Staging clean.

Edited by Kyle Wiebers