Investigate 502s from git push with push rules on Staging, Master, Production
Several tests in the suite /qa/specs/features/ee/browser_ui/3_create/repository/push_rules_spec.rb
are failing with 502 (bad gateway) errors on Staging.
This only happens on Staging, not the other environments.
To quote @niskhakova's investigation findings from the test failure ticket gitlab-org/gitlab#273007 (closed):
- User tries to push commit to the branch on the project with push rules enabled
- The request is failing like this:
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 16 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 934 bytes | 934.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
remote: GitLab: Internal API error (502)
To https://staging.gitlab.com/gitlab-qa-sandbox-group/qa-test-2020-10-30-15-41-17-95733ab4cb6919d6/push_rules-a408d4f04c4953b1.git
! [remote rejected] master-test -> master-test (pre-receive hook declined)
error: failed to push some refs to 'https://staging.gitlab.com/gitlab-qa-sandbox-group/qa-test-2020-10-30-15-41-17-95733ab4cb6919d6/push_rules-a408d4f04c4953b1.git'
- Looks like the failure happens on the Repositories::GitHttpController#git_receive_pack call - example correlation_id -0YaOFnXItD5
- With this correlation ID we see several calls in Gitaly, and the API request to https://int.gstg.gitlab.net:11443/api/v4/internal/allowed is failing with a 502 error
- The error appears on every git push in a project with push rules enabled. Specifically, the spec /qa/specs/features/ee/browser_ui/3_create/repository/push_rules_spec.rb:71 - restricts commit by message format was used to reproduce.
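When triaging job logs for this failure, it may help to pick out the status code that the pre-receive hook reports. The following is a minimal sketch, not part of the QA suite: the function name `internal_api_status` and the regular expression are assumptions based only on the `remote: GitLab: Internal API error (502)` line in the push output quoted above.

```python
import re
from typing import Optional

# Assumed pattern, derived from the remote message in the failing push output.
INTERNAL_API_ERROR = re.compile(r"remote: GitLab: Internal API error \((\d{3})\)")


def internal_api_status(push_output: str) -> Optional[int]:
    """Return the HTTP status reported in the git push output, or None if absent."""
    match = INTERNAL_API_ERROR.search(push_output)
    return int(match.group(1)) if match else None


# Excerpt of the failing push output from this ticket.
sample = """\
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
remote: GitLab: Internal API error (502)
 ! [remote rejected] master-test -> master-test (pre-receive hook declined)
"""

print(internal_api_status(sample))
```

A result of `None` means the push output contained no internal API error line, so the failure (if any) has a different cause.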
All 502 errors on Staging Gitaly nodes for the last 15 hours: https://nonprod-log.gitlab.net/app/kibana#/discover?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15h,to:now))&_a=(columns:!(json.error,json.level,json.msg,host.name,json.fqdn,json.url,json.correlation_id),filters:!(),index:pubsub-gitaly-inf-gstg,interval:auto,query:(language:kuery,query:'json.status%20:%20502'),sort:!())
The Gitaly team suggested that this looked more like a Rails issue than a Gitaly node issue.
Examples of recent job failures (which show up in the logs as failed git push commands):
https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/2217746
https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/2217598
https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/2217222
Examples of successful runs on other environments:
Production success - https://ops.gitlab.net/gitlab-org/quality/production/-/jobs/2217169
Nightly success (as part of ce:upgrade job) - https://gitlab.com/gitlab-org/quality/nightly/-/jobs/820542451