Inconsistent "502 Whoops, GitLab is taking too much time to respond" from GitLab image running in GitLab CI

Summary

Not sure if this is the proper repo to inquire about this; feel free to correct me if I should move it!

When running API calls against a GitLab instance started inside a GitLab CI/CD job, the instance occasionally responds with "502 Whoops, GitLab is taking too much time to respond", as if it is not yet ready to accept requests.

Before I begin sending requests, I poll the image's readiness endpoint until it reports `master_check = OK`.

But certain endpoints still occasionally error out as if the instance wasn't ready to accept requests. A rough sketch of the polling approach is shown below.
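For reference, the polling looks roughly like this. It is a minimal sketch rather than the exact script from the repo; it assumes Node 18+ with a global `fetch` and that the readiness endpoint responds with a JSON body shaped like `{ master_check: [{ status: 'ok' }] }`:

```typescript
// Hypothetical sketch of the readiness poll run before the integration tests start.
// Assumes Node 18+ (global fetch) and the default GitLab readiness response shape.
async function waitUntilReady(host: string, timeoutMs = 300_000, intervalMs = 5_000): Promise<void> {
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    try {
      const res = await fetch(`${host}/-/readiness`);

      if (res.ok) {
        const body: any = await res.json();

        // master_check is the check reported by the readiness endpoint.
        if (body?.master_check?.every((check: { status: string }) => check.status === 'ok')) {
          return;
        }
      }
    } catch {
      // Instance not accepting connections yet - keep polling.
    }

    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error(`GitLab at ${host} did not become ready within ${timeoutMs}ms`);
}
```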

This is pretty inconsistent, which has made it hard to debug, and it can usually be worked around just by rerunning the CI stage. Obviously that is a horrible long-term solution; an in-job alternative would be retrying the individual requests (sketched below), but that only papers over the problem.
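As a rough illustration of that workaround (not something the project currently does; the attempt count and backoff are assumptions):

```typescript
// Hypothetical retry wrapper around the integration-test requests, as an
// in-job alternative to rerunning the whole CI stage.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, delayMs = 10_000): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Only worth retrying while the instance is presumably still warming up (e.g. 502s).
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }

  throw lastError;
}
```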

Any suggestions as to why I could be experiencing the inconsistent behaviour? Perhaps I'm looking at the wrong things to know if the instance is ready?

I would like to upload the instance logs here to help figure out the problem, but I need to look into how to cache the Docker instance's files in the CI config 🤔

Steps to reproduce

Unsure how to reproduce this consistently.

Example Project

See project: https://gitlab.com/jdalrymple/gitbeaker

See CI config: https://gitlab.com/jdalrymple/gitbeaker/-/blob/master/.gitlab-ci.yml

See Docker Compose file: https://gitlab.com/jdalrymple/gitbeaker/-/blob/master/scripts/docker-compose.yml

See example job: https://gitlab.com/jdalrymple/gitbeaker/-/jobs/941506301

What is the current bug behavior?

API requests sometimes return "502 Whoops, GitLab is taking too much time to respond", even when repeating the exact same request.
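For context, the failing calls are ordinary gitbeaker requests. The snippet below is illustrative rather than the exact test code; the project name matches the integration test visible in the log, and the host/token environment variables are assumptions:

```typescript
import { Gitlab } from '@gitbeaker/node';

// Illustrative example of the kind of request that intermittently fails with a 502.
// Host and token are whatever the CI job uses to reach the bundled GitLab instance.
const api = new Gitlab({
  host: process.env.GITLAB_HOST,
  token: process.env.GITLAB_API_TOKEN,
});

async function run() {
  // Sometimes succeeds, sometimes comes back as the 502 "Whoops" HTML page.
  const project = await api.Projects.create({ name: 'Project Upload Integration Test Text File' });
  console.log(project.id);
}

run();
```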

Relevant logs

application.log

# Logfile created on 2021-01-13 22:55:43 +0000 by logger.rb/v1.4.2
2021-01-13T22:55:43.393Z: ActiveRecord connection established
2021-01-13T22:55:44.191Z: Checking state of dynamic postgres partitions
2021-01-13T22:55:57.792Z: Checking state of dynamic postgres partitions
2021-01-13T22:55:57.873Z: {:method=>"with_lock_retries", :class=>"Gitlab::Database::Partitioning::PartitionCreator", :message=>"Lock timeout is set", :current_iteration=>1, :lock_timeout_in_ms=>100}
2021-01-13T22:55:57.892Z: Created partition audit_events_part_5fc467ac26_000000 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.908Z: Created partition audit_events_part_5fc467ac26_202101 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.921Z: Created partition audit_events_part_5fc467ac26_202102 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.934Z: Created partition audit_events_part_5fc467ac26_202103 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.948Z: Created partition audit_events_part_5fc467ac26_202104 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.961Z: Created partition audit_events_part_5fc467ac26_202105 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.977Z: Created partition audit_events_part_5fc467ac26_202106 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.991Z: Created partition audit_events_part_5fc467ac26_202107 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.991Z: {:method=>"with_lock_retries", :class=>"Gitlab::Database::Partitioning::PartitionCreator", :message=>"Migration finished", :current_iteration=>1, :lock_timeout_in_ms=>100}
2021-01-13T22:55:59.095Z: User "Administrator" (admin@example.com) was created
2021-01-13T22:55:59.459Z: Group "GitLab Instance" was created
2021-01-13T22:56:00.955Z: GitLab Instance created a new project "GitLab Instance / Monitoring"
2021-01-13T22:56:02.707Z: Cannot obtain an exclusive lease. There must be another instance already in execution.
2021-01-13T22:56:30.769Z: ActiveRecord connection established
2021-01-13T22:56:32.268Z: Checking state of dynamic postgres partitions
2021-01-13T22:57:22.163Z: Checking state of dynamic postgres partitions
2021-01-13T22:57:47.911Z: Checking state of dynamic postgres partitions
2021-01-13T22:59:02.495Z: ActiveRecord connection disconnected
2021-01-13T22:59:03.662Z: ActiveRecord connection established
2021-01-13T22:59:03.728Z: ActiveRecord connection established
2021-01-13T22:59:21.921Z: ActiveRecord connection established
2021-01-13T22:59:23.609Z: Checking state of dynamic postgres partitions
2021-01-13T23:00:19.234Z: StuckCiJobsWorker: Cleaning stuck builds
2021-01-13T23:01:04.042Z: ActiveRecord connection established
2021-01-13T23:01:07.286Z: Checking state of dynamic postgres partitions
2021-01-13T23:03:11.247Z: ActiveRecord connection established
2021-01-13T23:03:16.067Z: Checking state of dynamic postgres partitions
2021-01-13T23:03:36.887Z: Administrator created a new project "Administrator / Project Upload Integration Test Text File"
2021-01-13T23:03:46.003Z: Unable to save project. Error: 4:Deadline Exceeded. Project ID: 2

exceptions_json.log

# Logfile created on 2021-01-13 22:56:02 +0000 by logger.rb/v1.4.2
{"severity":"ERROR","time":"2021-01-13T22:56:02.545Z","correlation_id":"0660f46ae56dbb90ad44611681ce683a","exception.class":"Gitlab::Git::PreReceiveError","exception.message":"Internal API unreachable","exception.backtrace":["lib/gitlab/gitaly_client/operation_service.rb:345:in `user_commit_files'","lib/gitlab/git/repository.rb:902:in `block in multi_action'","lib/gitlab/git/wraps_gitaly_errors.rb:7:in `wrapped_gitaly_errors'","lib/gitlab/git/repository.rb:901:in `multi_action'","app/models/repository.rb:854:in `block in multi_action'","app/models/repository.rb:837:in `with_cache_hooks'","app/models/repository.rb:854:in `multi_action'","app/models/repository.rb:818:in `create_file'","app/services/files/create_service.rb:16:in `create_transformed_commit'","app/services/files/create_service.rb:10:in `create_commit!'","app/services/commits/create_service.rb:30:in `execute'","app/services/projects/create_service.rb:157:in `create_readme'","app/services/projects/create_service.rb:113:in `after_create_actions'","app/services/projects/create_service.rb:63:in `block in execute'","lib/gitlab/application_context.rb:54:in `block in use'","lib/gitlab/application_context.rb:54:in `use'","lib/gitlab/application_context.rb:21:in `with_context'","app/services/projects/create_service.rb:62:in `execute'","app/services/concerns/measurable.rb:35:in `execute'","lib/gitlab/database_importers/self_monitoring/project/create_service.rb:59:in `create_project'","app/models/concerns/stepable.rb:14:in `call'","app/models/concerns/stepable.rb:14:in `block in execute_steps'","app/models/concerns/stepable.rb:13:in `each'","app/models/concerns/stepable.rb:13:in `inject'","app/models/concerns/stepable.rb:13:in `execute_steps'","lib/gitlab/database_importers/self_monitoring/project/create_service.rb:27:in `execute'","(eval):3:in `block (2 levels) in run_file'","lib/tasks/gitlab/db.rake:64:in `block (3 levels) in \u003ctop (required)\u003e'"]}

Other logs can be found here. I can add them here if there is something specific!

What is the expected correct behavior?

The requests should be accepted consistently.

Possible fixes

Not exactly sure 🤔
