Inconsistent "502 Whoops, GitLab is taking too much time to respond" from a GitLab image running on GitLab CI
Summary
Not sure if this is the proper repo to inquire about this; feel free to correct me if I should move it!
When running API calls against a GitLab instance running inside the GitLab CI/CD runner, the instance occasionally throws a "502 Whoops, GitLab is taking too much time to respond", as if it is not in a ready state to accept requests.
Before I begin sending requests, I poll the image's readiness endpoint until it reports master_check = OK.
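For reference, a simplified sketch of that polling (the real implementation lives in the project's scripts; I'm assuming a global `fetch` and the usual `/-/readiness` JSON shape with a `master_check` array of status entries):

```typescript
// Simplified sketch of the readiness polling; assumes a global fetch
// (Node 18+ or node-fetch) and that /-/readiness returns JSON shaped like
// { status: 'ok', master_check: [{ status: 'ok' }] }.
async function waitForGitlab(baseUrl: string, timeoutMs = 300_000): Promise<void> {
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    try {
      const res = await fetch(`${baseUrl}/-/readiness`);

      if (res.ok) {
        const body = await res.json();
        const checks: Array<{ status: string }> = body?.master_check ?? [];

        // Only continue once every master check reports ok.
        if (checks.length > 0 && checks.every((check) => check.status === 'ok')) {
          return;
        }
      }
    } catch {
      // Instance not accepting connections yet; keep polling.
    }

    // Wait a bit before the next probe.
    await new Promise((resolve) => setTimeout(resolve, 5_000));
  }

  throw new Error(`GitLab at ${baseUrl} did not report ready within ${timeoutMs} ms`);
}
```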
But certain endpoints still occasionally error out as if the instance wasn't ready to accept requests.
This is pretty inconsistent, which has made it hard to debug, and it can usually be resolved just by rerunning the CI stage. Obviously that is a horrible long-term solution.
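As a less drastic stopgap than rerunning the whole stage, I've considered wrapping the flaky calls in a small retry helper, something like the sketch below (the helper and its parameters are hypothetical, not part of gitbeaker):

```typescript
// Hypothetical stopgap: retry a flaky request a few times with a delay,
// instead of re-running the whole CI stage when a 502 shows up.
async function withRetries<T>(
  request: () => Promise<T>,
  attempts = 3,
  delayMs = 10_000,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await request();
    } catch (error) {
      lastError = error;

      // The 502s tend to clear up on their own, so wait before retrying.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }

  throw lastError;
}

// Example usage with a gitbeaker call:
// const project = await withRetries(() => api.Projects.create({ name: 'upload-test' }));
```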
Any suggestions as to why I could be experiencing the inconsistent behaviour? Perhaps I'm looking at the wrong things to know if the instance is ready?
I would like to upload the instance logs here to help figure out the problem, but I first need to look into how to capture the Docker instance's files from the CI job.
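One untested idea for getting those logs out: copy them from the container into the job workspace after the tests run, then list that directory under artifacts (with when: always) in .gitlab-ci.yml. Rough sketch, assuming the service container is named "gitlab":

```typescript
import { execSync } from 'child_process';

// Untested sketch: pull the omnibus image's log directory out of the
// container so the CI job can upload it as an artifact. Assumes the
// container is named "gitlab" and the docker CLI is available to the job.
export function collectGitlabLogs(containerName = 'gitlab', outputDir = './gitlab-logs'): void {
  // The omnibus image writes application.log, exceptions_json.log, etc.
  // under /var/log/gitlab inside the container.
  execSync(`docker cp ${containerName}:/var/log/gitlab ${outputDir}`, { stdio: 'inherit' });
}
```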
Steps to reproduce
Unsure how to reproduce this consistently.
Example Project
See project: https://gitlab.com/jdalrymple/gitbeaker
See CI/CD config: https://gitlab.com/jdalrymple/gitbeaker/-/blob/master/.gitlab-ci.yml
See Docker Compose file: https://gitlab.com/jdalrymple/gitbeaker/-/blob/master/scripts/docker-compose.yml
See example job: https://gitlab.com/jdalrymple/gitbeaker/-/jobs/941506301
What is the current bug behavior?
API requests sometimes return "502 Whoops, GitLab is taking too much time to respond", even when repeating the exact same request.
Relevant logs
application.log
# Logfile created on 2021-01-13 22:55:43 +0000 by logger.rb/v1.4.2
2021-01-13T22:55:43.393Z: ActiveRecord connection established
2021-01-13T22:55:44.191Z: Checking state of dynamic postgres partitions
2021-01-13T22:55:57.792Z: Checking state of dynamic postgres partitions
2021-01-13T22:55:57.873Z: {:method=>"with_lock_retries", :class=>"Gitlab::Database::Partitioning::PartitionCreator", :message=>"Lock timeout is set", :current_iteration=>1, :lock_timeout_in_ms=>100}
2021-01-13T22:55:57.892Z: Created partition audit_events_part_5fc467ac26_000000 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.908Z: Created partition audit_events_part_5fc467ac26_202101 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.921Z: Created partition audit_events_part_5fc467ac26_202102 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.934Z: Created partition audit_events_part_5fc467ac26_202103 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.948Z: Created partition audit_events_part_5fc467ac26_202104 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.961Z: Created partition audit_events_part_5fc467ac26_202105 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.977Z: Created partition audit_events_part_5fc467ac26_202106 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.991Z: Created partition audit_events_part_5fc467ac26_202107 for table audit_events_part_5fc467ac26
2021-01-13T22:55:57.991Z: {:method=>"with_lock_retries", :class=>"Gitlab::Database::Partitioning::PartitionCreator", :message=>"Migration finished", :current_iteration=>1, :lock_timeout_in_ms=>100}
2021-01-13T22:55:59.095Z: User "Administrator" (admin@example.com) was created
2021-01-13T22:55:59.459Z: Group "GitLab Instance" was created
2021-01-13T22:56:00.955Z: GitLab Instance created a new project "GitLab Instance / Monitoring"
2021-01-13T22:56:02.707Z: Cannot obtain an exclusive lease. There must be another instance already in execution.
2021-01-13T22:56:30.769Z: ActiveRecord connection established
2021-01-13T22:56:32.268Z: Checking state of dynamic postgres partitions
2021-01-13T22:57:22.163Z: Checking state of dynamic postgres partitions
2021-01-13T22:57:47.911Z: Checking state of dynamic postgres partitions
2021-01-13T22:59:02.495Z: ActiveRecord connection disconnected
2021-01-13T22:59:03.662Z: ActiveRecord connection established
2021-01-13T22:59:03.728Z: ActiveRecord connection established
2021-01-13T22:59:21.921Z: ActiveRecord connection established
2021-01-13T22:59:23.609Z: Checking state of dynamic postgres partitions
2021-01-13T23:00:19.234Z: StuckCiJobsWorker: Cleaning stuck builds
2021-01-13T23:01:04.042Z: ActiveRecord connection established
2021-01-13T23:01:07.286Z: Checking state of dynamic postgres partitions
2021-01-13T23:03:11.247Z: ActiveRecord connection established
2021-01-13T23:03:16.067Z: Checking state of dynamic postgres partitions
2021-01-13T23:03:36.887Z: Administrator created a new project "Administrator / Project Upload Integration Test Text File"
2021-01-13T23:03:46.003Z: Unable to save project. Error: 4:Deadline Exceeded. Project ID: 2
exceptions_json.log
# Logfile created on 2021-01-13 22:56:02 +0000 by logger.rb/v1.4.2
{"severity":"ERROR","time":"2021-01-13T22:56:02.545Z","correlation_id":"0660f46ae56dbb90ad44611681ce683a","exception.class":"Gitlab::Git::PreReceiveError","exception.message":"Internal API unreachable","exception.backtrace":["lib/gitlab/gitaly_client/operation_service.rb:345:in `user_commit_files'","lib/gitlab/git/repository.rb:902:in `block in multi_action'","lib/gitlab/git/wraps_gitaly_errors.rb:7:in `wrapped_gitaly_errors'","lib/gitlab/git/repository.rb:901:in `multi_action'","app/models/repository.rb:854:in `block in multi_action'","app/models/repository.rb:837:in `with_cache_hooks'","app/models/repository.rb:854:in `multi_action'","app/models/repository.rb:818:in `create_file'","app/services/files/create_service.rb:16:in `create_transformed_commit'","app/services/files/create_service.rb:10:in `create_commit!'","app/services/commits/create_service.rb:30:in `execute'","app/services/projects/create_service.rb:157:in `create_readme'","app/services/projects/create_service.rb:113:in `after_create_actions'","app/services/projects/create_service.rb:63:in `block in execute'","lib/gitlab/application_context.rb:54:in `block in use'","lib/gitlab/application_context.rb:54:in `use'","lib/gitlab/application_context.rb:21:in `with_context'","app/services/projects/create_service.rb:62:in `execute'","app/services/concerns/measurable.rb:35:in `execute'","lib/gitlab/database_importers/self_monitoring/project/create_service.rb:59:in `create_project'","app/models/concerns/stepable.rb:14:in `call'","app/models/concerns/stepable.rb:14:in `block in execute_steps'","app/models/concerns/stepable.rb:13:in `each'","app/models/concerns/stepable.rb:13:in `inject'","app/models/concerns/stepable.rb:13:in `execute_steps'","lib/gitlab/database_importers/self_monitoring/project/create_service.rb:27:in `execute'","(eval):3:in `block (2 levels) in run_file'","lib/tasks/gitlab/db.rake:64:in `block (3 levels) in \u003ctop (required)\u003e'"]}
Other logs are available as well; I can add them here if there is something specific!
What is the expected correct behavior?
The request should be accepted.
Possible fixes
Not exactly sure