CI Artifact upload 500 error
Customer https://gitlab.my.salesforce.com/00161000004zoBW reported 500 errors when uploading CI artifacts to GitLab.com.
Upon further investigation, we found the 500 errors:
- Correlation search (returns 3 x 500 errors): https://log.gitlab.net/goto/6e2d38cf40c1ee03ba1d55c7620bc60c
- User agent: `gitlab-runner 12.7.1 (12-7-stable; go1.13.5; linux/amd64)`
The error was not posted to Sentry, but I found the following stack traces in the logs:
```
lib/gitlab/git/wraps_gitaly_errors.rb:13:in `rescue in wrapped_gitaly_errors'
lib/gitlab/git/wraps_gitaly_errors.rb:6:in `wrapped_gitaly_errors'
lib/gitlab/git/repository.rb:181:in `tag_names'
app/models/repository.rb:565:in `tag_names'
lib/gitlab/repository_cache_adapter.rb:48:in `block (2 levels) in cache_method_as_redis_set'
lib/gitlab/repository_set_cache.rb:54:in `fetch'
lib/gitlab/repository_cache_adapter.rb:159:in `block in cache_method_output_as_redis_set'
lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'
lib/gitlab/repository_cache_adapter.rb:187:in `block in memoize_method_output'
lib/gitlab/repository_cache_adapter.rb:196:in `no_repository_fallback'
lib/gitlab/repository_cache_adapter.rb:186:in `memoize_method_output'
lib/gitlab/repository_cache_adapter.rb:158:in `cache_method_output_as_redis_set'
lib/gitlab/repository_cache_adapter.rb:47:in `block in cache_method_as_redis_set'
lib/gitlab/repository_cache_adapter.rb:62:in `block in cache_method_as_redis_set'
app/models/repository.rb:259:in `tag_exists?'
app/models/repository.rb:201:in `ambiguous_ref?'
app/models/project.rb:2002:in `protected_for?'
app/models/ci/pipeline.rb:621:in `block in protected_ref?'
lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'
app/models/ci/pipeline.rb:621:in `protected_ref?'
app/models/ci/pipeline.rb:671:in `block in predefined_commit_variables'
app/models/ci/pipeline.rb:660:in `tap'
app/models/ci/pipeline.rb:660:in `predefined_commit_variables'
app/models/ci/pipeline.rb:644:in `block in predefined_variables'
app/models/ci/pipeline.rb:638:in `tap'
app/models/ci/pipeline.rb:638:in `predefined_variables'
app/models/concerns/ci/contextable.rb:16:in `block in scoped_variables'
app/models/concerns/ci/contextable.rb:13:in `tap'
app/models/concerns/ci/contextable.rb:13:in `scoped_variables'
app/models/ci/build.rb:529:in `block in variables'
lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'
app/models/ci/build.rb:526:in `variables'
app/presenters/ci/build_runner_presenter.rb:121:in `block in git_depth_variable'
lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'
app/presenters/ci/build_runner_presenter.rb:120:in `git_depth_variable'
app/presenters/ci/build_runner_presenter.rb:28:in `git_depth'
app/presenters/ci/build_runner_presenter.rb:40:in `refspecs'
ee/lib/gitlab/ip_address_state.rb:10:in `with'
lib/api/api_guard.rb:168:in `call'
ee/lib/omni_auth/strategies/group_saml.rb:41:in `other_phase'
ee/lib/gitlab/jira/middleware.rb:19:in `call'
```
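For context, the top of the trace is the `wrapped_gitaly_errors` shim, which translates gRPC failures from Gitaly into application-level exceptions; the `rescue` frame at line 13 is that translation firing. A minimal sketch of its shape, paraphrased from memory of the GitLab source at the time (the exact error mappings are an assumption):

```ruby
# lib/gitlab/git/wraps_gitaly_errors.rb (paraphrased sketch, not verbatim)
module Gitlab
  module Git
    module WrapsGitalyErrors
      def wrapped_gitaly_errors(&block)
        yield block
      rescue GRPC::NotFound => e
        # Repository missing on the Gitaly node
        raise Gitlab::Git::Repository::NoRepository.new(e)
      rescue GRPC::InvalidArgument => e
        raise ArgumentError.new(e)
      rescue GRPC::BadStatus => e
        # Any other gRPC status (Unavailable, DeadlineExceeded, ...) surfaces
        # as a CommandError, which the API layer then renders as a 500
        raise Gitlab::Git::CommandError.new(e)
      end
    end
  end
end
```

So the 500 here is a re-raised gRPC error from the `tag_names` call to Gitaly, not a failure in the artifact-upload code itself.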
Note that this is the same code path where we've been seeing problems: https://gitlab.com/gitlab-com/gl-infra/scalability/issues/124. However, this repository only has a small number of tags, so it's unlikely to be related to that issue.
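For context on that code path: the middle of the trace shows `tag_names` being served through the Redis-set cache (`cache_method_as_redis_set`, `repository_set_cache`). A rough sketch of the wiring, paraphrased from memory rather than verbatim (the `fallback:` argument in particular is an assumption):

```ruby
# app/models/repository.rb (paraphrased sketch, not verbatim)
class Repository
  def tag_names
    # Cache miss falls through to Gitaly via Gitlab::Git::Repository#tag_names
    raw_repository.tag_names
  end

  # Rewrites tag_names so results are stored in a Redis set; callers such as
  # tag_exists? / ambiguous_ref? in the trace read through this cache.
  cache_method_as_redis_set :tag_names, fallback: []
end
```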
What's interesting is that after the first failure, we see database errors for unique index constraint violations. Is it possible that the first error leaves things in an inconsistent state that causes subsequent retries to fail?
```
ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_ci_job_artifacts_on_job_id_and_file_type"
DETAIL: Key (job_id, file_type)=(420918654, 1) already exists.
: INSERT INTO "ci_job_artifacts" ("project_id", "job_id", "file_type", "size", "created_at", "updated_at", "expire_at", "file", "file_sha256", "file_format") VALUES (REDACTED, REDACTED, 1, 243, '2020-01-30 16:06:34.384602', '2020-01-30 16:06:34.384602', '2020-02-06 16:06:33.646447', 'REDACTED', 'REDACTED', 2) RETURNING "id"

lib/api/runner.rb:300:in `block (2 levels) in <class:Runner>'
ee/lib/gitlab/ip_address_state.rb:10:in `with'
lib/api/api_guard.rb:168:in `call'
ee/lib/omni_auth/strategies/group_saml.rb:41:in `other_phase'
ee/lib/gitlab/jira/middleware.rb:19:in `call'
```
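One plausible mechanism (a sketch of the hypothesis only; `persist_artifact!` and the `create!` call are illustrative names, not the verified code in lib/api/runner.rb): if the first request persists the artifact row before 500ing, the runner's retry re-runs the same `INSERT` and trips the `(job_id, file_type)` unique index.

```ruby
# Hypothetical sketch; only the unique index on (job_id, file_type) is
# verified from the error above, the rest is illustrative.
def persist_artifact!(job, artifact_file)
  # First attempt: this INSERT succeeds...
  job.job_artifacts.create!(
    project_id: job.project_id,
    file_type: :archive, # maps to file_type = 1 in the error above
    file: artifact_file
  )
  # ...then the request fails later (e.g. the Gitaly error above) and the
  # runner retries the whole upload. On the retry, the row already exists,
  # so the same INSERT raises ActiveRecord::RecordNotUnique
  # (PG::UniqueViolation on index_ci_job_artifacts_on_job_id_and_file_type).
end
```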
Gitaly logs
Another concern was that this issue did not appear in the Gitaly logs.
This implies either that the network call never made it as far as Gitaly, or that Gitaly's own logging was failing.
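For reference, the Gitaly-facing frame in the trace is `Gitlab::Git::Repository#tag_names`, which makes a single gRPC round trip to Gitaly's ref service. Sketched from memory of the source at the time, so treat the exact shape as an assumption:

```ruby
# lib/gitlab/git/repository.rb (paraphrased sketch, not verbatim)
def tag_names
  wrapped_gitaly_errors do
    # One gRPC call to Gitaly. If this raises client-side (e.g.
    # GRPC::Unavailable before the request ever reaches the Gitaly node),
    # Rails logs the stack trace above while the Gitaly logs stay silent,
    # consistent with what we observed.
    gitaly_ref_client.tag_names
  end
end
```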