Project import race condition results in 500 when importing a project with same name&path concurrently
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Summary
When importing two projects concurrently, into the same name&path, we could end up with a half-baked state in the requests, resulting in a 500 error with (full stacktrace further down in this issue):
"exception.class": "NoMethodError",
"exception.message": "undefined method `type' for #<ProjectAuthorization:0x00007fa885b76378>",
We ran into this when importing (the same) project from GitHub, but I believe any regular import could cause this, since it uses the ProjectService.
Steps to reproduce
Trigger a project import concurrently, in a loop, for example:
# fish script
for i in (seq 1 15)
curl --request POST --url "$DOMAIN/api/v4/import/github" --header "content-type: application/json" --header "PRIVATE-TOKEN: $PAT" --data '{"personal_access_token": "'$GH_PAT'", "repo_id": '$PROJ1', "target_namespace": "'$GROUP'", "new_name": "name'$i'"}' &
curl --request POST --url "$DOMAIN/api/v4/import/github" --header "content-type: application/json" --header "PRIVATE-TOKEN: $PAT" --data '{"personal_access_token": "'$GH_PAT'", "repo_id": '$PROJ2', "target_namespace": "'$GROUP'", "new_name": "name'$i'"}' &
end
What is the current bug behavior?
Some requests to the API return a 500 because of the race condition.
What is the expected correct behavior?
No error, or properly handling concurrent requests and returning a "name/path was already taken" as expected.
Relevant logs and/or screenshots
Tested version: 13.5.1-ee
"exception.class": "NoMethodError",
"exception.message": "undefined method `type' for #<ProjectAuthorization:0x00007fa885b76378>",
"exception.backtrace": [
"app/services/projects/create_service.rb:70:in `rescue in execute'",
"app/services/projects/create_service.rb:15:in `execute'",
"ee/app/services/ee/projects/create_service.rb:17:in `execute'",
"app/services/concerns/measurable.rb:35:in `execute'",
"lib/gitlab/legacy_github_import/project_creator.rb:30:in `execute'",
"app/services/import/github_service.rb:32:in `create_project'",
"lib/gitlab/metrics/instrumentation.rb:160:in `block in create_project'",
"lib/gitlab/metrics/method_call.rb:27:in `measure'",
"lib/gitlab/metrics/instrumentation.rb:160:in `create_project'",
"app/services/import/github_service.rb:13:in `execute'",
"lib/gitlab/metrics/instrumentation.rb:160:in `block in execute'",
"lib/gitlab/metrics/method_call.rb:27:in `measure'",
"lib/gitlab/metrics/instrumentation.rb:160:in `execute'",
"lib/api/import_github.rb:48:in `block in <class:ImportGithub>'",
"ee/lib/gitlab/middleware/ip_restrictor.rb:14:in `block in call'",
"ee/lib/gitlab/ip_address_state.rb:10:in `with'",
"ee/lib/gitlab/middleware/ip_restrictor.rb:13:in `call'",
"lib/api/api_guard.rb:208:in `call'",
"lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'",
"lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'",
"lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'",
"lib/gitlab/metrics/transaction.rb:61:in `run'",
"lib/gitlab/metrics/rack_middleware.rb:16:in `call'",
"lib/gitlab/request_profiler/middleware.rb:17:in `call'",
"lib/gitlab/jira/middleware.rb:19:in `call'",
"lib/gitlab/middleware/go.rb:20:in `call'",
"lib/gitlab/etag_caching/middleware.rb:13:in `call'",
"lib/gitlab/middleware/multipart.rb:218:in `call'",
"lib/gitlab/middleware/handle_null_bytes.rb:19:in `call'",
"lib/gitlab/middleware/read_only/controller.rb:51:in `call'",
"lib/gitlab/middleware/read_only.rb:18:in `call'",
"lib/gitlab/middleware/same_site_cookies.rb:27:in `call'",
"lib/gitlab/middleware/basic_health_check.rb:25:in `call'",
"lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'",
"lib/gitlab/middleware/request_context.rb:23:in `call'",
"config/initializers/fix_local_cache_middleware.rb:9:in `call'",
"lib/gitlab/metrics/requests_rack_middleware.rb:49:in `call'",
"lib/gitlab/middleware/release_env.rb:12:in `call'"
],
Possible fixes
There are two parts to this, one that throws the 500 error is because the ProjectAuthorization record that is invalid has no type
, so the rescue is the one throwing the error.
Then, there are other "effects", of not handling this in a concurrent-safe way, for example application_json.log
shows:
{"severity":"ERROR","time":"2020-10-29T21:12:20.949Z","correlation_id":"yuUyoNGyDx4","message":"Unable to save project. Error: Name has already been taken, Path has already been taken"}
{"severity":"ERROR","time":"2020-10-29T21:12:20.959Z","correlation_id":"DxtpuAbe5i5","message":"Unable to save project. Error: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint \"index_routes_on_path\"\nDETAIL: Key (path)=(group1/name15) already exists.\n"}