Skip to content

Fix `User#user_detail` method to handle race condition

Related to #333245, !63395 (diffs, comment 594565872), and !125771 (merged)

What does this MR do and why?

During a race condition, User#user_detail raises an error:

ActiveRecord::RecordNotUnique
PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "user_details_pkey"
DETAIL:  Key (user_id)=(15130094) already exists.

I added a spec spec/models/user_spec.rb that reproduces this issue.

The first time we saw that user_detail causes this error(#333245) on CI in the following MR !63395 (merged). We added a hack to Users::UpdateService to make its call to create user_detail record for the user that is being updated if user_datils hasn't been created. That hack fixed spec failures at that time.

Currently, this error is not frequent in production but has occurred a few times.

In Allow only admins to change enterprise user pri... (!125771 - merged) we want to add a new validation to User model. This validation should apply to Enterprise Users only. Because of that, we guard that validation by if-condition. In that if-condition, we use to user_detail record, since it stores enterprise_group_id column, to figure out whether a user is an enterprise user. Using user_detail record in User's validation causes issues for some feature specs, see this pipeline https://gitlab.com/gitlab-org/gitlab/-/pipelines/967101560.

In Allow only admins to change enterprise user pri... (!125771 - merged) we found a way fix those CI failures by changing condition order for that validation


- !user.skip_enterprise_user_email_change_restrictions? && user.enterprise_user? && user.email_changed? && ::Feature.enabled?(:enterprise_users_automatic_claim, user.user_detail.enterprise_group)
+ user.email_changed? && user.enterprise_user? && !user.skip_enterprise_user_email_change_restrictions? && ::Feature.enabled?(:enterprise_users_automatic_claim, user.user_detail.enterprise_group)

This change makes sense from a performance perspective since user.email_changed? method call is less expensive than user.enterprise_user?, which loads the related user_detail record. In that way, user_detail from the validation is not called in the context of those feature specs. It prevents those spec failures because it reduces the probability of the race condition for the user_detail method to its current minimum.

Increasing the frequency of use of the user_detail method increases the probability of the race condition for that method that leads to the error.

This MR

  • removes the hack that was added in !63395 (diffs, comment 594565872)
  • adds a new spec to user_detail method directly reprocude the error
  • fixes the method to handle the race condition
  • fixes N+1 specs failures caused by the changes in the method
  • fixes specs that use build_stubbed method to build a user record

The difference this method change causes compare to the existing user_detail implementation is that:

  • it does not work with build_stubbed(:user).user_detail the same way. Not a concern IMO since it is test-specific only and there are lots of ways to handle this issue: by explicitly setting user_detail for build_stubbed(:user) or by using build(:user)/create(:user) instead. If that method weren't overridden, build_stubbed(:user).user_detail would return nil by default.

  • If user_detail record hasn't been created for the user, the following code user.update!(bio: 'hello') will perform +1 SQL query instead of one

  UserDetail Create (0.6ms)  INSERT INTO "user_details" ("user_id") VALUES (1220) RETURNING "user_id"
  UserDetail Update (0.9ms)  UPDATE "user_details" SET "bio" = 'hello' WHERE "user_details"."user_id" = 1220 

# vs

  UserDetail Create (0.6ms)  INSERT INTO "user_details" ("user_id", "bio") VALUES (1221, 'hello') RETURNING "user_id"

IMO this is also not a concern at all from performance perspective since the user_detail method does this once per user, and after that, each method call behaves the same way (super.presence).

Note that User#user_preference method causes the same error. It also has occurred in production few times. I think it should be fixed in the same way. But let's do it by a separate MR.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Bogdan Denkovych

Merge request reports