
Integrate improved user mapping with Direct Transfer

28 unresolved threads

What does this MR do and why?

Related to: #443557 (closed)

Update Direct Transfer to map contributions to placeholder users for every existing member in the source instance. Contributions created by non-members are assigned to the Import User.

Note

Follow-up MRs will be created to implement some missing features:

  1. Handle mentions in notes and description
  2. Check if all references were saved before marking the migration as finished
  3. Retry the pipeline in case of a Gitlab::Import::SourceUserMapper::FailedToObtainLockError error
  4. Populate source user names for non-members
  5. Cache source user details in Redis to reduce database queries

How it works

In summary, the MR changes the MembersPipeline and the NdJsonPipelines. The MembersPipeline creates an Import::SourceUser and a placeholder user for every member in the source instance. The NdJsonPipelines create an Import::SourceUser for every user ID contained in the relation hash that was not created by the MembersPipeline, that is, for all user IDs that don't belong to members of the source instance. Then, the NdJsonPipelines update the user IDs found in the relation hash with the corresponding user ID mapped to the Import::SourceUser.

See a more detailed explanation below:

MembersPipeline

When the MembersPipeline is executed, an Import::SourceUser and a placeholder user are created for each member in the source instance. The Import::SourceUser records are created with the source_user_identifier set to the user ID of the source instance. Furthermore, the source_name, source_username, and source_hostname fields are populated with the corresponding information from the source instance.

Placeholder users are added as members of the group/project, and their access level is set to match that of the source instance. If an Import::SourceUser record already exists for the source_user_identifier, a placeholder user is not created; instead, the mapped user (which can be a placeholder or a real user) is added as a member. Note that if the user is already a member, a new member is not added, and the access level of the existing member is preserved. Also, a member is not added if the mapped user is the ImportUser.
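As a rough illustration of the find-or-create behavior described above, here is a minimal pure-Ruby sketch. The class and attribute names are simplified stand-ins, not the MR's actual implementation:

```ruby
# Simplified model: one Import::SourceUser-like record and one placeholder
# user per source member, reused on every subsequent lookup for the same
# source_user_identifier.
MemberSourceUser = Struct.new(:source_user_identifier, :source_name, :source_username, :mapped_user_id)

class PlaceholderRegistry
  def initialize
    @source_users = {} # source_user_identifier => MemberSourceUser
    @next_placeholder_id = 1000
  end

  # Returns the existing record for the identifier, or creates one together
  # with a new placeholder user ID.
  def find_or_create(identifier, name: nil, username: nil)
    @source_users[identifier] ||=
      MemberSourceUser.new(identifier, name, username, @next_placeholder_id += 1)
  end
end

registry = PlaceholderRegistry.new
alice = registry.find_or_create(101, name: 'Alice', username: 'alice')
again = registry.find_or_create(101) # same identifier: no second placeholder
alice.equal?(again) # => true
```

In the real pipeline, the mapped user can also be a previously reassigned real user, in which case that user is added as the member instead of a placeholder.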

NdJsonPipelines

When the NdJsonPipeline#transform method is executed, it searches for an Import::SourceUser with a matching source_user_identifier for every user ID in the relation hash. If none is found, a new Import::SourceUser record is created with source_user_identifier set to that user ID. All Import::SourceUser records created this way are mapped to the same ImportUser, which means all contributions from non-members are assigned to the ImportUser. Note that, currently, the source_name and source_username for these Import::SourceUser records are set to nil. A subsequent MR will introduce a new pipeline to fetch the information from the source instance and populate the missing details.
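The non-member handling can be sketched as follows. This is an illustrative model only (the constant, struct, and method names are invented for the example): every user ID found in the relation hash that has no existing record gets one, and all of them share a single import user ID.

```ruby
# Illustrative sketch (not the MR's code): non-member user IDs found in a
# relation hash all get an Import::SourceUser-like record mapped to one
# shared import user.
IMPORT_USER_ID = 999

NonMemberSourceUser = Struct.new(:source_user_identifier, :source_name, :source_username, :mapped_user_id)

def create_missing_source_users(relation_hash, user_references, registry)
  user_references.each do |reference|
    identifier = relation_hash[reference]
    next unless identifier

    # source_name/source_username stay nil for now; a follow-up MR
    # backfills them from the source instance.
    registry[identifier] ||= NonMemberSourceUser.new(identifier, nil, nil, IMPORT_USER_ID)
  end
  registry
end

registry = {}
hash = { 'iid' => 1, 'author_id' => 101, 'merged_by_id' => 102 }
create_missing_source_users(hash, %w[author_id merged_by_id], registry)
registry.keys # => [101, 102]
```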

For instance, consider a hash-like relation:

{ iid: 1, author_id: 101, merged_by: 102, title: 'MR', notes: [{ note: 'Note 1', author_id: 102 }, { note: 'Note 2', author_id: 103 }] }

If none of the user IDs is found, NdJsonPipeline#transform creates Import::SourceUser records with the source_user_identifiers 101, 102, and 103, and each source user is mapped to the ImportUser.

After creating all missing Import::SourceUser records, RelationFactory is executed with the new SourceUsersMapper. This mapper returns the user in the destination that corresponds to each source_user_identifier.

For example, SourceUsersMapper#map[101] returns the user ID of the user mapped to an Import::SourceUser with the source_user_identifier 101. If the Import::SourceUser was reassigned to a real user, SourceUsersMapper#map[101] will return the user ID of the real user. Otherwise, it returns the user ID of a placeholder user.
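The lookup semantics can be sketched in a few lines of pure Ruby. The names below are illustrative, not the actual GitLab classes: a source user resolves to the reassigned real user once the reassignment is accepted, and to its placeholder user otherwise.

```ruby
# Sketch: resolve a source_user_identifier to a destination user ID.
MappedSourceUser = Struct.new(:placeholder_user_id, :reassign_to_user_id, :status) do
  def mapped_user_id
    status == :accepted ? reassign_to_user_id : placeholder_user_id
  end
end

class DestinationUsersMap
  def initialize(source_users)
    @source_users = source_users # source_user_identifier => MappedSourceUser
  end

  # Mirrors the SourceUsersMapper#map[identifier] behavior described above.
  def [](identifier)
    @source_users.fetch(identifier).mapped_user_id
  end
end

map = DestinationUsersMap.new(
  101 => MappedSourceUser.new(1001, nil, :pending_assignment),
  102 => MappedSourceUser.new(1002, 42, :accepted)
)
map[101] # => 1001 (placeholder user)
map[102] # => 42 (real user after reassignment)
```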

Therefore, RelationFactory builds the relation object with all user IDs set to the appropriate user in the destination. In addition, for every built object, the source_user_references attribute holds a map of the source user IDs, which is later used to push the placeholder user references.

For instance, consider a hash-like relation:

{ iid: 1, author_id: 101, merged_by: 102, title: 'MR', notes: [{ note: 'Note 1', author_id: 102 }, { note: 'Note 2', author_id: 103 }] }

RelationFactory will return an object like:

MergeRequest.new(
  title: 'MR',
  author_id: 1,
  merged_by: 2,
  source_user_references: { 'author_id' => 101, 'merged_by' => 102 },
  notes: [
    Note.new(
      note: 'Note 1',
      author_id: 2,
      source_user_references: { 'author_id' => 102 }
    ),
    Note.new(
      note: 'Note 2',
      author_id: 3,
      source_user_references: { 'author_id' => 103 }
    )
  ]
)

Later, the object is persisted, as before, by ObjectBuilder in the NdJsonPipeline#load method.
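The reference-pushing step that follows load can be sketched as below. This is a simplified, self-contained model (the struct and method names are invented for the example): for each built object, a reference is recorded for every source_user_references entry that is a real database attribute and is not already resolved to a reassigned real user.

```ruby
# Simplified model of pushing placeholder references after load.
LoadedObject = Struct.new(:attributes, :source_user_references) do
  def has_attribute?(name)
    attributes.key?(name)
  end

  def [](name)
    attributes[name]
  end
end

def placeholder_references(object, accepted_user_ids)
  object.source_user_references.filter_map do |attribute, source_user_identifier|
    # Skip attributes that don't exist on the table (e.g. delegated methods).
    next unless object.has_attribute?(attribute)
    # Skip attributes already pointing at a reassigned real user.
    next if accepted_user_ids.include?(object[attribute])

    [attribute, source_user_identifier]
  end
end

note = LoadedObject.new({ 'note' => 'Note 1', 'author_id' => 1002 }, { 'author_id' => 102 })
placeholder_references(note, [42]) # => [['author_id', 102]]
```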

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

placeholder

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

  1. Enable the feature flags importer_user_mapping and bulk_import_importer_user_mapping:
    Feature.enable(:importer_user_mapping)
    Feature.enable(:bulk_import_importer_user_mapping)
  2. Go to New group > Import group
  3. Provide a host user and access token
  4. Select the group to be imported
  5. Check if placeholder users were created and contributions assigned to them
Edited by Rodrigo Tomonari

Merge request reports


Activity

85 85 )
86 86 bulk_import.create_configuration!(credentials.slice(:url, :access_token))
87 87
88 if Feature.enabled?(:importer_user_mapping, current_user)
89 ::Import::BulkImports::EphemeralData.new(bulk_import.id).enable_importer_user_mapping
90 end
  • :tools: Generated by gitlab_quality-test_tooling.


    :snail: Slow tests detected in this merge request. These slow tests might be related to this merge request's changes.

    Click to expand
    Job File Name Duration Expected duration
    #7422352764 ee/spec/models/epic_spec.rb#L1030 Epic relative positioning there is a parent behaves like a class that supports relative positioning .move_nulls_to_end can move many nulls 52.77 s < 45.4 s
    #7432864282 ee/spec/models/epic_spec.rb#L1030 Epic relative positioning there is a parent behaves like a class that supports relative positioning .move_nulls_to_end can move many nulls 45.47 s < 45.4 s
    #7441716224 ee/spec/models/epic_spec.rb#L1030 Epic relative positioning there is a parent behaves like a class that supports relative positioning .move_nulls_to_end can move many nulls 46.56 s < 45.4 s
  • A deleted user added rspec:slow test detected label
  • Rodrigo Tomonari changed milestone to %17.3

  • mentioned in issue #443557 (closed)

  • 12 12 end
    13 13
    14 def find_or_create_internal_user(source_name:, source_username:, source_user_identifier:)
    15 @source_name = source_name
    16 @source_username = source_username
    17 @source_user_identifier = source_user_identifier
    14 def find_source_user(source_user_identifier)
    15 ::Import::SourceUser.find_source_user(
    16 source_user_identifier: source_user_identifier,
    17 namespace: namespace,
    18 source_hostname: source_hostname,
    19 import_type: import_type
    20 )
    21 end
    22
    23 def find_or_create_source_user(source_name:, source_username:, source_user_identifier:, use_import_user: false)
  • Rodrigo Tomonari added 901 commits

    Compare with previous version

  • A deleted user added feature flag label

  • E2E Test Result Summary

    allure-report-publisher generated test report!

    e2e-test-on-gdk: :white_check_mark: test report for 00c5465a

    expand test summary
    +------------------------------------------------------------------+
    |                          suites summary                          |
    +-------------+--------+--------+---------+-------+-------+--------+
    |             | passed | failed | skipped | flaky | total | result |
    +-------------+--------+--------+---------+-------+-------+--------+
    | Release     | 5      | 0      | 0       | 0     | 5     | ✅     |
    | Verify      | 44     | 0      | 2       | 0     | 46    | ✅     |
    | Create      | 127    | 0      | 12      | 0     | 139   | ✅     |
    | Govern      | 71     | 0      | 0       | 0     | 71    | ✅     |
    | Plan        | 70     | 0      | 0       | 0     | 70    | ✅     |
    | Data Stores | 31     | 0      | 1       | 0     | 32    | ✅     |
    | Manage      | 0      | 0      | 2       | 0     | 2     | ➖     |
    | Package     | 16     | 0      | 15      | 0     | 31    | ✅     |
    | Analytics   | 2      | 0      | 0       | 0     | 2     | ✅     |
    | Monitor     | 8      | 0      | 0       | 0     | 8     | ✅     |
    | Fulfillment | 2      | 0      | 0       | 0     | 2     | ✅     |
    | Secure      | 2      | 0      | 0       | 0     | 2     | ✅     |
    +-------------+--------+--------+---------+-------+-------+--------+
    | Total       | 378    | 0      | 32      | 0     | 410   | ✅     |
    +-------------+--------+--------+---------+-------+-------+--------+

    e2e-package-and-test: :white_check_mark: test report for 00c5465a

    expand test summary
    +-------------------------------------------------------------+
    |                       suites summary                        |
    +--------+--------+--------+---------+-------+-------+--------+
    |        | passed | failed | skipped | flaky | total | result |
    +--------+--------+--------+---------+-------+-------+--------+
    | Manage | 6      | 0      | 14      | 2     | 20    | ✅     |
    +--------+--------+--------+---------+-------+-------+--------+
    | Total  | 6      | 0      | 14      | 2     | 20    | ✅     |
    +--------+--------+--------+---------+-------+-------+--------+
  • added 1 commit

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • added 1 commit

    • da4a0dc5 - Refactor relation factory and add specs

    Compare with previous version

  • Ghost User
  • 164 # Creates an Import::SourceUser objects for each source_user_identifier
    165 # found in the relation_hash and associate it with the ImportUser.
    166 def create_import_source_users(_relation_key, relation_hash)
    167 relation_factory::USER_REFERENCES.each do |reference|
    168 next unless relation_hash[reference]
    169
    170 source_user_mapper.find_or_create_source_user(
    171 source_name: nil,
    172 source_username: nil,
    173 source_user_identifier: relation_hash[reference],
    174 use_import_user: true
    175 )
    176 end
    177 end
    178
    179 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called
  • Ghost User
  • Martin Wortschack mentioned in merge request !160763 (merged)

  • added 1 commit

    • f06a85e8 - Refactor relation factory and add specs

    Compare with previous version

  • Rodrigo Tomonari changed the description

  • Rodrigo Tomonari changed the description

  • added 1 commit

    • 8fcf0fc6 - Do not push references for real users

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • 4 4 module PlaceholderReferences
    5 5 class PushService < BaseService
    6 6 class << self
    7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:, composite_key: nil)
    8 numeric_key = record.id if composite_key.nil? && record.id.is_a?(Integer)
    7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:)
    8 if record.is_a?(IssueAssignee)
    9 composite_key = {
    10 'issue_id' => record.issue_id,
    11 'user_id' => record.user_id
    12 }
    • Comment on lines +8 to +12

I believe it's more beneficial to handle the composite key here since it will be the same for all the other importers.

    • non-blocking: Is it exclusive to IssueAssignee? Perhaps we should have a new class that returns a composite key depending on the object you're passing, to accommodate for potential future extension, if we need to add a new type of composite key for another class? e.g. PlaceholderReferenceCompositeKey.new(record).execute

    • Is it exclusive to IssueAssignee?

      I checked all models exported by Import/Export, and only IssueAssignee needs to be handled differently.

      Other models like ApprovalProjectRulesProtectedBranch also use a composite primary key, and I believe Import/Export exports them, but the other models don't reference a user, which means we won't create a placeholder reference for them.

      Perhaps we should have a new class that returns a composite key depending on the object you're passing

      I considered moving the logic to a different place, but since IssueAssignee is the only exception and we probably won't need to extend this, I will keep it like this for now.

    • Rodrigo Tomonari changed this line in version 10 of the diff

  • requested review from @georgekoltsov

  • Rodrigo Tomonari marked this merge request as ready

  • 1 # frozen_string_literal: true
    2
    3 module Import
    4 module HasSourceUserReferences
    5 extend ActiveSupport::Concern
    6
    7 attr_accessor :source_user_references
    • It isn't the most elegant solution, but I needed a temporary place to store the source_user_identifiers for later use.

      Other importers will probably not use this, and it will only be needed for Direct Transfer, which splits the transform and load phase into different steps.

      Another alternative would be to wrap or create a different structure for the objects built by RelationFactory, but that would require too many changes.

    • @rodrigo.tomonari With this approach, if we want to support user mapping for a new kind of object in Direct Transfer, would we require that the model has this module mixed in? We've tried to make user mapping work with new relations out of the box. I wonder where else we could put this data? Would something like Gitlab::SafeRequestStore work?

    • Yeah, I'm also concerned about someone forgetting to include the module. An alternative would be to have a spec to make sure the module was included in all models touched by DT.

      I will try another alternative which doesn't require too many changes to the RelationFactory

  • 4 4 module PlaceholderReferences
    5 5 class PushService < BaseService
    6 6 class << self
    7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:, composite_key: nil)
    8 numeric_key = record.id if composite_key.nil? && record.id.is_a?(Integer)
    7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:)
    8 if record.is_a?(IssueAssignee)
    9 composite_key = {
    10 'issue_id' => record.issue_id,
    11 'user_id' => record.user_id
    12 }
    13 else
    14 numeric_key = record.id
    • The removed line numeric_key = record.id if composite_key.nil? && record.id.is_a?(Integer) above implied that there are cases where composite_key is nil and record id is not an integer. With this change, can that still be the case? :thinking:

      Also, if record is IssueAssignee then the numeric key is going to raise undefined method error, right?

    • Good call. I don't think there is a case where record.id is a string. Also, the specs don't have a test for this case, so it should be fine.

      Just to be sure, let me ping @.luke, who implemented the logic.

      Also, if record is IssueAssignee then the numeric key is going to raise undefined method error, right?

      I also thought it would raise an error. But it doesn't raise.

      For example, the code below doesn't raise an error:

      class Test
        def test
          if true
            a = 1
          else
            b = 2
          end
      
          a
          b
        end
      end
      
      Test.new.test
      
      => nil

      Anyway, to be on the safe side, I updated the code to set the variable to nil :smile:

    • @rodrigo.tomonari I think what I meant by testing .id.is_a?(Integer) was to only assign a numeric_key value from the record if the record didn't have a composite primary key. Records with a normal primary key would return an Integer from the #id call even if the primary key was a different column name, but records with a composite primary key would return nil.

      When I was writing that, I was thinking that the calling code would pass a composite_key: value if there should be one. But I think the way you've changed this to have PushService handle it looks better.

      If there is another record in the future that has a composite primary key and isn't IssueAssignee, what would happen is both composite_key and numeric_key would be nil, and the service would return an error response.

      :thinking: It makes me wonder whether here we should log when the reference is not valid, possibly raise an error in development, because it represents data loss. Otherwise we're expecting the places that call this service to do something similar.

  • 26 28
    27 29 private
    28 30
    31 def source_user_transformer(context, data)
  • 85 87 end
    86 88 end
    87 89
    90 # Create missing Import::SourceUser objects during the transformation step, as no data
    91 # related to the relation has been persisted yet.
    92 create_import_source_users(relation_key, relation_hash) if context.importer_user_mapping_enabled?
    • non-blocking: It sounds like this step can be encapsulated into a separate class in order to be used by other importers? We have a hash of data that we need to find user references in and create source users from it? Do I understand it right?

    • I don't think it can be used. Third-party importers use a different structure for the data hash.

      For example, GitHub API returns users as objects, like the example below. So the other importers will have to implement their own method to extract the data.

      {
        "number": 1,
        "title": "Pull request", 
        "author": {
          "id": 101,
          "login": "octocat"
        }
      }  
  • 29 29 # User is already a member with higher existing (inherited) membership
    30 30 return if user_membership && user_membership[:access_level] >= data[:access_level]
    31 31
    32 source_source = data.delete(:source_user)
    33
    32 34 # Create new membership for any other access level
    33 35 member = portable.members.new(data)
    34 36 member.importing = true # avoid sending new member notification to the invited user
    35 37 member.save!
    38
    39 push_placeholder_reference(source_source, member) if context.importer_user_mapping_enabled?
    40
    41 member
    • Does the creation of placeholder user memberships contribute to any limits of any kind? E.g. max group membership count, or license seating? We should probably double-check to make sure it doesn't affect these numbers.

    • Good point.

      There is still a discussion about whether placeholder users should be added as members, and we probably won't add them as members, so I will update the MR not to add them as members.

    • It seems like placeholder users are being counted as billable users :disappointed:

      I guess we need to add the ImportUser and PlaceholderUser types to the list of bots.

      I will create a separate issue to review that

    • Rodrigo Tomonari created #476606 (closed) to continue this discussion

    • I decided to keep importing placeholder users as members, since we are still deciding which approach we will take. So, in the future MR we can implement the agreed approach.

    • Placeholder users are a kind of internal user, which we document as not contributing to the "license limit". We mention here that billable users do not include "GitLab-created service account" which might mean internal users.

    • It seems like placeholder users are being counted as billable users :disappointed:

    • @rodrigo.tomonari are they counted towards billable users when created, without being added as a member of the group? I worked on the import user, and creating it doesn't mark it as billable, but I'm not sure if it becomes billable once added to the group. My main concern is not self-managed but .com, where we bill groups instead of instances. So if placeholder/import users count towards billable users on .com then we need to exclude them. I imagine bot users are not billable but appear in the members list, so maybe we can do something similar in our case.

    • @georgekoltsov, from what I investigated, they are being counted as billable users when added as members.

      For example, groups can use the endpoint billable_members API to check the list of members that are billable and placeholder users appear on the list if they are added as members.

      Marking the user types as bots fixes the problem.

      I created #476606 (closed) to handle the problem

  • 155
    156 def source_user_mapper
    157 @source_user_mapper ||= Gitlab::Import::SourceUserMapper.new(
    158 namespace: context.portable.root_ancestor,
    159 import_type: Import::SOURCE_DIRECT_TRANSFER,
    160 source_hostname: context.configuration.source_hostname
    161 )
    162 end
    163
    164 # Creates an Import::SourceUser objects for each source_user_identifier
    165 # found in the relation_hash and associate it with the ImportUser.
    166 def create_import_source_users(_relation_key, relation_hash)
    167 relation_factory::USER_REFERENCES.each do |reference|
    168 next unless relation_hash[reference]
    169
    170 source_user_mapper.find_or_create_source_user(
    • non-blocking: is this step needed in the case where the found user isn't a member of the source group/project? If so, I wonder if it makes sense to have placeholder user creation as a separate pipeline before we start importing any data? As far as I understood, we started to export a user contributions ndjson file where all of the users' contributions should be available, but we don't seem to be using it, right? I wonder if a separate stage pipeline could be a cleaner solution?

    • Yeah, a separate pipeline/stage would be a better approach, however, there are two problems with the new user_contributions relation, which is why I'm not using it:

      The first problem is that the relation is only exported after all other relations have finished, which means the migration would be stuck in this stage until everything is exported.

      The second problem is that the relation is unavailable for previous GitLab versions, and ideally, the feature should work for all versions.

      FYI: I'm planning to use the new relation file to fill in the source_username and source_name which are being set to nil here.

    • The first problem is that the relation is only exported after all other relations have finished, which means the migration would be stuck in this stage until everything is exported.

      An alteration to our import process that could help the import not be 'so stuck' is to update our stages to import binaries at the very beginning:

      1. Repository
      2. Uploads
      3. Snippets
      4. Wiki
      5. LFS
      6. And only then continue waiting for user contributions to be available

      Just an idea

      The second problem is that the relation is unavailable for previous GitLab versions, and ideally, the feature should work for all versions.

      Good point, we still need to have source user creation during the ndjson pipeline to fall back on.

  • George Koltsov removed review request for @georgekoltsov

  • added 2 commits

    • 6b2a7a2f - Do not create references for system_note_meta
    • 013f222b - Code review feedback

    Compare with previous version

  • Ghost User
  • 164 # Creates an Import::SourceUser objects for each source_user_identifier
    165 # found in the relation_hash and associate it with the ImportUser.
    166 def create_import_source_users(_relation_key, relation_hash)
    167 relation_factory::USER_REFERENCES.each do |reference|
    168 next unless relation_hash[reference]
    169
    170 source_user_mapper.find_or_create_source_user(
    171 source_name: nil,
    172 source_username: nil,
    173 source_user_identifier: relation_hash[reference],
    174 use_import_user: true
    175 )
    176 end
    177 end
    178
    179 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called
  • Rodrigo Tomonari mentioned in merge request !159648 (merged)

  • mentioned in issue #476606 (closed)

  • added 1 commit

    • bb8e7ab2 - Add documentation to ndjson pipeline methods

    Compare with previous version

  • Ghost User
  • 181 # of objects.
    182 #
    183 # For example, if the relation_definition is: { "notes" => { "events" => {} } }
    184 #
    185 # and the relation_object a merge_request with the following notes:
    186 #
    187 # event1 = Event.new
    188 # note1 = Note.new(events: [event1])
    189 # event2 = Event.new
    190 # note2 = Note.new(events: [event2])
    191 # merge_request = MergeRequest.new(notes:[note1, note2])
    192 #
    193 # the flatten_objects list will contain:
    194 # [note1, event1, note2, event2]
    195 #
    196 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called
  • 224 next unless object.respond_to?(:source_user_references)
    225
    226 object.source_user_references.each do |attribute, source_user_identifier|
    227 source_user = source_user_mapper.find_source_user(source_user_identifier)
    228
    229 # Verify if the attribute is an ActiveRecord attribute. This check
    230 # prevents a reference from being created for attributes that do
    231 # not exist on the database table. For example, SystemNoteMetadata
    232 # responds to `source_user_references` as the method is delegated to
    233 # notes causing a reference to be created for all note's
    234 # source_user_references
    235 next unless object.has_attribute?(attribute)
    236
    237 # Do not create a reference if the object is already associated
    238 # with a real user.
    239 next if source_user.accepted_status? && object[attribute] == source_user.reassign_to_user_id
  • Rodrigo Tomonari requested review from @.luke

  • requested review from @georgekoltsov

  • Sam Word mentioned in merge request !156672 (merged)

  • Luke Duncalfe
  • 26 28
    27 29 private
    28 30
    31 def source_user_member(context, data)
    32 access_level = data&.dig('access_level', 'integer_value')
    33
    34 return unless data
    35 return unless valid_access_level?(access_level)
    36
    37 source_user = find_or_create_source_user(context, data)
    38
    39 # Do not add the user as a member if the source user is mapped to the ImportUser because we cannot
    40 # create multiple members for the same user.
    41 return if source_user.mapped_user.import_user?
  • 159 import_type: Import::SOURCE_DIRECT_TRANSFER,
    160 source_hostname: context.configuration.source_hostname
    161 )
    162 end
    163
    164 # Creates an Import::SourceUser objects for each source_user_identifier
    165 # found in the relation_hash and associate it with the ImportUser.
    166 def create_import_source_users(_relation_key, relation_hash)
    167 relation_factory::USER_REFERENCES.each do |reference|
    168 next unless relation_hash[reference]
    169
    170 source_user_mapper.find_or_create_source_user(
    171 source_name: nil,
    172 source_username: nil,
    173 source_user_identifier: relation_hash[reference],
    174 use_import_user: true
  • requested review from @ameyadarshan

  • 65 71 PROJECT_MEMBER_RELATIONS
    66 72 end
    67 73 end
    74
    75 def push_placeholder_reference(source_source, member)
    76 return unless member.user.placeholder?
    • Question

      Because we've checked that the user is not the import user within the MemberAttributesTransformer, do we know at this point that the user must be a placeholder user? If so, does this line add an extra DB call unnecessarily?

    • No, at this point, we don't know if the member is a human member or a placeholder member. We only know that the user isn't an ImportUser.

      Anyway, it was decided not to create placeholder users as members, so this part will change.

    • @rodrigo.tomonari Does this mean we won't be creating Member records with placeholder users at all, or should we just filter out Member records for non-placeholder users?

    • @SamWord, I made my comment based on the approach we discussed in the sync call, which is to create a separate table to store the source user membership details and create the members after the reassignment, which would require extending the reassign process, but it wouldn't mess up with memberships features.

      As you know, the other approach is to keep creating placeholder users as members but hide them in the UI and API.

      I will create a separate issue with both proposals and then we can discuss which approach to take.

  • 10 module Import
    11 module BulkImports
    12 class SourceUsersMapper
    13 include Gitlab::Utils::StrongMemoize
    14
    15 # Gitlab::ImportExport::Base::RelationFactory expects member_mapper#map to
    16 # return an object that responds to []. For the others mappers a hash is
    17 # returned. In this case SourceUsersMapper#map returns a class that responds
    18 # to [].
    19 class MockedHash
    20 def initialize(source_user_mapper)
    21 @source_user_mapper = source_user_mapper
    22 end
    23
    24 def [](user_identifier)
    25 @source_user_mapper.find_source_user(user_identifier).mapped_user.id
  • 33 def map
    34 @map ||= MockedHash.new(source_user_mapper)
    35 end
    36
    37 # Import::SourceUser with the respective user_identifier will always exist since
    38 # they are created beforehand.
    39 def include?(_user_identifier)
    40 true
    41 end
    42
    43 private
    44
    45 attr_reader :context
    46
    47 def source_user_mapper
    48 @source_user_mapper ||= Gitlab::Import::SourceUserMapper.new(
    • Thought

      We have three similar places where we initialize a Gitlab::Import::SourceUserMapper passing in the same params. I wonder if we should add a method to the context class (is it BulkImports::Pipeline::Context?) to return it?

      module BulkImports
        module Pipeline
          class Context
            def source_user_mapper
              @source_user_mapper ||= Gitlab::Import::SourceUserMapper.new(
                namespace: portable.root_ancestor,
                import_type: Import::SOURCE_DIRECT_TRANSFER,
                source_hostname: configuration.source_hostname
              )
            end
          end
        end
      end

      and then when we are calling it in the code, we would call context.source_user_mapper.

    • Rodrigo Tomonari changed this line in version 13 of the diff

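The facade idea quoted above, an object that only needs to respond to `[]` for RelationFactory, can be sketched in isolation. `SourceUser`, `FakeSourceUserMapper`, and `HashLikeMapper` below are illustrative stand-ins, not the real classes:

```ruby
# Illustrative stand-in for Gitlab::Import::SourceUserMapper: resolves a
# source user identifier to the user (placeholder or real) it maps to.
SourceUser = Struct.new(:mapped_user_id)

class FakeSourceUserMapper
  def initialize(mapping)
    @mapping = mapping
  end

  def find_source_user(identifier)
    SourceUser.new(@mapping.fetch(identifier))
  end
end

# Same idea as MockedHash: RelationFactory only calls #[] on the result of
# member_mapper#map, so a thin Hash-like facade over the mapper is enough.
class HashLikeMapper
  def initialize(source_user_mapper)
    @source_user_mapper = source_user_mapper
  end

  def [](user_identifier)
    @source_user_mapper.find_source_user(user_identifier).mapped_user_id
  end
end

mapper = HashLikeMapper.new(FakeSourceUserMapper.new(100 => 7))
mapper[100] # => 7
```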
  • 224 next unless object.respond_to?(:source_user_references)
    225
    226 object.source_user_references.each do |attribute, source_user_identifier|
    227   source_user = source_user_mapper.find_source_user(source_user_identifier)
    228
    229   # Verify that the attribute is an ActiveRecord attribute. This check
    230   # prevents a reference from being created for attributes that do
    231   # not exist in the database table. For example, SystemNoteMetadata
    232   # responds to `source_user_references` because the method is delegated
    233   # to notes, causing a reference to be created for all of the note's
    234   # source_user_references.
    235   next unless object.has_attribute?(attribute)
    236
    237   # Do not create a reference if the object is already associated
    238   # with a real user.
    239   next if source_user.accepted_status? && object[attribute] == source_user.reassign_to_user_id
    • Question

      Is it possible for us to only enter this part if we know that we have referenced a placeholder user?

      For example, is it possible to:

      1. When saving an object with a user reference, note whether we used the reassign_to_user_id or the placeholder_user_id.
      2. Only if we used the placeholder user, later enter this part of the logic.
      Edited by Luke Duncalfe
    • It should be possible, but it would require adding more logic to the RelationFactory class, which I'm avoiding as it is also used by Import/Export.

      In my implementation, the only change to the RelationFactory class is to set the source_user_references attribute with a hash containing a map of the source_user_identifiers. For example, object.source_user_references = { author_id: 100, merged_by_id: 101 }. If we filter out source_user_identifiers for which the corresponding source_user is mapped to a real user, we won't enter this part of the code you mentioned.

      Just to let you know, I initially added some memory cache to prevent fetching the database multiple times, but I removed it from the MR because it was becoming too complicated. If you're concerned about database queries, I plan to optimize them in a follow-up MR.

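The filtering described in the reply can be sketched with plain hashes. The data shapes and the `pending_references` helper are hypothetical, not the MR's API:

```ruby
# Drop references whose source user was already reassigned and whose record
# column already points at the real user; only the remaining ones need a
# placeholder reference. All names here are illustrative.
def pending_references(references, source_users, record)
  references.reject do |attribute, identifier|
    source_user = source_users.fetch(identifier)
    source_user[:accepted] && record[attribute] == source_user[:reassign_to_user_id]
  end
end

source_users = {
  100 => { accepted: true, reassign_to_user_id: 7 },   # already mapped to a real user
  101 => { accepted: false, reassign_to_user_id: nil } # still a placeholder
}
record = { author_id: 7, merged_by_id: 55 }

pending_references({ author_id: 100, merged_by_id: 101 }, source_users, record)
# => { merged_by_id: 101 }
```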
  • Luke Duncalfe removed review request for @.luke

  • 26 26 include IdInOrdered
    27 27 include Todoable
    28 28 include Spammable
       29 include Import::HasSourceUserReferences
    • Are these all the models that need to be updated? For instance, I don't see an update to the Epic model, but it does have a user reference.

    • Yes, all models that may reference a user need to include the module.

      Epic includes the module ::Issuable, which includes Import::HasSourceUserReferences, so we don't need to change Epic.

      Actually, we also don't need to include it in MergeRequest, as it also includes the module ::Issuable.

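The inclusion chain from the reply can be shown with a minimal sketch. The module bodies are stubs; only the `include` wiring mirrors the real code:

```ruby
# Stub of Import::HasSourceUserReferences with a fake return value.
module HasSourceUserReferences
  def source_user_references
    { author_id: 100 } # illustrative only
  end
end

# Stub of ::Issuable: anything that includes Issuable also picks up
# HasSourceUserReferences, which is why Epic needs no extra include.
module Issuable
  def self.included(base)
    base.include(HasSourceUserReferences)
  end
end

class Epic
  include Issuable
end

Epic.new.respond_to?(:source_user_references) # => true
```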
  • 29 29 # User is already a member with higher existing (inherited) membership
    30 30 return if user_membership && user_membership[:access_level] >= data[:access_level]
    31 31
       32 source_user = data.delete(:source_user)
       33
    32 34 # Create new membership for any other access level
    33 35 member = portable.members.new(data)
    34 36 member.importing = true # avoid sending new member notification to the invited user
    35 37 member.save!
    • So when source_user is present, what kind of member are we creating here? We're creating the membership for the source user, correct? From reading the code, it's not very clear (at least to me) that that's happening. Since we aren't going to support the old way of mapping users, I wonder if we should update this class to state more clearly that what we're processing here are placeholder users. Perhaps rename the class to placeholder_members_pipeline, adjust variable names, etc.? WDYT?

    • Yeah, it gets confusing since most of the logic is hidden in the transformer.

      The PM decided not to add placeholder users as members, so the class will always create human members.

  • 26 28
    27 29 private
    28 30
       31 def source_user_member(context, data)
    • non-blocking: we could have a separate placeholder_member_attributes_transformer class with the newly added behavior instead of plugging into this existing class. And in this class we just return data if context.importer_user_mapping_enabled?

    • Yeah, initially, I implemented it using a different transformer, but I thought it would be easier to understand if everything was in the same class. Perhaps I was wrong :sweat_smile:

  • 65 71     PROJECT_MEMBER_RELATIONS
    66 72   end
    67 73 end
       74
       75 def push_placeholder_reference(source_source, member)
       76   return unless member.user.placeholder?
       77
       78   Import::PlaceholderReferences::PushService.from_record(
    • Just so I understand it more clearly: we push a placeholder reference here so that we can later swap this placeholder user's membership over to the reassigned real user?

    • Yes, for every imported resource to which we assign a placeholder user or an import user, an Import::SourceUserPlaceholderReference is created. Later, the reassign process uses those records to determine which resources need to be reassigned.

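The bookkeeping described in the reply can be sketched with plain hashes: each stored reference records which column on which record currently holds the placeholder user's ID, so a later pass can rewrite it. The data shapes and the `reassign!` helper below are hypothetical, not the real Import::SourceUserPlaceholderReference schema:

```ruby
# Each reference points at one user-bearing column on one imported record.
references = [
  { model: :member, id: 1, column: :user_id },
  { model: :merge_request, id: 42, column: :author_id }
]

# In-memory stand-ins for the imported records.
records = {
  member: { 1 => { user_id: 900 } },          # 900 = placeholder user
  merge_request: { 42 => { author_id: 900 } }
}

# Rewrite every referenced column from the placeholder to the real user.
def reassign!(references, records, from:, to:)
  references.each do |ref|
    record = records[ref[:model]][ref[:id]]
    record[ref[:column]] = to if record[ref[:column]] == from
  end
end

reassign!(references, records, from: 900, to: 15)
records[:merge_request][42][:author_id] # => 15
```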
  • 56 58 raise(ActiveRecord::RecordInvalid, object)
    57 59 end
    58 60
    59 61 object.save!
  • 85 87 end
    86 88 end
    87 89
       90 # Create missing Import::SourceUser objects during the transformation step, as data
       91 # related to the relation has not been persisted yet.
       92 create_import_source_users(relation_key, relation_hash) if context.importer_user_mapping_enabled?
    • What if we create a placeholder user for a relation object that fails to be imported? Don't we want to persist the relation object first? Otherwise we might end up with a placeholder user that doesn't have any references, right?

    • I see your point, but how will we save the relation object without an existing user?

      For instance, when creating a merge request and assigning a user as the author, the user must already exist in the database.

    • @rodrigo.tomonari good point, we still need the placeholder user to be present in order to assign it to the record before it gets persisted. Perhaps we can add a 'cleanup' step at the end of the migration to remove any placeholder users that don't have any references?

  • George Koltsov removed review request for @georgekoltsov

  • added 1 commit

    • e34b5146 - Apply 3 suggestion(s) to 2 file(s)


  • added 2 commits

    • 565d3f6e - Remove use_import_user options
    • 00c5465a - Move source_user_mapper to context


  • 176 # of objects.
    177 #
    178 # For example, if the relation_definition is: { "notes" => { "events" => {} } }
    179 #
    180 # and the relation_object is a merge_request with the following notes:
    181 #
    182 #   event1 = Event.new
    183 #   note1 = Note.new(events: [event1])
    184 #   event2 = Event.new
    185 #   note2 = Note.new(events: [event2])
    186 #   merge_request = MergeRequest.new(notes: [note1, note2])
    187 #
    188 # the flatten_objects list will contain:
    189 #   [note1, event1, note2, event2]
    190 #
    191 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called
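The traversal that comment describes can be reproduced as a self-contained sketch. Structs stand in for the real models, and the recursive helper is an approximation of the implementation, not the actual code:

```ruby
Event = Struct.new(:name)
Note = Struct.new(:events)
MergeRequest = Struct.new(:notes)

# Walk the relation_definition tree depth-first, collecting each nested
# object followed by its own nested relations.
def flatten_objects(object, relation_definition)
  relation_definition.flat_map do |relation_key, nested_definition|
    object.public_send(relation_key).flat_map do |nested|
      [nested] + flatten_objects(nested, nested_definition)
    end
  end
end

event1 = Event.new('e1')
note1 = Note.new([event1])
event2 = Event.new('e2')
note2 = Note.new([event2])
merge_request = MergeRequest.new([note1, note2])

flatten_objects(merge_request, 'notes' => { 'events' => {} })
# => [note1, event1, note2, event2]
```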
  • Rodrigo Tomonari mentioned in merge request !162098 (merged)

  • Rodrigo Tomonari mentioned in merge request !162484 (merged)

  • Sam Word mentioned in merge request !162619 (merged)
