Integrate improved user mapping with Direct Transfer
What does this MR do and why?
Related to: #443557 (closed)
Update Direct Transfer to map contributions to placeholder users for every existing member in the source instance. Contributions created by non-members are assigned to the Import User.
Note
Follow-up MRs will be created to implement some missing features:
- Handle mentions in notes and description
- Check if all references were saved before marking the migration as finished
- Retry the pipeline in case of a Gitlab::Import::SourceUserMapper::FailedToObtainLockError error
- Populate source user names for non-members
- Cache source user details in Redis to reduce database queries
How it works
In summary, the MR changes the MembersPipeline and the NdJsonPipelines. The MembersPipeline pipeline creates an Import::SourceUser and a placeholder user for every member in the source instance. The NdJsonPipelines pipeline creates an Import::SourceUser for every user ID contained in the relation hash that was not created in the MembersPipeline, that is, for all user IDs that aren't members in the source instance. Then, the NdJsonPipelines updates the user IDs found in the relation hash with the corresponding user ID mapped to the Import::SourceUser.
See a more detailed explanation below:
MembersPipeline
When the MembersPipeline is executed, an Import::SourceUser and a placeholder user are created for each member in the source instance. The Import::SourceUser records are created with the source_user_identifier set to the user ID of the source instance. Furthermore, the source_name, source_username, and source_hostname fields are populated with the corresponding information from the source instance.
Placeholder users are added as members of the group/project, and their access level is set to match that of the source instance. If an existing Import::SourceUser record already exists for the source_user_identifier, a placeholder user is not created, and instead, the mapped user (which can be a placeholder or real user) is added as a member. It's important to note that if the user is already a member, a new member is not added, and the access level of the existing member is preserved. Also, a member is not added if the placeholder user is of the type ImportUser.
NdJsonPipelines
When the NdJsonPipeline#transform
method is executed, it searches for an Import::SourceUser with a matching source_user_identifier for every user ID in the relation hash. If none is found, a new Import::SourceUser record is created and the source_user_identifier is set to the same user ID. All created Import::SourceUser are mapped to the same ImportUser, which means all contributions for non-members will be assigned to the ImportUser. Note that currently, the source_name and source_username for the created Import::SourceUser records are set to nil. A subsequent MR will introduce a new pipeline to fetch the information from the source instance and populate the missing details.
For instance, consider a hash-like relation:
{ iid: 1, author_id: 101, merged_by: 102, title: 'MR', notes: [{ note: 'Note 1', author_id: 102 }, { note: 'Note 2', author_id 103 } }] }
If none of the user IDs is found, NdJsonPipeline#transform could create an Import::SourceUser with the source_user_identifies 101, 102, and 103 and the source user would be mapped to the ImportUser.
After creating all missing Import::SourceUser, RelationFactory is executed and the new SourceUsersMapper is provided. This mapper returns the corresponding user in the destination mapped to the source_user_identifier.
For example, SourceUsersMapper#map[101]
returns the user ID of the user mapped to an Import::SourceUser with the source_user_identifier 101. If the Import::SourceUser was reassigned to a real user, SourceUsersMapper#map[101]
will return the user ID of the real user. Otherwise, it returns the user ID of a placeholder user.
Therefore, RelationFactory will build the relation object will all user IDs set to the appropriate user in the destination. Besides, for every built object, the attribute source_user_references
will hold a map of the source user IDs, which later is used to push the placeholder user references.
For instance, consider a hash-like relation:
{ iid: 1, author_id: 101, merged_by: 102, title: 'MR', notes: [{ note: 'Note 1', author_id: 102 }, { note: 'Note 2', author_id 103 } }] }
RelationFactory will return an object like:
MergeRequest.new(
title: 'MR'
author_id: 1
merged_by: 2,
source_user_references: { 'author_id' => 101, 'merged_by' => 102 },
notes: [
Note.new(
note: 'Note 1',
author_id: 2,
source_user_references: { 'author_id' => 102 }
),
Note.new(
note: 'Note 2',
author_id: 3,
source_user_references: { 'author_id' => 103 }
)
]
)
Later, it is persisted by ObjectBuilder as before in the NdJsonPipeline#load method.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
- Enable the feature flags:
importer_user_mapping
andbulk_import_importer_user_mapping
Feature.enable(:member_areas_of_focus) Feature.enable(:bulk_import_importer_user_mapping)
- New group, import group
- Provide a host user and access token
- Select the group to be imported
- Check if placeholder users were created and contributions assigned to them
Merge request reports
Activity
assigned to @rodrigo.tomonari
added pipelinetier-3 label
7 Warnings This merge request is quite big (1658 lines changed), please consider splitting it into multiple merge requests. 00c5465a: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. 565d3f6e: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. a90c5e73: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. f06a85e8: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. 0a74cbf1: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. 282aa0f6: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. 1 Message CHANGELOG missing: If this merge request needs a changelog entry, add the
Changelog
trailer to the commit message you want to add to the changelog.If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.
Reviewer roulette
Category Reviewer Maintainer backend @marcogreg
(UTC+8, 11 hours ahead of author)
@alejandro
(UTC-4, 1 hour behind author)
~"Verify" Reviewer review is optional for ~"Verify" @stanhu
(UTC-7, 4 hours behind author)
Please refer to documentation page for guidance on how you can benefit from the Reviewer Roulette, or use the GitLab Review Workload Dashboard to find other available reviewers.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Danger85 85 ) 86 86 bulk_import.create_configuration!(credentials.slice(:url, :access_token)) 87 87 88 if Feature.enabled?(:importer_user_mapping, current_user) 89 ::Import::BulkImports::EphemeralData.new(bulk_import.id).enable_importer_user_mapping 90 end - Comment on lines +88 to +90
Storing the feature flag status in Redis to ensure that enabling or disabling the feature flag does not impact in-flight migrations.
On the other importers, we can save the information in the database as we have a table to store such information.
changed this line in version 2 of the diff
Generated bygitlab_quality-test_tooling
.
Slow tests detected in this merge request. These slow tests might be related to this merge request's changes.Click to expand
Job File Name Duration Expected duration #7422352764 ee/spec/models/epic_spec.rb#L1030
Epic relative positioning there is a parent behaves like a class that supports relative positioning .move_nulls_to_end can move many nulls 52.77 s < 45.4 s #7432864282 ee/spec/models/epic_spec.rb#L1030
Epic relative positioning there is a parent behaves like a class that supports relative positioning .move_nulls_to_end can move many nulls 45.47 s < 45.4 s #7441716224 ee/spec/models/epic_spec.rb#L1030
Epic relative positioning there is a parent behaves like a class that supports relative positioning .move_nulls_to_end can move many nulls 46.56 s < 45.4 s - A deleted user
added rspec:slow test detected label
changed milestone to %17.3
mentioned in issue #443557 (closed)
12 12 end 13 13 14 def find_or_create_internal_user(source_name:, source_username:, source_user_identifier:) 15 @source_name = source_name 16 @source_username = source_username 17 @source_user_identifier = source_user_identifier 14 def find_source_user(source_user_identifier) 15 ::Import::SourceUser.find_source_user( 16 source_user_identifier: source_user_identifier, 17 namespace: namespace, 18 source_hostname: source_hostname, 19 import_type: import_type 20 ) 21 end 22 23 def find_or_create_source_user(source_name:, source_username:, source_user_identifier:, use_import_user: false) I changed the class to return the Import::SourceUser instead of the internal user, as we usually need access to the source user.
I need to check if it's possible to keep returning the internal_user
Edited by Rodrigo Tomonarichanged this line in version 13 of the diff
added 901 commits
-
92106cda...9d669d32 - 899 commits from branch
master
- 4335eba3 - Integrate improved user mapping with Direct Transfer
- 282aa0f6 - Refactor code and add some specs
-
92106cda...9d669d32 - 899 commits from branch
- A deleted user
added feature flag label
E2E Test Result Summary
allure-report-publisher
generated test report!e2e-test-on-gdk:
test report for 00c5465aexpand test summary
+------------------------------------------------------------------+ | suites summary | +-------------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +-------------+--------+--------+---------+-------+-------+--------+ | Release | 5 | 0 | 0 | 0 | 5 | ✅ | | Verify | 44 | 0 | 2 | 0 | 46 | ✅ | | Create | 127 | 0 | 12 | 0 | 139 | ✅ | | Govern | 71 | 0 | 0 | 0 | 71 | ✅ | | Plan | 70 | 0 | 0 | 0 | 70 | ✅ | | Data Stores | 31 | 0 | 1 | 0 | 32 | ✅ | | Manage | 0 | 0 | 2 | 0 | 2 | ➖ | | Package | 16 | 0 | 15 | 0 | 31 | ✅ | | Analytics | 2 | 0 | 0 | 0 | 2 | ✅ | | Monitor | 8 | 0 | 0 | 0 | 8 | ✅ | | Fulfillment | 2 | 0 | 0 | 0 | 2 | ✅ | | Secure | 2 | 0 | 0 | 0 | 2 | ✅ | +-------------+--------+--------+---------+-------+-------+--------+ | Total | 378 | 0 | 32 | 0 | 410 | ✅ | +-------------+--------+--------+---------+-------+-------+--------+
e2e-package-and-test:
test report for 00c5465aexpand test summary
+-------------------------------------------------------------+ | suites summary | +--------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +--------+--------+--------+---------+-------+-------+--------+ | Manage | 6 | 0 | 14 | 2 | 20 | ✅ | +--------+--------+--------+---------+-------+-------+--------+ | Total | 6 | 0 | 14 | 2 | 20 | ✅ | +--------+--------+--------+---------+-------+-------+--------+
164 # Creates an Import::SourceUser objects for each source_user_identifier 165 # found in the relation_hash and associate it with the ImportUser. 166 def create_import_source_users(_relation_key, relation_hash) 167 relation_factory::USER_REFERENCES.each do |reference| 168 next unless relation_hash[reference] 169 170 source_user_mapper.find_or_create_source_user( 171 source_name: nil, 172 source_username: nil, 173 source_user_identifier: relation_hash[reference], 174 use_import_user: true 175 ) 176 end 177 end 178 179 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called A disabled RuboCop security rule was detected. Please try to avoid this if at all possible, otherwise an AppSec review is recommended.
This AppSec automation is currently under testing. Use appsec-sasthelpful or appsec-sastunhelpful for quick feedback. For any detailed feedback, add a comment here.
- Resolved by Rodrigo Tomonari
mentioned in merge request !160763 (merged)
4 4 module PlaceholderReferences 5 5 class PushService < BaseService 6 6 class << self 7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:, composite_key: nil) 8 numeric_key = record.id if composite_key.nil? && record.id.is_a?(Integer) 7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:) 8 if record.is_a?(IssueAssignee) 9 composite_key = { 10 'issue_id' => record.issue_id, 11 'user_id' => record.user_id 12 } non-blocking: Is it exclusive to
IssueAssignee
? Perhaps we should have a new class that returns a composite key depending on the object you're passing, to accommodate for potential future extension, if we need to add a new type of composite key for another class? e.g.PlaceholderReferenceCompositeKey.new(record).execute
Is it exclusive to
IssueAssignee
?I checked all models exported by Import/Export, and only
IssueAssignee
needs to be handled differently.Other models like ApprovalProjectRulesProtectedBranch also use a composite primary key, and I believe Import/Export exports them, but the other models don't reference a user, which means we won't create a placeholder reference for them.
Perhaps we should have a new class that returns a composite key depending on the object you're passing
I considered moving the logic to a different place, but since
IssueAssignee
is the only exception and we probably won't need to extend this, I will keep it like this for now.changed this line in version 10 of the diff
requested review from @georgekoltsov
@georgekoltsov, since you have experience with Direct Transfer and Import Export, I think you're the most appropriate person to review this MR
Thanks @rodrigo.tomonari I left some comments to consider
@georgekoltsov, I replied to the comments and made the suggested modifications.
While testing, I found a bug regarding
SystemNoteMetadata
. I fixed the problem in this commit.@.luke can I also ask you to review the changes in parallel
Edited by Rodrigo Tomonari@rodrigo.tomonari I am OOO August 5th, will be able to review it on Tuesday August 6th
@rodrigo.tomonari Looks great, I've some small thoughts and some questions to help me understand some parts.
Edited by Luke Duncalfe@rodrigo.tomonari looks good overall, I added a few more comments.
@.luke @georgekoltsov, thank you for the inputs. I think I answered most of them.
However, since it was decided that we will no longer create placeholder users as members, the implementation will change a little.
I will take this opportunity to split the MR into two or three smaller ones.
/cc @ameyadarshan
1 # frozen_string_literal: true 2 3 module Import 4 module HasSourceUserReferences 5 extend ActiveSupport::Concern 6 7 attr_accessor :source_user_references It isn't the most elegant solution, but I needed a temporary place to store the source_user_identifies for later use.
Other importers will probably not use this, and it will only be needed for Direct Transfer, which splits the transform and load phase into different steps.
Another alternative would be to wrap or create a different structure for the objects built by RelationFactory, but that would require too many changes.
@rodrigo.tomonari With this approach, if we want to support user mapping for a new kind of object in Direct Transfer, would we require that the model has this module mixed in? We've tried to make user mapping work with new relations out of the box. I wonder where else we could put this data? Would something like
Gitlab::SafeRequestStore
work?
4 4 module PlaceholderReferences 5 5 class PushService < BaseService 6 6 class << self 7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:, composite_key: nil) 8 numeric_key = record.id if composite_key.nil? && record.id.is_a?(Integer) 7 def from_record(import_source:, import_uid:, source_user:, record:, user_reference_column:) 8 if record.is_a?(IssueAssignee) 9 composite_key = { 10 'issue_id' => record.issue_id, 11 'user_id' => record.user_id 12 } 13 else 14 numeric_key = record.id The removed line
numeric_key = record.id if composite_key.nil? && record.id.is_a?(Integer)
above implied that there are cases where composite_key is nil and record id is not an integer. With this change, can that still be the case?Also, if record is
IssueAssignee
then the numeric key is going to raise undefined method error, right?Good call. I don't think there is a case where
record.id
is a string. Also, the specs don't have a test for this case, so it should be fine.Just to be sure, let me ping @.luke which implemented the logic.
Also, if record is
IssueAssignee
then the numeric key is going to raise undefined method error, right?I also thought it would raise an error. But it doesn't raise.
For example, the code below doesn't raise an error:
class Test def test if true a = 1 else b = 2 end a b end end Test.new.test => nil
Anyway, to be on the safe side, I update the code to set the variable to nil
@rodrigo.tomonari I think what I meant by testing
.id.is_a?(Integer)
was to only assign anumeric_key
value from the record if the record didn't have a composite primary key. Records with a normal primary key would return an Integer from the#id
call even if the primary key was a different column name, but records with a composite primary key would returnnil
.When I was writing that I was thinking that the calling code would pass a
composite_key:
value if there should be one. But I think the way you've change this to havePushService
handle it like this looks better.If there is another record in future that has a compositie primary key and isn't
IssueAssignee
, what would happen is bothcomposite_key
andnumeric_key
would benil
, and the service would return an error response. It makes me wonder whether here we should log when the reference is not valid, possibly raise an error in development, because it represents data loss. Otherwise we're expecting the places that call this service to do something similar.
- Resolved by Rodrigo Tomonari
26 28 27 29 private 28 30 31 def source_user_transformer(context, data) Actually, the method returns a member hash.
I called the method
source_user_transformer
transformer to emphasize that the method creates a member hash using the source_user feature.Otherwise, if the feature flag is disabled, the current approach, which maps the user via email, is used.
I renamed it to source_user_member. I hope it's better
changed this line in version 10 of the diff
85 87 end 86 88 end 87 89 90 # Create missing Import::SourceUser objects during the transformation step, as no data 91 # related to the relation has been persisted yet. 92 create_import_source_users(relation_key, relation_hash) if context.importer_user_mapping_enabled? I don't think it can be used. Third-party importers use a different structure for the data hash.
For example, GitHub API returns users as objects, like the example below. So the other importers will have to implement their own method to extract the data.
{ "number": 1, "title": "Pull request", "author": { "id": 101, "login": "octocat" } }
Yeah, indeed, it is pretty big. Do you have a suggestion on how to split it?
Edited by Rodrigo Tomonari@rodrigo.tomonari just an example
- 1 MR for updating models with
include Import::HasSourceUserReferences
- 1 MR for creating source users in members transformer
- 1 MR for creating source users in ndjson pipeline
- 1 MR for updating models with
29 29 # User is already a member with higher existing (inherited) membership 30 30 return if user_membership && user_membership[:access_level] >= data[:access_level] 31 31 32 source_source = data.delete(:source_user) 33 32 34 # Create new membership for any other access level 33 35 member = portable.members.new(data) 34 36 member.importing = true # avoid sending new member notification to the invited user 35 37 member.save! 38 39 push_placeholder_reference(source_source, member) if context.importer_user_mapping_enabled? 40 41 member created #476606 (closed) to continue this discussion
It seems like placeholder users are being counted as billable users
@rodrigo.tomonari are they counted towards billable users when created, without adding as member of the group? I worked on importer user and creating importer user doesn't mark it as billable, but not sure if it becomes billable once added to the group. My main concern is not self-managed but .com where we bill groups instead of instances. So if placeholder/import users count towards billable users on .com then we need to exclude them. I imagine bot users are not billable but appear in the members list, so maybe we can do something similar in our case.
@georgekoltsov, from what I investigated, they are being counted as billable users when added as members.
For example, groups can use the endpoint billable_members API to check the list of members that are billable and placeholder users appear on the list if they are added as members.
Marking the user types as bots fixes the problem.
I created #476606 (closed) to handle the problem
- Resolved by Rodrigo Tomonari
155 156 def source_user_mapper 157 @source_user_mapper ||= Gitlab::Import::SourceUserMapper.new( 158 namespace: context.portable.root_ancestor, 159 import_type: Import::SOURCE_DIRECT_TRANSFER, 160 source_hostname: context.configuration.source_hostname 161 ) 162 end 163 164 # Creates an Import::SourceUser objects for each source_user_identifier 165 # found in the relation_hash and associate it with the ImportUser. 166 def create_import_source_users(_relation_key, relation_hash) 167 relation_factory::USER_REFERENCES.each do |reference| 168 next unless relation_hash[reference] 169 170 source_user_mapper.find_or_create_source_user( non-blocking: is this step needed in case when found user isn't a member of the source group/project? If so, I wonder if it makes sense to have placeholder user creation as a separate pipeline before we start importing any data? As far as I understood we started to export user contributions ndjson file where all of the users contributions should be available, but we don't seem to be using them, right? Wonder if a separate stage pipeline can be a cleaner solution?
Yeah, a separate pipeline/stage would be a better approach, however, there are two problems with the new
user_contributions
relation, which is why I'm not using it:The first problem is that the relation is only exported after all other relations have finished, which means the migration would be stuck in this stage until everything is exported.
The second problem is that the relation is unavailable for previous GitLab versions, and ideally, the feature should work for all versions.
FYI: I'm planning to use the new relation file to fill in the source_username and source_name which are being set to nil here.
The first problem is that the relation is only exported after all other relations have finished, which means the migration would be stuck in this stage until everything is exported.
Alteration to our import process that can help with the import to not be 'so stuck' is to update our stages to import binaries at the very beginning:
- Repository
- Uploads
- Snippets
- Wiki
- LFS
- And only then continue waiting for user contributions to be available
Just an idea
The second problem is that the relation is unavailable for previous GitLab versions, and ideally, the feature should work for all versions.
Good point we still need to have source users creation during ndjson pipeline to fall back on
removed review request for @georgekoltsov
added 2 commits
164 # Creates an Import::SourceUser objects for each source_user_identifier 165 # found in the relation_hash and associate it with the ImportUser. 166 def create_import_source_users(_relation_key, relation_hash) 167 relation_factory::USER_REFERENCES.each do |reference| 168 next unless relation_hash[reference] 169 170 source_user_mapper.find_or_create_source_user( 171 source_name: nil, 172 source_username: nil, 173 source_user_identifier: relation_hash[reference], 174 use_import_user: true 175 ) 176 end 177 end 178 179 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called A disabled RuboCop security rule was detected. Please try to avoid this if at all possible, otherwise an AppSec review is recommended.
This AppSec automation is currently under testing. Use appsec-sasthelpful or appsec-sastunhelpful for quick feedback. For any detailed feedback, add a comment here.
mentioned in merge request !159648 (merged)
mentioned in issue #476606 (closed)
added 1 commit
- bb8e7ab2 - Add documentation to ndjson pipeline methods
181 # of objects. 182 # 183 # For example, if the relation_definition is: { "notes" => { "events" => {} } } 184 # 185 # and the relation_object a merge_request with the following notes: 186 # 187 # event1 = Event.new 188 # note1 = Note.new(events: [event1]) 189 # event2 = Event.new 190 # note2 = Note.new(events: [event2]) 191 # merge_request = MergeRequest.new(notes:[note1, note2]) 192 # 193 # the flatten_objects list will contain: 194 # [note1, event1, note2, event2] 195 # 196 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called A disabled RuboCop security rule was detected. Please try to avoid this if at all possible, otherwise an AppSec review is recommended.
This AppSec automation is currently under testing. Use appsec-sasthelpful or appsec-sastunhelpful for quick feedback. For any detailed feedback, add a comment here.
224 next unless object.respond_to?(:source_user_references) 225 226 object.source_user_references.each do |attribute, source_user_identifier| 227 source_user = source_user_mapper.find_source_user(source_user_identifier) 228 229 # Verify if the attribute is an ActiveRecord attribute. This check 230 # prevents a reference from being created for attributes that do 231 # not exist on the database table. For example, SystemNoteMetadata 232 # responds to `source_user_references` as the method is delegated to 233 # notes causing a reference to be created for all note's 234 # source_user_references 235 next unless object.has_attribute?(attribute) 236 237 # Do not create a reference if the object is already associated 238 # with a real user. 239 next if source_user.accepted_status? && object[attribute] == source_user.reassign_to_user_id requested review from @.luke
requested review from @georgekoltsov
mentioned in merge request !156672 (merged)
- Resolved by Rodrigo Tomonari
- Resolved by Rodrigo Tomonari
26 28 27 29 private 28 30 31 def source_user_member(context, data) 32 access_level = data&.dig('access_level', 'integer_value') 33 34 return unless data 35 return unless valid_access_level?(access_level) 36 37 source_user = find_or_create_source_user(context, data) 38 39 # To not add user as a member is the source user is mapped to the ImportUser because we cannot 40 # create multiple members for the same user. 41 return if source_user.mapped_user.import_user? 159 import_type: Import::SOURCE_DIRECT_TRANSFER, 160 source_hostname: context.configuration.source_hostname 161 ) 162 end 163 164 # Creates an Import::SourceUser objects for each source_user_identifier 165 # found in the relation_hash and associate it with the ImportUser. 166 def create_import_source_users(_relation_key, relation_hash) 167 relation_factory::USER_REFERENCES.each do |reference| 168 next unless relation_hash[reference] 169 170 source_user_mapper.find_or_create_source_user( 171 source_name: nil, 172 source_username: nil, 173 source_user_identifier: relation_hash[reference], 174 use_import_user: true changed this line in version 13 of the diff
requested review from @ameyadarshan
- Resolved by Rodrigo Tomonari
- Resolved by Rodrigo Tomonari
65 71 PROJECT_MEMBER_RELATIONS 66 72 end 67 73 end 74 75 def push_placeholder_reference(source_source, member) 76 return unless member.user.placeholder? No, at this point, we don't know if the member is a human member or a placeholder member. We only know that the user isn't an ImportUser.
Anyway, it was decided not to create placeholder users as members, so this part will change.
@rodrigo.tomonari Does this mean we won't be creating Member records with placeholder users at all, or should we just filter out Member records for non-placeholder users?
@SamWord, I made my comment based on the approach we discussed in the sync call, which is to create a separate table to store the source user membership details and create the members after the reassignment, which would require extending the reassign process, but it wouldn't mess up with memberships features.
As you know, the other approach is to keep creating placeholder users as members but hide them in the UI and API.
I will create a separate issue with both proposals and then we can discuss which approach to take.
10 module Import 11 module BulkImports 12 class SourceUsersMapper 13 include Gitlab::Utils::StrongMemoize 14 15 # Gitlab::ImportExport::Base::RelationFactory expects member_mapper#map to 16 # return an object that responds to []. For the others mappers a hash is 17 # returned. In this case SourceUsersMapper#map returns a class that responds 18 # to []. 19 class MockedHash 20 def initialize(source_user_mapper) 21 @source_user_mapper = source_user_mapper 22 end 23 24 def [](user_identifier) 25 @source_user_mapper.find_source_user(user_identifier).mapped_user.id 33 def map 34 @map ||= MockedHash.new(source_user_mapper) 35 end 36 37 # Import::SourceUser with the respective user_identifier will always since 38 # they are created beforehand. 39 def include?(_user_identifier) 40 true 41 end 42 43 private 44 45 attr_reader :context 46 47 def source_user_mapper 48 @source_user_mapper ||= Gitlab::Import::SourceUserMapper.new( Thought
We have three similar places where we initialize a
Gitlab::Import::SourceUserMapper
passing in the same params. I wonder if we should add a method to thecontext
class (is itBulkImports::Pipeline::Context
?) to return it?module BulkImports module Pipeline class Context def source_user_mapper @source_user_mapper ||= Gitlab::Import::SourceUserMapper.new( namespace: portable.root_ancestor, import_type: Import::SOURCE_DIRECT_TRANSFER, source_hostname: configuration.source_hostname ) end end end end
and then when we are calling it in the code, we would call
context.source_user_mapper
.changed this line in version 13 of the diff
224 next unless object.respond_to?(:source_user_references) 225 226 object.source_user_references.each do |attribute, source_user_identifier| 227 source_user = source_user_mapper.find_source_user(source_user_identifier) 228 229 # Verify if the attribute is an ActiveRecord attribute. This check 230 # prevents a reference from being created for attributes that do 231 # not exist on the database table. For example, SystemNoteMetadata 232 # responds to `source_user_references` as the method is delegated to 233 # notes causing a reference to be created for all note's 234 # source_user_references 235 next unless object.has_attribute?(attribute) 236 237 # Do not create a reference if the object is already associated 238 # with a real user. 239 next if source_user.accepted_status? && object[attribute] == source_user.reassign_to_user_id Question
Is it possible for us to only enter this part if we know that we have referenced a placeholder user?
For example, is it possible to:
- When saving an object with a user reference, notice if we used an ID of the
reassign_to_user_id
or theplaceholder_user_id
. - Only if we used the placeholder user, later enter this part of the logic.
Edited by Luke Duncalfe- When saving an object with a user reference, notice if we used an ID of the
It should be possible, but it would require adding more logic to the RelationFactory class, which I'm avoiding as it is also used by Import/Export.
In my implementation, the only change to the
RelationFactory
class is to set thesource_user_references
attribute with a hash containing a map of the source_user_identifies. For example,object.source_user_references = { author_id: 100, merged_by_id: 101 }
. If we filter out source_user_identifies for which the corresponding source_user is mapped to a real user, we won't enter this part of the code you mentioned.Just to let you know, I initially added some memory cache to prevent fetching the database multiple times, but I removed it from the MR because it was becoming too complicated. If you're concerned about database queries, I plan to optimize them in a follow-up MR.
removed review request for @.luke
26 26 include IdInOrdered 27 27 include Todoable 28 28 include Spammable 29 include Import::HasSourceUserReferences Yes, all models that may reference a user need to include the module.
Epic includes the module
::Issuable,
which includes theImport::HasSourceUserReferences
, so we don't need to change the epics.Actually, we also don't need to include it in MergeRequest, as it also include the module
::Issuable
29 29 # User is already a member with higher existing (inherited) membership 30 30 return if user_membership && user_membership[:access_level] >= data[:access_level] 31 31 32 source_user = data.delete(:source_user) 33 32 34 # Create new membership for any other access level 33 35 member = portable.members.new(data) 34 36 member.importing = true # avoid sending new member notification to the invited user 35 37 member.save! So in case when
source_user
is present, what kind of member are we creating here? We are creating the membership for the source user, correct? From reading the code it's not very clear that's happening (at least to me). I wonder since we are not going to be supporting the old way of mapping users if we should update this class to have a clearer mentions of what we're processing here are placeholder users. Perhaps rename the class toplaceholder_members_pipeline
, adjust variable names, etc? WDYT?
26 28 27 29 private 28 30 31 def source_user_member(context, data) 65 71 PROJECT_MEMBER_RELATIONS 66 72 end 67 73 end 74 75 def push_placeholder_reference(source_source, member) 76 return unless member.user.placeholder? 77 78 Import::PlaceholderReferences::PushService.from_record( 56 58 raise(ActiveRecord::RecordInvalid, object) 57 59 end 58 60 59 61 object.save! @georgekoltsov, good question. Can you remind me in which situation we would enter in this part of the code?
@rodrigo.tomonari I don't fully recall, but I think it's for a situation when transformation step returns an 'existing' record in order to not create duplicates. An existing record is an instance of one of these classes.
85 87 end 86 88 end 87 89 90 # Create missing Import::SourceUser objects during the transformation step, as data 91 # related to the relation has not persisted yet. 92 create_import_source_users(relation_key, relation_hash) if context.importer_user_mapping_enabled? @rodrigo.tomonari good point, we still need placeholder user to be present in order to assign it to the record before it gets persisted. Perhaps we can add a 'cleanup' step at the end of the migration to remove any placeholder users that don't have any references at the end?
removed review request for @georgekoltsov
added 2 commits
176 # of objects. 177 # 178 # For example, if the relation_definition is: { "notes" => { "events" => {} } } 179 # 180 # and the relation_object a merge_request with the following notes: 181 # 182 # event1 = Event.new 183 # note1 = Note.new(events: [event1]) 184 # event2 = Event.new 185 # note2 = Note.new(events: [event2]) 186 # merge_request = MergeRequest.new(notes:[note1, note2]) 187 # 188 # the flatten_objects list will contain: 189 # [note1, event1, note2, event2] 190 # 191 # rubocop:disable GitlabSecurity/PublicSend -- only methods in the relation_definition are called A disabled RuboCop security rule was detected. Please try to avoid this if at all possible, otherwise an AppSec review is recommended.
This AppSec automation is currently under testing. Use appsec-sasthelpful or appsec-sastunhelpful for quick feedback. For any detailed feedback, add a comment here.
mentioned in merge request !162098 (merged)
mentioned in merge request !162484 (merged)
mentioned in merge request !162619 (merged)
mentioned in issue gitlab-org/quality/triage-reports#20593 (closed)
mentioned in issue gitlab-org/quality/triage-reports#20982 (closed)
mentioned in issue gitlab-org/quality/triage-reports#21523 (closed)
mentioned in issue gitlab-org/quality/triage-reports#22009