Add service handling for user mapping memberships
What does this MR do and why?
Adds a service to save membership information for a placeholder user during an import.
Later this information is processed in
Import::ReassignPlaceholderUserRecordsService, and memberships are
created for the real user.
Related to #477845 (closed)
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
Rodrigo has made an integration branch of this MR with the Direct Transfer importer !163413 (merged).
The easiest way to test this branch is through that integration branch. Checkout that branch, and then:
- Enable
importer_user_mappingandbulk_import_importer_user_mappingfeature flags. - Stage a small direct transfer migration of a group with projects on your localhost . Ensure the group and/or project has some members.
- After the import, reassign some of the contributions of a source user who was a member to a real user, and then as that user, accept the contribution assignment.
- During the contribution reassignment process, the user should be assigned the memberships that the source user had within the imported group/projects.
Database queries
Data was prepared by using exec in Postgres.ai.
-- insert 1 record into import_source_users
DO $$
DECLARE temprow RECORD;
BEGIN FOR temprow IN
SELECT id FROM users LIMIT 1
LOOP
INSERT INTO import_source_users (placeholder_user_id,namespace_id,created_at,updated_at,source_username,source_name,source_user_identifier,source_hostname,import_type) VALUES (temprow.id,9970,'2024-08-22 01:05:58.372040','2024-08-22 01:05:58.372040','user3','Sidney Jones4', cast(temprow.id as varchar), 'github.com', 'github');
END LOOP;
END; $$
-- insert 100_000 records into import_placeholder_memberships, 10_000 each for 10 groups
DO $$
DECLARE temprow RECORD;
BEGIN FOR temprow IN
SELECT id FROM namespaces LIMIT 10
LOOP
BEGIN
FOR i IN 1..10000 LOOP
INSERT INTO import_placeholder_memberships (source_user_id,namespace_id,group_id,project_id,created_at,expires_at,access_level) VALUES (1,9970,temprow.id,null,'2024-09-11 01:05:58.372040',null,10);
END LOOP;
END;
END LOOP;
END; $$
-- insert 100_000 records into import_placeholder_memberships, 10_000 each for 10 projects
DO $$
DECLARE temprow RECORD;
BEGIN FOR temprow IN
SELECT id FROM projects LIMIT 10
LOOP
BEGIN
FOR i IN 1..10000 LOOP
INSERT INTO import_placeholder_memberships (source_user_id,namespace_id,group_id,project_id,created_at,expires_at,access_level) VALUES (1,9970,null,temprow.id,'2024-09-11 01:05:58.372040',null,10);
END LOOP;
END;
END LOOP;
END; $$
Member#destroy!
Asked for by bot !163077 (comment 2093266000).
Note, there is no data on production. There will not be a great deal of records deleted at a time, as there will be at most 1 import_placeholder_memberships record per group or project imported into a particular root namespace for each "source user". At a guess, I would say the 99th percentile will be something like 500 records.
DELETE FROM "import_placeholder_memberships" WHERE "import_placeholder_memberships"."source_user_id" = 1;
Import::Placeholders::Membership.with_projects.by_source_user(import_source_user).by_projects_in_namespace(import_source_user.namespace).find_each
This is used in #create_project_memberships.
Note The poor performance is due to the subquery using traversal_ids using 9970 (gitlab-org). If we use another ID the explain looks happier.
SELECT
"import_placeholder_memberships".*
FROM
"import_placeholder_memberships"
WHERE
"import_placeholder_memberships"."source_user_id" = 1
AND "import_placeholder_memberships"."project_id" IN
(
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN
(
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND
(
traversal_ids @ > ('{9970}')
)
)
)
ORDER BY
"import_placeholder_memberships"."id" ASC LIMIT 1000
Import::Placeholders::Membership.with_groups.by_source_user(import_source_user).by_groups_in_namespace(import_source_user.namespace).find_each
This is used in #create_group_memberships.
Note as above, the poor performance of the explain appears to be due to the traversal_ids when using gitlab-org.
SELECT
"import_placeholder_memberships".*
FROM
"import_placeholder_memberships"
WHERE
"import_placeholder_memberships"."source_user_id" = 1
AND "import_placeholder_memberships"."group_id" IN
(
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND
(
traversal_ids @ > ('{9970}')
)
)
ORDER BY
"import_placeholder_memberships"."id" ASC LIMIT 1000