Add service handling for user mapping memberships

What does this MR do and why?

Adds a service to save membership information for a placeholder user during an import.

Later this information is processed in Import::ReassignPlaceholderUserRecordsService, and memberships are created for the real user.

Related to #477845 (closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

Rodrigo has made an integration branch of this MR with the Direct Transfer importer !163413 (merged).

The easiest way to test this branch is through that integration branch. Checkout that branch, and then:

  1. Enable importer_user_mapping and bulk_import_importer_user_mapping feature flags.
  2. Stage a small direct transfer migration of a group with projects on your localhost . Ensure the group and/or project has some members.
  3. After the import, reassign some of the contributions of a source user who was a member to a real user, and then as that user, accept the contribution assignment.
  4. During the contribution reassignment process, the user should be assigned the memberships that the source user had within the imported group/projects.

Database queries

Data was prepared by using exec in Postgres.ai.

-- insert 1 record into import_source_users

DO $$
DECLARE temprow RECORD;
BEGIN FOR temprow IN
    SELECT id FROM users LIMIT 1
  LOOP
    INSERT INTO import_source_users (placeholder_user_id,namespace_id,created_at,updated_at,source_username,source_name,source_user_identifier,source_hostname,import_type) VALUES (temprow.id,9970,'2024-08-22 01:05:58.372040','2024-08-22 01:05:58.372040','user3','Sidney Jones4', cast(temprow.id as varchar), 'github.com', 'github');
  END LOOP;
END; $$

-- insert 100_000 records into import_placeholder_memberships, 10_000 each for 10 groups

DO $$
DECLARE temprow RECORD;
BEGIN FOR temprow IN
    SELECT id FROM namespaces LIMIT 10
  LOOP
    BEGIN 
      FOR i IN 1..10000 LOOP
        INSERT INTO import_placeholder_memberships (source_user_id,namespace_id,group_id,project_id,created_at,expires_at,access_level) VALUES (1,9970,temprow.id,null,'2024-09-11 01:05:58.372040',null,10);
      END LOOP;
    END;
  END LOOP;
END; $$

-- insert 100_000 records into import_placeholder_memberships, 10_000 each for 10 projects

DO $$
DECLARE temprow RECORD;
BEGIN FOR temprow IN
    SELECT id FROM projects LIMIT 10
  LOOP
    BEGIN 
      FOR i IN 1..10000 LOOP
        INSERT INTO import_placeholder_memberships (source_user_id,namespace_id,group_id,project_id,created_at,expires_at,access_level) VALUES (1,9970,null,temprow.id,'2024-09-11 01:05:58.372040',null,10);
      END LOOP;
    END;
  END LOOP;
END; $$

Member#destroy!

Asked for by bot !163077 (comment 2093266000).

Note, there is no data on production. There will not be a great deal of records deleted at a time, as there will be at most 1 import_placeholder_memberships record per group or project imported into a particular root namespace for each "source user". At a guess, I would say the 99th percentile will be something like 500 records.

DELETE FROM "import_placeholder_memberships" WHERE "import_placeholder_memberships"."source_user_id" = 1;

Import::Placeholders::Membership.with_projects.by_source_user(import_source_user).by_projects_in_namespace(import_source_user.namespace).find_each

This is used in #create_project_memberships.

Note The poor performance is due to the subquery using traversal_ids using 9970 (gitlab-org). If we use another ID the explain looks happier.

explain

SELECT
   "import_placeholder_memberships".* 
FROM
   "import_placeholder_memberships" 
WHERE
   "import_placeholder_memberships"."source_user_id" = 1 
   AND "import_placeholder_memberships"."project_id" IN 
   (
      SELECT
         "projects"."id" 
      FROM
         "projects" 
      WHERE
         "projects"."namespace_id" IN 
         (
            SELECT
               namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id 
            FROM
               "namespaces" 
            WHERE
               "namespaces"."type" = 'Group' 
               AND 
               (
                  traversal_ids @ > ('{9970}')
               )
         )
   )
ORDER BY
   "import_placeholder_memberships"."id" ASC LIMIT 1000

Import::Placeholders::Membership.with_groups.by_source_user(import_source_user).by_groups_in_namespace(import_source_user.namespace).find_each

This is used in #create_group_memberships.

Note as above, the poor performance of the explain appears to be due to the traversal_ids when using gitlab-org.

explain

SELECT
   "import_placeholder_memberships".* 
FROM
   "import_placeholder_memberships" 
WHERE
   "import_placeholder_memberships"."source_user_id" = 1 
   AND "import_placeholder_memberships"."group_id" IN 
   (
      SELECT
         namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id 
      FROM
         "namespaces" 
      WHERE
         "namespaces"."type" = 'Group' 
         AND 
         (
            traversal_ids @ > ('{9970}')
         )
   )
ORDER BY
   "import_placeholder_memberships"."id" ASC LIMIT 1000
Edited by Luke Duncalfe

Merge request reports

Loading