Skip to content

Resolve "Provide a method to gather IDs of all projects that are going to be affected, when membership changes are made in a particular group"

Manoj M J requested to merge 357479-group-affected-projects into master

What does this MR do and why?

This is the first step towards implementing the idea we came up with in https://gitlab.com/gitlab-org/gitlab/-/issues/357479#note_924149652

With this newly introduced Groups::ProjectsRequiringAuthorizationsRefresh::OnDirectMembershipFinder.new(group).execute method, we now have the ability to get a list of IDs of all projects whose project_authroizations records needs to be updated when a user is added/removed/updated in a group.

Previously, there was no means to gather this list of projects in one go, and hence we used to refresh all of the project_authroizations records across all of the GitLab instance for that specific user. This is expensive because

  • we are reading unnecessary data - ie, if a user is added to Group A, we do not need to update the authorizations of that user in say, projects within their personal namespace or projects within an unrelated Group B. Hence reading such data would ultimately go to waste.
  • we cannot perform efficient deduplication - imagine when hundreds of members are being constantly added to the same group. If we schedule one job each for each of these users, the user_id of each user is unique, so we cannot have any deduplication. However, instead of enqueing AuthorizedProjectsWorker.perform_async(user.id) for each of these users, if we enqueue recalculation jobs for each of the affected projects within the group, ie AuthorizedProjectUpdate::ProjectRecalculateWorker.perform_async(project.id), we can have effective deduplication because the project.ids does not change. Many automations of customers usually perform the same actions (like adding different users to the same group via the API), within a short interval of time. So if we can have deduplication here, it helps us process fewer jobs and read fewer rows from the database.

No changelog added as this method is not currently used anywhere.

Migrations

For: 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId

Up migration

rake db:migrate:up VERSION=20220502125053

== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: migrating
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:project_group_links, [:group_id, :project_id], {:name=>"index_project_group_links_on_group_id_and_project_id", :algorithm=>:concurrently})
   -> 0.0147s
-- execute("SET statement_timeout TO 0")
   -> 0.0013s
-- add_index(:project_group_links, [:group_id, :project_id], {:name=>"index_project_group_links_on_group_id_and_project_id", :algorithm=>:concurrently})
   -> 0.0195s
-- execute("RESET statement_timeout")
   -> 0.0015s
-- transaction_open?()
   -> 0.0000s
-- indexes(:project_group_links)
   -> 0.0140s
-- remove_index(:project_group_links, {:algorithm=>:concurrently, :name=>"index_project_group_links_on_group_id"})
   -> 0.0151s
== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: migrated (0.1090s)

Down migration

rake db:migrate:down VERSION=20220502125053

== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: reverting
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:project_group_links, [:group_id], {:name=>"index_project_group_links_on_group_id", :algorithm=>:concurrently})
   -> 0.0095s
-- execute("SET statement_timeout TO 0")
   -> 0.0007s
-- add_index(:project_group_links, [:group_id], {:name=>"index_project_group_links_on_group_id", :algorithm=>:concurrently})
   -> 0.0043s
-- execute("RESET statement_timeout")
   -> 0.0008s
-- transaction_open?()
   -> 0.0000s
-- indexes(:project_group_links)
   -> 0.0033s
-- remove_index(:project_group_links, {:algorithm=>:concurrently, :name=>"index_project_group_links_on_group_id_and_project_id"})
   -> 0.0032s
== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: reverted (0.0348s)
For: 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds

Up migration

rake db:migrate:up VERSION=20220503073401

== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: migrating ===
-- transaction_open?()
   -> 0.0001s
-- index_exists?(:group_group_links, [:shared_with_group_id, :shared_group_id], {:name=>"index_group_group_links_on_shared_with_group_and_shared_group", :algorithm=>:concurrently})
   -> 0.0151s
-- execute("SET statement_timeout TO 0")
   -> 0.0018s
-- add_index(:group_group_links, [:shared_with_group_id, :shared_group_id], {:name=>"index_group_group_links_on_shared_with_group_and_shared_group", :algorithm=>:concurrently})
   -> 0.0325s
-- execute("RESET statement_timeout")
   -> 0.0028s
-- transaction_open?()
   -> 0.0000s
-- indexes(:group_group_links)
   -> 0.0057s
-- remove_index(:group_group_links, {:algorithm=>:concurrently, :name=>"index_group_group_links_on_shared_with_group_id"})
   -> 0.0137s
== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: migrated (0.0976s)

Down migration

rake db:migrate:down VERSION=20220503073401

== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: reverting ===
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:group_group_links, [:shared_with_group_id], {:name=>"index_group_group_links_on_shared_with_group_id", :algorithm=>:concurrently})
   -> 0.0086s
-- execute("SET statement_timeout TO 0")
   -> 0.0009s
-- add_index(:group_group_links, [:shared_with_group_id], {:name=>"index_group_group_links_on_shared_with_group_id", :algorithm=>:concurrently})
   -> 0.0055s
-- execute("RESET statement_timeout")
   -> 0.0007s
-- transaction_open?()
   -> 0.0000s
-- indexes(:group_group_links)
   -> 0.0078s
-- remove_index(:group_group_links, {:algorithm=>:concurrently, :name=>"index_group_group_links_on_shared_with_group_and_shared_group"})
   -> 0.0037s
== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: reverted (0.0440s)

SQL queries generated

Queries

In staging Rails console, I pasted the following

module Groups
  class ProjectsRequiringAuthorizationsRefresh
    class OnDirectMembershipFinder
      def initialize(group)
        @group = group
      end

      def execute
        project_ids = Set.new

        project_ids.merge(ids_of_projects_in_hierarchy_and_project_shares(@group))
        project_ids.merge(ids_of_projects_in_hierarchy_and_project_shares_of_shared_groups(@group))

        project_ids.to_a
      end

      private

      def ids_of_projects_in_hierarchy_and_project_shares(group)
        project_ids = Set.new

        ids_of_projects_in_hierarchy = group.all_projects.pluck(:id) # rubocop: disable CodeReuse/ActiveRecord
        ids_of_projects_in_project_shares = ids_of_projects_shared_with_self_and_descendant_groups(group)

        project_ids.merge(ids_of_projects_in_hierarchy)
        project_ids.merge(ids_of_projects_in_project_shares)

        project_ids
      end

      def ids_of_projects_shared_with_self_and_descendant_groups(group, batch_size: 50)
        project_ids = Set.new

        group.self_and_descendants_ids.each_slice(batch_size) do |group_ids|
          project_ids.merge(ProjectGroupLink.in_group(group_ids).pluck(:project_id)) # rubocop: disable CodeReuse/ActiveRecord
        end

        project_ids
      end

      def ids_of_projects_in_hierarchy_and_project_shares_of_shared_groups(group, batch_size: 10)
        project_ids = Set.new

        group.shared_groups.each_batch(of: batch_size) do |shared_groups_batch|
          shared_groups_batch.each do |shared_group|
            project_ids.merge(ids_of_projects_in_hierarchy_and_project_shares(shared_group))
          end
        end

        project_ids
      end
    end
  end
end

The groups I chose are

  • Group ID: 6543 (gitlab group on staging)
  • Group ID: 2111045 (a group with many group shares)

Results

For Group ID: 6543
ActiveRecord::Base.logger = Logger.new(STDOUT)

group = Group.find(6543)

events = []
callback = lambda do |*args|  
  event = ActiveSupport::Notifications::Event.new(*args)
  events << event
end


ActiveSupport::Notifications.subscribed(callback, "sql.active_record") do
  Groups::ProjectsRequiringAuthorizationsRefresh::OnDirectMembershipFinder.new(group).execute
end

p events.sum(&:duration) #=> 72ms
Queries generated
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{6543}')))

SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{6543}'))

SELECT "project_group_links"."project_id" FROM "project_group_links" WHERE "project_group_links"."group_id" IN (6543, 1986712, 1795239, 2347970, 2347987, 2283222, 2347986, 2347984, 2060508)

SELECT "namespaces"."id" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 6543 ORDER BY "namespaces"."id" ASC LIMIT 1
For Group ID: 2111045
ActiveRecord::Base.logger = Logger.new(STDOUT)

group = Group.find(2111045)

events = []
callback = lambda do |*args|  
  event = ActiveSupport::Notifications::Event.new(*args)
  events << event
end


ActiveSupport::Notifications.subscribed(callback, "sql.active_record") do
  Groups::ProjectsRequiringAuthorizationsRefresh::OnDirectMembershipFinder.new(group).execute
end

p events.sum(&:duration) #=> 28 ms
Queries generated
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}')))

SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}'))

SELECT "project_group_links"."project_id" FROM "project_group_links" WHERE "project_group_links"."group_id" IN (2188439, 2111045)

SELECT "namespaces"."id" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 2111045 ORDER BY "namespaces"."id" ASC LIMIT 1

SELECT "namespaces"."id" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 2111045 AND "namespaces"."id" >= 2111045 ORDER BY "namespaces"."id" ASC LIMIT 1 OFFSET 10

SELECT "namespaces"."id", "namespaces"."name", "namespaces"."path", "namespaces"."owner_id", "namespaces"."created_at", "namespaces"."updated_at", "namespaces"."type", "namespaces"."description", "namespaces"."avatar", "namespaces"."membership_lock", "namespaces"."share_with_group_lock", "namespaces"."visibility_level", "namespaces"."request_access_enabled", "namespaces"."ldap_sync_status", "namespaces"."ldap_sync_error", "namespaces"."ldap_sync_last_update_at", "namespaces"."ldap_sync_last_successful_update_at", "namespaces"."ldap_sync_last_sync_at", "namespaces"."lfs_enabled", "namespaces"."description_html", "namespaces"."parent_id", "namespaces"."shared_runners_minutes_limit", "namespaces"."repository_size_limit", "namespaces"."require_two_factor_authentication", "namespaces"."two_factor_grace_period", "namespaces"."cached_markdown_version", "namespaces"."project_creation_level", "namespaces"."runners_token", "namespaces"."file_template_project_id", "namespaces"."saml_discovery_token", "namespaces"."runners_token_encrypted", "namespaces"."custom_project_templates_group_id", "namespaces"."auto_devops_enabled", "namespaces"."extra_shared_runners_minutes_limit", "namespaces"."last_ci_minutes_notification_at", "namespaces"."last_ci_minutes_usage_notification_level", "namespaces"."subgroup_creation_level", "namespaces"."emails_disabled", "namespaces"."max_pages_size", "namespaces"."max_artifacts_size", "namespaces"."marked_for_deletion_at", "namespaces"."marked_for_deletion_by_user_id", "namespaces"."mentions_disabled", "namespaces"."default_branch_protection", "namespaces"."unlock_membership_to_ldap", "namespaces"."max_personal_access_token_lifetime", "namespaces"."push_rule_id", "namespaces"."shared_runners_enabled", "namespaces"."allow_descendants_override_disabled_shared_runners", "namespaces"."traversal_ids" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 2111045 AND "namespaces"."id" >= 2111045

SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}')))

SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}'))

SELECT "project_group_links"."project_id" FROM "project_group_links" WHERE "project_group_links"."group_id" IN (2188439, 2111045)

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to https://gitlab.com/gitlab-org/gitlab/-/issues/357479

Edited by Manoj M J

Merge request reports