Resolve "Provide a method to gather IDs of all projects that are going to be affected, when membership changes are made in a particular group"
What does this MR do and why?
This is the first step towards implementing the idea we came up with in https://gitlab.com/gitlab-org/gitlab/-/issues/357479#note_924149652
With this newly introduced Groups::ProjectsRequiringAuthorizationsRefresh::OnDirectMembershipFinder.new(group).execute
method, we now have the ability to get a list
of IDs of all projects whose project_authroizations
records needs to be updated when a user is added/removed/updated in a group.
Previously, there was no means to gather this list of projects in one go, and hence we used to refresh all of the project_authroizations
records across all of the GitLab instance for that specific user. This is expensive because
-
we are reading unnecessary data - ie, if a user is added to Group
A
, we do not need to update the authorizations of that user in say, projects within their personal namespace or projects within an unrelated GroupB
. Hence reading such data would ultimately go to waste. -
we cannot perform efficient deduplication - imagine when hundreds of members are being constantly added to the same group. If we schedule one job each for each of these users, the
user_id
of each user is unique, so we cannot have any deduplication. However, instead of enqueingAuthorizedProjectsWorker.perform_async(user.id)
for each of these users, if we enqueue recalculation jobs for each of the affected projects within the group, ieAuthorizedProjectUpdate::ProjectRecalculateWorker.perform_async(project.id)
, we can have effective deduplication because theproject.id
s does not change. Many automations of customers usually perform the same actions (like adding different users to the same group via the API), within a short interval of time. So if we can have deduplication here, it helps us process fewer jobs and read fewer rows from the database.
No changelog added as this method is not currently used anywhere.
Migrations
For: 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId
Up migration
rake db:migrate:up VERSION=20220502125053
== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:project_group_links, [:group_id, :project_id], {:name=>"index_project_group_links_on_group_id_and_project_id", :algorithm=>:concurrently})
-> 0.0147s
-- execute("SET statement_timeout TO 0")
-> 0.0013s
-- add_index(:project_group_links, [:group_id, :project_id], {:name=>"index_project_group_links_on_group_id_and_project_id", :algorithm=>:concurrently})
-> 0.0195s
-- execute("RESET statement_timeout")
-> 0.0015s
-- transaction_open?()
-> 0.0000s
-- indexes(:project_group_links)
-> 0.0140s
-- remove_index(:project_group_links, {:algorithm=>:concurrently, :name=>"index_project_group_links_on_group_id"})
-> 0.0151s
== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: migrated (0.1090s)
Down migration
rake db:migrate:down VERSION=20220502125053
== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: reverting
-- transaction_open?()
-> 0.0000s
-- index_exists?(:project_group_links, [:group_id], {:name=>"index_project_group_links_on_group_id", :algorithm=>:concurrently})
-> 0.0095s
-- execute("SET statement_timeout TO 0")
-> 0.0007s
-- add_index(:project_group_links, [:group_id], {:name=>"index_project_group_links_on_group_id", :algorithm=>:concurrently})
-> 0.0043s
-- execute("RESET statement_timeout")
-> 0.0008s
-- transaction_open?()
-> 0.0000s
-- indexes(:project_group_links)
-> 0.0033s
-- remove_index(:project_group_links, {:algorithm=>:concurrently, :name=>"index_project_group_links_on_group_id_and_project_id"})
-> 0.0032s
== 20220502125053 RecreateIndexForProjectGroupLinkWithGroupIdAndProjectId: reverted (0.0348s)
For: 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds
Up migration
rake db:migrate:up VERSION=20220503073401
== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: migrating ===
-- transaction_open?()
-> 0.0001s
-- index_exists?(:group_group_links, [:shared_with_group_id, :shared_group_id], {:name=>"index_group_group_links_on_shared_with_group_and_shared_group", :algorithm=>:concurrently})
-> 0.0151s
-- execute("SET statement_timeout TO 0")
-> 0.0018s
-- add_index(:group_group_links, [:shared_with_group_id, :shared_group_id], {:name=>"index_group_group_links_on_shared_with_group_and_shared_group", :algorithm=>:concurrently})
-> 0.0325s
-- execute("RESET statement_timeout")
-> 0.0028s
-- transaction_open?()
-> 0.0000s
-- indexes(:group_group_links)
-> 0.0057s
-- remove_index(:group_group_links, {:algorithm=>:concurrently, :name=>"index_group_group_links_on_shared_with_group_id"})
-> 0.0137s
== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: migrated (0.0976s)
Down migration
rake db:migrate:down VERSION=20220503073401
== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: reverting ===
-- transaction_open?()
-> 0.0000s
-- index_exists?(:group_group_links, [:shared_with_group_id], {:name=>"index_group_group_links_on_shared_with_group_id", :algorithm=>:concurrently})
-> 0.0086s
-- execute("SET statement_timeout TO 0")
-> 0.0009s
-- add_index(:group_group_links, [:shared_with_group_id], {:name=>"index_group_group_links_on_shared_with_group_id", :algorithm=>:concurrently})
-> 0.0055s
-- execute("RESET statement_timeout")
-> 0.0007s
-- transaction_open?()
-> 0.0000s
-- indexes(:group_group_links)
-> 0.0078s
-- remove_index(:group_group_links, {:algorithm=>:concurrently, :name=>"index_group_group_links_on_shared_with_group_and_shared_group"})
-> 0.0037s
== 20220503073401 RecreateIndexForGroupGroupLinkWithBothGroupIds: reverted (0.0440s)
SQL queries generated
Queries
In staging Rails console, I pasted the following
module Groups
class ProjectsRequiringAuthorizationsRefresh
class OnDirectMembershipFinder
def initialize(group)
@group = group
end
def execute
project_ids = Set.new
project_ids.merge(ids_of_projects_in_hierarchy_and_project_shares(@group))
project_ids.merge(ids_of_projects_in_hierarchy_and_project_shares_of_shared_groups(@group))
project_ids.to_a
end
private
def ids_of_projects_in_hierarchy_and_project_shares(group)
project_ids = Set.new
ids_of_projects_in_hierarchy = group.all_projects.pluck(:id) # rubocop: disable CodeReuse/ActiveRecord
ids_of_projects_in_project_shares = ids_of_projects_shared_with_self_and_descendant_groups(group)
project_ids.merge(ids_of_projects_in_hierarchy)
project_ids.merge(ids_of_projects_in_project_shares)
project_ids
end
def ids_of_projects_shared_with_self_and_descendant_groups(group, batch_size: 50)
project_ids = Set.new
group.self_and_descendants_ids.each_slice(batch_size) do |group_ids|
project_ids.merge(ProjectGroupLink.in_group(group_ids).pluck(:project_id)) # rubocop: disable CodeReuse/ActiveRecord
end
project_ids
end
def ids_of_projects_in_hierarchy_and_project_shares_of_shared_groups(group, batch_size: 10)
project_ids = Set.new
group.shared_groups.each_batch(of: batch_size) do |shared_groups_batch|
shared_groups_batch.each do |shared_group|
project_ids.merge(ids_of_projects_in_hierarchy_and_project_shares(shared_group))
end
end
project_ids
end
end
end
end
The groups I chose are
- Group ID:
6543
(gitlab group on staging) - Group ID:
2111045
(a group with many group shares)
Results
6543
For Group ID: ActiveRecord::Base.logger = Logger.new(STDOUT)
group = Group.find(6543)
events = []
callback = lambda do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
events << event
end
ActiveSupport::Notifications.subscribed(callback, "sql.active_record") do
Groups::ProjectsRequiringAuthorizationsRefresh::OnDirectMembershipFinder.new(group).execute
end
p events.sum(&:duration) #=> 72ms
Queries generated
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{6543}')))
SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{6543}'))
SELECT "project_group_links"."project_id" FROM "project_group_links" WHERE "project_group_links"."group_id" IN (6543, 1986712, 1795239, 2347970, 2347987, 2283222, 2347986, 2347984, 2060508)
SELECT "namespaces"."id" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 6543 ORDER BY "namespaces"."id" ASC LIMIT 1
2111045
For Group ID: ActiveRecord::Base.logger = Logger.new(STDOUT)
group = Group.find(2111045)
events = []
callback = lambda do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
events << event
end
ActiveSupport::Notifications.subscribed(callback, "sql.active_record") do
Groups::ProjectsRequiringAuthorizationsRefresh::OnDirectMembershipFinder.new(group).execute
end
p events.sum(&:duration) #=> 28 ms
Queries generated
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}')))
SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}'))
SELECT "project_group_links"."project_id" FROM "project_group_links" WHERE "project_group_links"."group_id" IN (2188439, 2111045)
SELECT "namespaces"."id" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 2111045 ORDER BY "namespaces"."id" ASC LIMIT 1
SELECT "namespaces"."id" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 2111045 AND "namespaces"."id" >= 2111045 ORDER BY "namespaces"."id" ASC LIMIT 1 OFFSET 10
SELECT "namespaces"."id", "namespaces"."name", "namespaces"."path", "namespaces"."owner_id", "namespaces"."created_at", "namespaces"."updated_at", "namespaces"."type", "namespaces"."description", "namespaces"."avatar", "namespaces"."membership_lock", "namespaces"."share_with_group_lock", "namespaces"."visibility_level", "namespaces"."request_access_enabled", "namespaces"."ldap_sync_status", "namespaces"."ldap_sync_error", "namespaces"."ldap_sync_last_update_at", "namespaces"."ldap_sync_last_successful_update_at", "namespaces"."ldap_sync_last_sync_at", "namespaces"."lfs_enabled", "namespaces"."description_html", "namespaces"."parent_id", "namespaces"."shared_runners_minutes_limit", "namespaces"."repository_size_limit", "namespaces"."require_two_factor_authentication", "namespaces"."two_factor_grace_period", "namespaces"."cached_markdown_version", "namespaces"."project_creation_level", "namespaces"."runners_token", "namespaces"."file_template_project_id", "namespaces"."saml_discovery_token", "namespaces"."runners_token_encrypted", "namespaces"."custom_project_templates_group_id", "namespaces"."auto_devops_enabled", "namespaces"."extra_shared_runners_minutes_limit", "namespaces"."last_ci_minutes_notification_at", "namespaces"."last_ci_minutes_usage_notification_level", "namespaces"."subgroup_creation_level", "namespaces"."emails_disabled", "namespaces"."max_pages_size", "namespaces"."max_artifacts_size", "namespaces"."marked_for_deletion_at", "namespaces"."marked_for_deletion_by_user_id", "namespaces"."mentions_disabled", "namespaces"."default_branch_protection", "namespaces"."unlock_membership_to_ldap", "namespaces"."max_personal_access_token_lifetime", "namespaces"."push_rule_id", "namespaces"."shared_runners_enabled", "namespaces"."allow_descendants_override_disabled_shared_runners", "namespaces"."traversal_ids" FROM "namespaces" INNER JOIN "group_group_links" ON "namespaces"."id" = "group_group_links"."shared_group_id" WHERE "namespaces"."type" = 'Group' AND "group_group_links"."shared_with_group_id" = 2111045 AND "namespaces"."id" >= 2111045
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}')))
SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND (traversal_ids @> ('{2111045}'))
SELECT "project_group_links"."project_id" FROM "project_group_links" WHERE "project_group_links"."group_id" IN (2188439, 2111045)
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to https://gitlab.com/gitlab-org/gitlab/-/issues/357479