Allow selective sync by orgs for CI artifacts
What does this MR do and why?
Previously, Geo could either sync all CI artifacts (Job Artifacts, Pipeline Artifacts, and Secure Files) or filter them by specific namespaces (groups). This MR adds filtering by organizations, giving administrators finer control over which data is synced between sites.
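As a rough illustration of the three modes described above (no filter, filter by namespaces, filter by organizations), here is a minimal plain-Ruby sketch. The helper `in_sync_scope?` and the hash layout are hypothetical and exist only for this example. The real implementation works through database scopes, and namespace filtering is simplified here to a direct ID match.

```ruby
# Hypothetical helper, not part of GitLab: decides whether a project falls
# inside the configured selective sync scope.
def in_sync_scope?(project, selective_sync_type:, namespace_ids: [], organization_ids: [])
  case selective_sync_type.to_s
  when ''
    true # no selective sync configured: everything is in scope
  when 'namespaces'
    namespace_ids.include?(project[:namespace_id])
  when 'organizations'
    organization_ids.include?(project[:organization_id])
  else
    raise ArgumentError, "unsupported selective_sync_type: #{selective_sync_type.inspect}"
  end
end

# Example project record (made-up IDs).
project = { id: 1, namespace_id: 10, organization_id: 1003 }
puts in_sync_scope?(project, selective_sync_type: 'organizations', organization_ids: [1003])
```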
References
- Related to #534157 (closed)
- Related to #534158 (closed)
- Related to #534159 (closed)
How to set up and validate locally
Prerequisites
- Set up Geo with GDK
  - Follow the GDK Geo setup guide to configure a primary and secondary Geo instance
  - Enable object storage for CI artifacts
  - Ensure both instances are running properly
- Enable organization features
  - Run these Rails commands on your primary GDK instance in Rails console:
      Feature.enable_percentage_of_time(:allow_organization_creation, 100)
      Feature.enable_percentage_of_time(:organization_switching, 100)
      Feature.enable_percentage_of_time(:ui_for_organizations, 100)
Create test organizations
-
Run these Rails commands to create test organizations with projects:
# Create first organization with owner org1 = Organizations::Organization.create!(name: 'Test Org 1', path: 'test-org-1', visibility_level: Organizations::Organization::PUBLIC) Organizations::OrganizationUser.create_organization_record_for(User.first.id, org1.id) # Create second organization with owner org2 = Organizations::Organization.create!(name: 'Test Org 2', path: 'test-org-2', visibility_level: Organizations::Organization::PUBLIC) Organizations::OrganizationUser.create_organization_record_for(User.first.id, org2.id) # Create projects in first organization group1 = Group.create!(name: 'Group 1', path: 'group-1', organization: org1) group1.add_owner(User.first) # Create 3 projects in first organization 3.times do |i| Projects::CreateService.new(User.first, { name: "Project #{i+1}", path: "project-#{i+1}", description: "Test project #{i+1}", namespace_id: group1.id, organization_id: org1.id, visibility_level: Gitlab::VisibilityLevel.level_value('private'), initialize_with_readme: true }).execute end # Create projects in second organization group2 = Group.create!(name: 'Group 2', path: 'group-2', organization: org2) group2.add_owner(User.first) # Create 3 projects in second organization 3.times do |i| Projects::CreateService.new(User.first, { name: "Project #{i+4}", path: "project-#{i+4}", description: "Test project #{i+4}", namespace_id: group2.id, organization_id: org2.id, visibility_level: Gitlab::VisibilityLevel.level_value('private'), initialize_with_readme: true }).execute end puts 'Created 2 organizations with 3 projects each'
-
-
- Seed CI artifacts
  - Use these rake tasks to seed some CI artifacts for the projects created above.
- Create a personal access token
  - Follow the personal access token documentation
  - Make sure to select the api and admin_mode scopes
  - Save the token for use in the API requests
Primary Site Selective Checksumming by Organizations - Testing Steps
- In the primary GDK site:
    gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff
- In the secondary GDK site:
    gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff
- In the primary GDK site, open Rails console:
    bin/rails c
- Enable the feature flags:
    Feature.enable(:org_mover_extend_selective_sync_to_primary_checksumming)
    Feature.enable(:geo_selective_sync_by_organizations)
- Get your current configuration:
    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    # Store the site ID of the primary node
    export SITE_ID=1 # Replace with your primary site ID
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    # Store an organization ID for testing
    export ORG_ID=1003 # Replace with your organization ID
- Enable selective checksumming by organization:
    # Enable selective checksumming by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{ "selective_sync_type": "organizations", "selective_sync_organization_ids": ['$ORG_ID'] }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
- Verify the configuration:
    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
- Wait a few minutes and verify the primary site status:
    # Get the updated site status and confirm that ci_secure_files_checksummed_count, job_artifacts_checksummed_count and pipeline_artifacts_checksummed_count
    # match the number of secure files, job artifacts, and pipeline artifacts that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
- Test with multiple organizations:
    export ORG_ID2=1004 # Replace with another organization ID
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{ "selective_sync_type": "organizations", "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2'] }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
- Disable selective sync:
    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{ "selective_sync_type": "" }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    # In the primary GDK site, disable the FFs enabled for this test:
    bin/rails runner "pp Feature.disable(:org_mover_extend_selective_sync_to_primary_checksumming)"
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
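The PUT requests in the steps above can also be assembled from Ruby, which avoids the shell quoting around `$ORG_ID`. This is a minimal sketch using only the standard library. The helpers `selective_sync_payload` and `build_geo_site_request` are hypothetical names introduced here, and the URL, header, and JSON keys are copied from the curl commands above. The request is built but not sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds the JSON body used by the PUT requests above.
# An empty list resets selective sync, mirroring the "Disable selective sync" step.
def selective_sync_payload(organization_ids)
  ids = Array(organization_ids).map { |id| Integer(id) }
  return { selective_sync_type: '' } if ids.empty?

  { selective_sync_type: 'organizations', selective_sync_organization_ids: ids }
end

# Hypothetical helper: builds (but does not send) the PUT request.
def build_geo_site_request(base_url:, site_id:, token:, organization_ids:)
  uri = URI("#{base_url}/api/v4/geo_sites/#{site_id}")
  request = Net::HTTP::Put.new(uri)
  request['PRIVATE-TOKEN'] = token
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(selective_sync_payload(organization_ids))
  request
end

# Print the body for two organizations (IDs are placeholders).
puts JSON.generate(selective_sync_payload([1003, 1004]))
```

To send the request against a running GDK, pass it to `Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }`.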
Secondary Site Selective Sync by Organizations - Testing Steps
- In the primary GDK site:
    gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff
- In the secondary GDK site:
    gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff
- In the primary GDK site, open Rails console:
    bin/rails c
- Enable the FF:
    Feature.enable(:geo_selective_sync_by_organizations)
- Get your current configuration:
    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    # Store the site ID of the secondary node
    export SITE_ID=2 # Replace with your secondary site ID
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    # Store an organization ID for testing
    export ORG_ID=1003 # Replace with your organization ID
- Enable selective sync by organization:
    # Enable selective sync by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{ "selective_sync_type": "organizations", "selective_sync_organization_ids": ['$ORG_ID'] }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
- Verify the configuration:
    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
- Wait a few minutes and verify the secondary site status:
    # Get the updated site status and confirm that ci_secure_files_checksummed_count, job_artifacts_checksummed_count and pipeline_artifacts_checksummed_count
    # match the number of secure files, job artifacts, and pipeline artifacts that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
- Test with multiple organizations:
    export ORG_ID2=1004 # Replace with another organization ID
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{ "selective_sync_type": "organizations", "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2'] }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
- Disable selective sync:
    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{ "selective_sync_type": "" }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    # In the primary GDK site, disable the FF:
    bin/rails runner "pp Feature.disable(:org_mover_extend_selective_sync_to_primary_checksumming)"
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
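The "wait a few minutes and verify" steps above could be automated with a small polling loop. The sketch below is one possible approach, not part of this MR. The count field names are the ones quoted in the steps above, the helpers `mismatched_counts` and `wait_for_counts` are hypothetical, and the status responses are stubbed with made-up numbers so the example runs without a Geo site.

```ruby
require 'json'

# Count fields named in the verification steps above.
COUNT_FIELDS = %w[
  ci_secure_files_checksummed_count
  job_artifacts_checksummed_count
  pipeline_artifacts_checksummed_count
].freeze

# Returns a hash of the fields whose reported value differs from the expected one.
def mismatched_counts(status, expected)
  COUNT_FIELDS.each_with_object({}) do |field, diff|
    next if status[field] == expected[field]

    diff[field] = { expected: expected[field], actual: status[field] }
  end
end

# Calls fetch_status (any callable returning the parsed status hash) until all
# counts match or the attempts run out. Returns true on a match, false otherwise.
def wait_for_counts(fetch_status, expected, attempts: 10, interval: 30)
  attempts.times do |attempt|
    return true if mismatched_counts(fetch_status.call, expected).empty?

    sleep(interval) if attempt < attempts - 1
  end
  false
end

# Stubbed responses standing in for the status endpoint (made-up numbers).
responses = [
  '{"ci_secure_files_checksummed_count":0,"job_artifacts_checksummed_count":2,"pipeline_artifacts_checksummed_count":0}',
  '{"ci_secure_files_checksummed_count":1,"job_artifacts_checksummed_count":6,"pipeline_artifacts_checksummed_count":3}'
]
fetch_status = -> { JSON.parse(responses.shift) }
expected = {
  'ci_secure_files_checksummed_count' => 1,
  'job_artifacts_checksummed_count' => 6,
  'pipeline_artifacts_checksummed_count' => 3
}
puts wait_for_counts(fetch_status, expected, attempts: 5, interval: 0)
```

In a real run, `fetch_status` would wrap a GET on the `/api/v4/geo_sites/$SITE_ID/status` URL used above, and the expected counts would come from the seeded data for the selected organizations.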
Database Queries
- Ci::JobArtifact.replicables_for_current_secondary(1..10000)
  - Raw SQL 1
SELECT DISTINCT "p_ci_job_artifacts"."project_id" FROM "p_ci_job_artifacts" WHERE "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000;
  - Query Plan 1: https://explain.depesz.com/s/PHyN
  - Raw SQL 2
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN ( SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."organization_id" IN ( SELECT "geo_node_organization_links"."organization_id" FROM "geo_node_organization_links" WHERE "geo_node_organization_links"."geo_node_id" = 2)) AND "projects"."id" IN (25914, 27074, 25755, 9117, 3265, 31942, 31877, 20315, 36455, 20705, 18352, 36139, 23833, 9441, 30946, 6403, 32129, 25620, 1673, 34311, 37021, 32956, 11645, 23528, 27539, 22377, 32565, 6306, 8472, 29199, 34348, 9430, 6068, 26170, 11212, 27927, 36979, 30383, 20739, 8844, 3193, 34292, 8792, 14278, 25963, 37289, 18591, 28238, 18321, 9747, 32577, 12552, 23532, 34843, 13156, 27719, 30784, 19739, 13365, 7434, 25000, 27313, 1988, 35988, 1232, 20060, 23575, 11502, 4, 25671, 9011, 28064, 14689, 30776, 19791, 26732, 13511, 1849, 6, 537, 15405, 32322, 32521, 7236, 6827, 14563, 29972, 30408, 13980, 20296, 16666, 34701, 2, 9846, 1714, 21093, 595, 14246, 16865, 23650, 29371, 9598, 9305, 1083, 7531, 7, 2742, 31992, 18898, 33978, 9780, 7579, 27593, 7009, 4720, 11558, 14539, 19094, 9152, 24061, 25094, 22435, 3104, 23824, 5276, 36148, 17870, 27589, 1, 28344, 14444, 20678, 20316, 30020, 2658, 18902, 27736, 33191, 13787, 3903, 30823, 3417, 13898, 28123, 19882, 33675, 6858, 18938, 438, 14899, 25867, 3449, 8546, 10568, 25878, 2346, 19287, 35062, 10182, 20211, 26842, 17045, 14084, 8294, 18165, 10125, 23578, 22653, 27969, 4747, 34911, 21179, 1400, 36061, 31218, 3593, 11480, 964, 24459, 24915, 20403, 1987, 10540, 34334, 6510, 25672, 13825, 10437, 19750, 11772, 14742, 13891, 33743, 27755, 31201, 2788, 22257, 19185, 6361, 15107, 840, 17663, 36669, 7788, 3, 13456, 14965, 1116, 19401, 8118, 17479, 29293, 29213, 9458, 10470, 7743, 9387, 32529, 10584, 10107, 8858, 17372, 13537, 29379, 3431, 11901, 17201, 27573, 9119, 32027, 35360, 10544, 27852, 1569, 36373, 683, 11766, 23522, 34669, 13830, 30741, 24889, 24550, 34665, 2163, 10288, 1735, 16092, 647, 33898, 2426, 25668, 
25947, 13583, 19533, 30296, 32270, 10111, 34799, 23257, 5258, 4030, 30003, 28668, 16704, 13259, 4868, 32250, 8518, 36585, 264, 9061, 13522, 20260, 2954, 5348, 36446, 13861);
  - Query Plan 2: https://explain.depesz.com/s/85mM
  - Raw SQL 3
SELECT "p_ci_job_artifacts".* FROM "p_ci_job_artifacts" WHERE "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000 AND "p_ci_job_artifacts"."project_id" IN (4030, 13456, 13861, 19185, 20296, 20678, 23824, 25914, 27539, 30776, 31201, 34701);
  - Query Plan 3: https://explain.depesz.com/s/000G
- Ci::JobArtifact.pluck_verifiable_ids_in_range(1..10000)
  - Raw SQL 1
SELECT DISTINCT "p_ci_job_artifacts"."project_id" FROM "p_ci_job_artifacts" WHERE "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000;
  - Query Plan 1: https://explain.depesz.com/s/PHyN
  - Raw SQL 2
SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN ( SELECT "namespaces"."id" FROM "namespaces" WHERE "namespaces"."organization_id" IN ( SELECT "geo_node_organization_links"."organization_id" FROM "geo_node_organization_links" WHERE "geo_node_organization_links"."geo_node_id" = 2)) AND "projects"."id" IN (25914, 27074, 25755, 9117, 3265, 31942, 31877, 20315, 36455, 20705, 18352, 36139, 23833, 9441, 30946, 6403, 32129, 25620, 1673, 34311, 37021, 32956, 11645, 23528, 27539, 22377, 32565, 6306, 8472, 29199, 34348, 9430, 6068, 26170, 11212, 27927, 36979, 30383, 20739, 8844, 3193, 34292, 8792, 14278, 25963, 37289, 18591, 28238, 18321, 9747, 32577, 12552, 23532, 34843, 13156, 27719, 30784, 19739, 13365, 7434, 25000, 27313, 1988, 35988, 1232, 20060, 23575, 11502, 4, 25671, 9011, 28064, 14689, 30776, 19791, 26732, 13511, 1849, 6, 537, 15405, 32322, 32521, 7236, 6827, 14563, 29972, 30408, 13980, 20296, 16666, 34701, 2, 9846, 1714, 21093, 595, 14246, 16865, 23650, 29371, 9598, 9305, 1083, 7531, 7, 2742, 31992, 18898, 33978, 9780, 7579, 27593, 7009, 4720, 11558, 14539, 19094, 9152, 24061, 25094, 22435, 3104, 23824, 5276, 36148, 17870, 27589, 1, 28344, 14444, 20678, 20316, 30020, 2658, 18902, 27736, 33191, 13787, 3903, 30823, 3417, 13898, 28123, 19882, 33675, 6858, 18938, 438, 14899, 25867, 3449, 8546, 10568, 25878, 2346, 19287, 35062, 10182, 20211, 26842, 17045, 14084, 8294, 18165, 10125, 23578, 22653, 27969, 4747, 34911, 21179, 1400, 36061, 31218, 3593, 11480, 964, 24459, 24915, 20403, 1987, 10540, 34334, 6510, 25672, 13825, 10437, 19750, 11772, 14742, 13891, 33743, 27755, 31201, 2788, 22257, 19185, 6361, 15107, 840, 17663, 36669, 7788, 3, 13456, 14965, 1116, 19401, 8118, 17479, 29293, 29213, 9458, 10470, 7743, 9387, 32529, 10584, 10107, 8858, 17372, 13537, 29379, 3431, 11901, 17201, 27573, 9119, 32027, 35360, 10544, 27852, 1569, 36373, 683, 11766, 23522, 34669, 13830, 30741, 24889, 24550, 34665, 2163, 10288, 1735, 16092, 647, 33898, 2426, 25668, 
25947, 13583, 19533, 30296, 32270, 10111, 34799, 23257, 5258, 4030, 30003, 28668, 16704, 13259, 4868, 32250, 8518, 36585, 264, 9061, 13522, 20260, 2954, 5348, 36446, 13861);
  - Query Plan 2: https://explain.depesz.com/s/85mM
  - Raw SQL 3
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts" WHERE "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000 AND "p_ci_job_artifacts"."project_id" IN (4030, 13456, 13861, 19185, 20296, 20678, 23824, 25914, 27539, 30776, 31201, 34701);
  - Query Plan 3: https://explain.depesz.com/s/RC1m
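The three raw queries listed for each method follow the same shape: collect the distinct project IDs in the artifact ID range, keep the projects whose namespace belongs to an organization linked to the Geo node, then select the artifacts in the range for those projects. The plain-Ruby sketch below mirrors that flow on in-memory data to make the logic easier to follow. The structs and the `replicable_artifacts` helper are hypothetical and are not the ActiveRecord scopes from this MR.

```ruby
# In-memory stand-ins for the tables that appear in the raw SQL above.
Artifact = Struct.new(:id, :project_id)
Project = Struct.new(:id, :namespace_id)
Namespace = Struct.new(:id, :organization_id)

# Hypothetical helper mirroring the three queries.
def replicable_artifacts(artifacts:, projects:, namespaces:, linked_organization_ids:, id_range:)
  in_range = artifacts.select { |artifact| id_range.cover?(artifact.id) }

  # Query 1: distinct project IDs referenced by artifacts in the ID range.
  candidate_project_ids = in_range.map(&:project_id).uniq

  # Query 2: of those projects, keep the ones whose namespace belongs to an
  # organization linked to the Geo node.
  namespace_ids = namespaces
    .select { |namespace| linked_organization_ids.include?(namespace.organization_id) }
    .map(&:id)
  selected_project_ids = projects
    .select { |project| candidate_project_ids.include?(project.id) && namespace_ids.include?(project.namespace_id) }
    .map(&:id)

  # Query 3: artifacts in the ID range that belong to the selected projects.
  in_range.select { |artifact| selected_project_ids.include?(artifact.project_id) }
end

# Made-up sample data: project 100 is in organization 1003, project 200 in 1004.
artifacts = [Artifact.new(1, 100), Artifact.new(2, 200), Artifact.new(3, 100), Artifact.new(20_000, 100)]
projects = [Project.new(100, 10), Project.new(200, 20)]
namespaces = [Namespace.new(10, 1003), Namespace.new(20, 1004)]

selected = replicable_artifacts(
  artifacts: artifacts, projects: projects, namespaces: namespaces,
  linked_organization_ids: [1003], id_range: 1..10_000
)
puts selected.map(&:id).inspect
```

Resolving the project IDs first keeps the final query a simple `IN` filter on `project_id`, which matches the shape of Raw SQL 3 above.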
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.