Allow selective sync by orgs for CI artifacts

What does this MR do and why?

Previously, Geo could either sync all CI artifacts (job artifacts, pipeline artifacts, and secure files) or filter them by specific namespaces or groups. This MR adds filtering by organizations, so administrators have a further option for limiting what data is synced between sites.
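As a rough illustration of the three scopes described above (everything, selected namespaces, selected organizations), here is a minimal in-memory sketch. The `Artifact`, `Project`, and `Site` structs and the `artifacts_in_scope` helper are hypothetical names made up for this example. They are not the models or scopes changed in this MR.

```ruby
# Hypothetical in-memory model; none of these names come from the GitLab codebase.
Artifact = Struct.new(:id, :project_id)
Project  = Struct.new(:id, :namespace_id, :organization_id)
Site     = Struct.new(:selective_sync_type, :namespace_ids, :organization_ids)

# Returns the artifacts a site would sync under its selective sync setting.
def artifacts_in_scope(artifacts, projects, site)
  projects_by_id = projects.to_h { |project| [project.id, project] }

  artifacts.select do |artifact|
    project = projects_by_id[artifact.project_id]
    next false if project.nil? # artifacts without a known project are skipped

    case site.selective_sync_type
    when "organizations" then site.organization_ids.include?(project.organization_id)
    when "namespaces"    then site.namespace_ids.include?(project.namespace_id)
    else true # no selective sync: every artifact is in scope
    end
  end
end

projects  = [Project.new(1, 10, 1003), Project.new(2, 20, 1004)]
artifacts = [Artifact.new(100, 1), Artifact.new(101, 2), Artifact.new(102, 1)]
site      = Site.new("organizations", [], [1003])

p artifacts_in_scope(artifacts, projects, site).map(&:id)
```

With the sample data, only the artifacts of the project in organization 1003 are selected.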

References

How to set up and validate locally

Prerequisites

  1. Set up Geo with GDK

  2. Enable organization features

    • In the Rails console on your primary GDK instance, run these commands:

      Feature.enable_percentage_of_time(:allow_organization_creation, 100)
      Feature.enable_percentage_of_time(:organization_switching, 100)
      Feature.enable_percentage_of_time(:ui_for_organizations, 100)
  3. Create test organizations

    • Run these Rails commands to create test organizations with projects:

      # Create first organization with owner
      org1 = Organizations::Organization.create!(name: 'Test Org 1', path: 'test-org-1', visibility_level: Organizations::Organization::PUBLIC)
      Organizations::OrganizationUser.create_organization_record_for(User.first.id, org1.id)
      
      # Create second organization with owner
      org2 = Organizations::Organization.create!(name: 'Test Org 2', path: 'test-org-2', visibility_level: Organizations::Organization::PUBLIC)
      Organizations::OrganizationUser.create_organization_record_for(User.first.id, org2.id)
      
      # Create a group in the first organization
      group1 = Group.create!(name: 'Group 1', path: 'group-1', organization: org1)
      group1.add_owner(User.first)
      
      # Create 3 projects in the first organization
      3.times do |i|
        Projects::CreateService.new(User.first, {
          name: "Project #{i+1}",
          path: "project-#{i+1}",
          description: "Test project #{i+1}",
          namespace_id: group1.id,
          organization_id: org1.id,
          visibility_level: Gitlab::VisibilityLevel.level_value('private'),
          initialize_with_readme: true
        }).execute
      end
      
      # Create a group in the second organization
      group2 = Group.create!(name: 'Group 2', path: 'group-2', organization: org2)
      group2.add_owner(User.first)
      
      # Create 3 projects in the second organization
      3.times do |i|
        Projects::CreateService.new(User.first, {
          name: "Project #{i+4}",
          path: "project-#{i+4}",
          description: "Test project #{i+4}",
          namespace_id: group2.id,
          organization_id: org2.id,
          visibility_level: Gitlab::VisibilityLevel.level_value('private'),
          initialize_with_readme: true
        }).execute
      end
      
      puts 'Created 2 organizations with 3 projects each'
  4. Seed CI artifacts

    • Use these rake tasks to seed some CI artifacts for the projects created above.
  5. Create a personal access token

Primary Site Selective Checksumming by Organizations - Testing Steps

  1. In the primary GDK site: gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff

  2. In the secondary GDK site: gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff

  3. In the primary GDK site, open Rails console: bin/rails c

  4. Enable the FF: Feature.enable(:org_mover_extend_selective_sync_to_primary_checksumming)

  5. Enable the FF: Feature.enable(:geo_selective_sync_by_organizations)

  6. Get your current configuration:

    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    
    # Store the site ID of the primary node
    export SITE_ID=1  # Replace with your primary site ID
    
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    
    # Store an organization ID for testing
    export ORG_ID=1003  # Replace with your organization ID
  7. Enable selective checksumming by organization:

    # Enable selective checksumming by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
  8. Verify the configuration:

    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and 
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
  9. Wait a few minutes and verify the primary site status:

    # Get the updated site status and confirm that ci_secure_files_checksummed_count, job_artifacts_checksummed_count, and pipeline_artifacts_checksummed_count
    # match the number of secure files, job artifacts, and pipeline artifacts that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  10. Test with multiple organizations:

    export ORG_ID2=1004  # Replace with another organization ID
    
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  11. Disable selective sync:

    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": ""
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    
    # In the primary GDK site, disable the FF: 
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
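
The curl commands above splice `$ORG_ID` into a single-quoted JSON string, which is easy to get wrong when adding more IDs. As an optional alternative, the request body could be generated with a small Ruby helper. This is only a sketch: the two JSON keys are the ones used in the requests above, while the helper name `selective_sync_payload` is made up for this example.

```ruby
require "json"

# Builds the JSON body for PUT /api/v4/geo_sites/:id.
# The helper name is hypothetical; the keys are the ones used in the curl commands.
def selective_sync_payload(organization_ids)
  # Accept a single ID, an array of IDs, or nil, and coerce everything to integers.
  ids = Array(organization_ids).map { |id| Integer(id) }

  # An empty list produces the "reset" body used to disable selective sync.
  return JSON.generate({ "selective_sync_type" => "" }) if ids.empty?

  JSON.generate({
    "selective_sync_type" => "organizations",
    "selective_sync_organization_ids" => ids
  })
end

puts selective_sync_payload([1003, 1004])
puts selective_sync_payload([])
```

The output can be passed to curl with `--data`, for example by saving it to a file first.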

Secondary Site Selective Sync by Organizations - Testing Steps

  1. In the primary GDK site: gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff

  2. In the secondary GDK site: gdk switch 534161-org-mover-implement-selective-sync-scope-for-mergerequestdiff

  3. In the primary GDK site, open Rails console: bin/rails c

  4. Enable the FF: Feature.enable(:geo_selective_sync_by_organizations)

  5. Get your current configuration:

    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    
    # Store the site ID of the secondary node
    export SITE_ID=2  # Replace with your secondary site ID
    
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    
    # Store an organization ID for testing
    export ORG_ID=1003  # Replace with your organization ID
  6. Enable selective sync by organization:

    # Enable selective sync by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
  7. Verify the configuration:

    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and 
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
  8. Wait a few minutes and verify the secondary site status:

    # Get the updated site status and confirm that ci_secure_files_checksummed_count, job_artifacts_checksummed_count, and pipeline_artifacts_checksummed_count
    # match the number of secure files, job artifacts, and pipeline artifacts that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  9. Test with multiple organizations:

    export ORG_ID2=1004  # Replace with another organization ID
    
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  10. Disable selective sync:

    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": ""
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    
    # In the primary GDK site, disable the FF: 
    bin/rails runner "pp Feature.disable(:org_mover_extend_selective_sync_to_primary_checksumming)"
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
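
Comparing the status counts by eye is error-prone. The sketch below shows one way to check a saved status response against the counts you expect. The three field names are the ones named in the verification steps above. The sample response body, the expected numbers, and the `count_mismatches` helper are made up for illustration.

```ruby
require "json"

# Field names taken from the verification steps above.
CHECKSUM_FIELDS = %w[
  ci_secure_files_checksummed_count
  job_artifacts_checksummed_count
  pipeline_artifacts_checksummed_count
].freeze

# Returns a hash of field => [expected, actual] for every count that differs.
def count_mismatches(status_json, expected)
  status = JSON.parse(status_json)

  CHECKSUM_FIELDS.each_with_object({}) do |field, mismatches|
    actual = status[field]
    mismatches[field] = [expected[field], actual] unless actual == expected[field]
  end
end

# Made-up response body and expected counts, for illustration only.
sample = '{"ci_secure_files_checksummed_count": 2, ' \
         '"job_artifacts_checksummed_count": 9, ' \
         '"pipeline_artifacts_checksummed_count": 3}'
expected = {
  "ci_secure_files_checksummed_count" => 2,
  "job_artifacts_checksummed_count" => 12,
  "pipeline_artifacts_checksummed_count" => 3
}

p count_mismatches(sample, expected)
```

An empty hash means all three counts match. In the sample, only the job artifacts count is reported as a mismatch.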

Database Queries

  • Ci::JobArtifact.replicables_for_current_secondary(1..10000)

    • Raw SQL 1

      SELECT DISTINCT
          "p_ci_job_artifacts"."project_id"
      FROM
          "p_ci_job_artifacts"
      WHERE
          "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000;
    • Query Plan 1: https://explain.depesz.com/s/PHyN

    • Raw SQL 2

      SELECT
          "projects"."id"
      FROM
          "projects"
      WHERE
          "projects"."namespace_id" IN (
              SELECT
                  "namespaces"."id"
              FROM
                  "namespaces"
              WHERE
                  "namespaces"."organization_id" IN (
                      SELECT
                          "geo_node_organization_links"."organization_id"
                      FROM
                          "geo_node_organization_links"
                      WHERE
                          "geo_node_organization_links"."geo_node_id" = 2))
              AND "projects"."id" IN (25914, 27074, 25755, 9117, 3265, 31942, 31877, 20315, 36455, 20705, 18352, 36139, 23833, 9441, 30946, 6403, 32129, 25620, 1673, 34311, 37021, 32956, 11645, 23528, 27539, 22377, 32565, 6306, 8472, 29199, 34348, 9430, 6068, 26170, 11212, 27927, 36979, 30383, 20739, 8844, 3193, 34292, 8792, 14278, 25963, 37289, 18591, 28238, 18321, 9747, 32577, 12552, 23532, 34843, 13156, 27719, 30784, 19739, 13365, 7434, 25000, 27313, 1988, 35988, 1232, 20060, 23575, 11502, 4, 25671, 9011, 28064, 14689, 30776, 19791, 26732, 13511, 1849, 6, 537, 15405, 32322, 32521, 7236, 6827, 14563, 29972, 30408, 13980, 20296, 16666, 34701, 2, 9846, 1714, 21093, 595, 14246, 16865, 23650, 29371, 9598, 9305, 1083, 7531, 7, 2742, 31992, 18898, 33978, 9780, 7579, 27593, 7009, 4720, 11558, 14539, 19094, 9152, 24061, 25094, 22435, 3104, 23824, 5276, 36148, 17870, 27589, 1, 28344, 14444, 20678, 20316, 30020, 2658, 18902, 27736, 33191, 13787, 3903, 30823, 3417, 13898, 28123, 19882, 33675, 6858, 18938, 438, 14899, 25867, 3449, 8546, 10568, 25878, 2346, 19287, 35062, 10182, 20211, 26842, 17045, 14084, 8294, 18165, 10125, 23578, 22653, 27969, 4747, 34911, 21179, 1400, 36061, 31218, 3593, 11480, 964, 24459, 24915, 20403, 1987, 10540, 34334, 6510, 25672, 13825, 10437, 19750, 11772, 14742, 13891, 33743, 27755, 31201, 2788, 22257, 19185, 6361, 15107, 840, 17663, 36669, 7788, 3, 13456, 14965, 1116, 19401, 8118, 17479, 29293, 29213, 9458, 10470, 7743, 9387, 32529, 10584, 10107, 8858, 17372, 13537, 29379, 3431, 11901, 17201, 27573, 9119, 32027, 35360, 10544, 27852, 1569, 36373, 683, 11766, 23522, 34669, 13830, 30741, 24889, 24550, 34665, 2163, 10288, 1735, 16092, 647, 33898, 2426, 25668, 25947, 13583, 19533, 30296, 32270, 10111, 34799, 23257, 5258, 4030, 30003, 28668, 16704, 13259, 4868, 32250, 8518, 36585, 264, 9061, 13522, 20260, 2954, 5348, 36446, 13861);
    • Query Plan 2: https://explain.depesz.com/s/85mM

    • Raw SQL 3

      SELECT
          "p_ci_job_artifacts".*
      FROM
          "p_ci_job_artifacts"
      WHERE
          "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000
          AND "p_ci_job_artifacts"."project_id" IN (4030, 13456, 13861, 19185, 20296, 20678, 23824, 25914, 27539, 30776, 31201, 34701);
    • Query Plan 3: https://explain.depesz.com/s/000G

  • Ci::JobArtifact.pluck_verifiable_ids_in_range(1..10000)

    • Raw SQL 1

      SELECT DISTINCT
          "p_ci_job_artifacts"."project_id"
      FROM
          "p_ci_job_artifacts"
      WHERE
          "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000;
    • Query Plan 1: https://explain.depesz.com/s/PHyN

    • Raw SQL 2

      SELECT
          "projects"."id"
      FROM
          "projects"
      WHERE
          "projects"."namespace_id" IN (
              SELECT
                  "namespaces"."id"
              FROM
                  "namespaces"
              WHERE
                  "namespaces"."organization_id" IN (
                      SELECT
                          "geo_node_organization_links"."organization_id"
                      FROM
                          "geo_node_organization_links"
                      WHERE
                          "geo_node_organization_links"."geo_node_id" = 2))
              AND "projects"."id" IN (25914, 27074, 25755, 9117, 3265, 31942, 31877, 20315, 36455, 20705, 18352, 36139, 23833, 9441, 30946, 6403, 32129, 25620, 1673, 34311, 37021, 32956, 11645, 23528, 27539, 22377, 32565, 6306, 8472, 29199, 34348, 9430, 6068, 26170, 11212, 27927, 36979, 30383, 20739, 8844, 3193, 34292, 8792, 14278, 25963, 37289, 18591, 28238, 18321, 9747, 32577, 12552, 23532, 34843, 13156, 27719, 30784, 19739, 13365, 7434, 25000, 27313, 1988, 35988, 1232, 20060, 23575, 11502, 4, 25671, 9011, 28064, 14689, 30776, 19791, 26732, 13511, 1849, 6, 537, 15405, 32322, 32521, 7236, 6827, 14563, 29972, 30408, 13980, 20296, 16666, 34701, 2, 9846, 1714, 21093, 595, 14246, 16865, 23650, 29371, 9598, 9305, 1083, 7531, 7, 2742, 31992, 18898, 33978, 9780, 7579, 27593, 7009, 4720, 11558, 14539, 19094, 9152, 24061, 25094, 22435, 3104, 23824, 5276, 36148, 17870, 27589, 1, 28344, 14444, 20678, 20316, 30020, 2658, 18902, 27736, 33191, 13787, 3903, 30823, 3417, 13898, 28123, 19882, 33675, 6858, 18938, 438, 14899, 25867, 3449, 8546, 10568, 25878, 2346, 19287, 35062, 10182, 20211, 26842, 17045, 14084, 8294, 18165, 10125, 23578, 22653, 27969, 4747, 34911, 21179, 1400, 36061, 31218, 3593, 11480, 964, 24459, 24915, 20403, 1987, 10540, 34334, 6510, 25672, 13825, 10437, 19750, 11772, 14742, 13891, 33743, 27755, 31201, 2788, 22257, 19185, 6361, 15107, 840, 17663, 36669, 7788, 3, 13456, 14965, 1116, 19401, 8118, 17479, 29293, 29213, 9458, 10470, 7743, 9387, 32529, 10584, 10107, 8858, 17372, 13537, 29379, 3431, 11901, 17201, 27573, 9119, 32027, 35360, 10544, 27852, 1569, 36373, 683, 11766, 23522, 34669, 13830, 30741, 24889, 24550, 34665, 2163, 10288, 1735, 16092, 647, 33898, 2426, 25668, 25947, 13583, 19533, 30296, 32270, 10111, 34799, 23257, 5258, 4030, 30003, 28668, 16704, 13259, 4868, 32250, 8518, 36585, 264, 9061, 13522, 20260, 2954, 5348, 36446, 13861);
    • Query Plan 2: https://explain.depesz.com/s/85mM

    • Raw SQL 3

    SELECT
        "p_ci_job_artifacts"."id"
    FROM
        "p_ci_job_artifacts"
    WHERE
        "p_ci_job_artifacts"."id" BETWEEN 1 AND 10000
        AND "p_ci_job_artifacts"."project_id" IN (4030, 13456, 13861, 19185, 20296, 20678, 23824, 25914, 27539, 30776, 31201, 34701);
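
Read together, the three queries appear to form a batch pipeline: collect the distinct project IDs in the artifact ID range, keep only the projects whose namespace belongs to an organization linked to the Geo node, then select the artifacts in the range for those projects. The sketch below mimics that flow over in-memory rows. The constants, sample rows, and the `selected_artifact_ids` helper are hypothetical stand-ins for the tables in the queries, not code from this MR.

```ruby
# Hypothetical in-memory rows standing in for the tables in the queries above.
ARTIFACTS = [
  { id: 1, project_id: 4030 },
  { id: 2, project_id: 27074 },
  { id: 3, project_id: 4030 },
  { id: 20_000, project_id: 13456 }
].freeze
PROJECTS = [
  { id: 4030, namespace_id: 10 },
  { id: 27074, namespace_id: 20 },
  { id: 13456, namespace_id: 10 }
].freeze
NAMESPACES = [
  { id: 10, organization_id: 1003 },
  { id: 20, organization_id: 1004 }
].freeze
ORG_LINKS = [{ geo_node_id: 2, organization_id: 1003 }].freeze

def selected_artifact_ids(id_range, geo_node_id)
  # Step 1: distinct project IDs referenced by artifacts in the ID range.
  in_range = ARTIFACTS.select { |artifact| id_range.cover?(artifact[:id]) }
  project_ids = in_range.map { |artifact| artifact[:project_id] }.uniq

  # Step 2: keep only projects whose namespace belongs to a linked organization.
  org_ids = ORG_LINKS.select { |link| link[:geo_node_id] == geo_node_id }
                     .map { |link| link[:organization_id] }
  namespace_ids = NAMESPACES.select { |ns| org_ids.include?(ns[:organization_id]) }
                            .map { |ns| ns[:id] }
  selected = PROJECTS.select do |project|
    project_ids.include?(project[:id]) && namespace_ids.include?(project[:namespace_id])
  end.map { |project| project[:id] }

  # Step 3: IDs of artifacts in the range that belong to the selected projects.
  in_range.select { |artifact| selected.include?(artifact[:project_id]) }
          .map { |artifact| artifact[:id] }
end

p selected_artifact_ids(1..10_000, 2)
```

Narrowing the project list first keeps the final `IN (...)` list short, which matches the much smaller project ID list in the last query compared with the second one.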

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono
