Geo - Allow selective sync by orgs for dependency proxy manifests

What does this MR do and why?

Previously, Geo selective sync could only replicate all dependency proxy manifests or filter them by specific namespaces/groups. This MR adds filtering by organization, giving administrators more flexible control over which data is synced between sites.

References

How to set up and validate locally

Prerequisites

  1. Set up Geo with GDK

    • Follow the GDK Geo setup guide to configure a primary and secondary Geo instance
    • Ensure both instances are running properly
  2. Enable organization features

    • Run these Rails commands on your primary GDK instance in Rails console:

      Feature.enable_percentage_of_time(:allow_organization_creation, 100)
      Feature.enable_percentage_of_time(:organization_switching, 100)
      Feature.enable_percentage_of_time(:ui_for_organizations, 100)
  3. Create test organizations

    • Run these Rails commands to create test organizations with projects:

      # Create first organization with owner
      org1 = Organizations::Organization.create!(name: 'Test Org 1', path: 'test-org-1', visibility_level: Organizations::Organization::PUBLIC)
      Organizations::OrganizationUser.create_organization_record_for(User.first.id, org1.id)
      
      # Create second organization with owner
      org2 = Organizations::Organization.create!(name: 'Test Org 2', path: 'test-org-2', visibility_level: Organizations::Organization::PUBLIC)
      Organizations::OrganizationUser.create_organization_record_for(User.first.id, org2.id)
      
      # Create a group in the first organization
      group1 = Group.create!(name: 'Group 1', path: 'group-1', organization: org1)
      group1.add_owner(User.first)
      
      # Create 3 dependency proxy manifests in the first organization
      3.times do |i|
        manifest = DependencyProxy::Manifest.new(
          group: group1,
          size: 1234,
          digest: 'sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515',
          file_name: "alpine:latest#{SecureRandom.hex(4)}.json",
          content_type: 'application/vnd.docker.distribution.manifest.v2+json',
          status: :default
        )
      
        manifest.file = Rack::Test::UploadedFile.new('spec/fixtures/dependency_proxy/manifest')
        manifest.save!
      end
      
      # Create a group in the second organization
      group2 = Group.create!(name: 'Group 2', path: 'group-2', organization: org2)
      group2.add_owner(User.first)
      
      # Create 3 dependency proxy manifests in the second organization
      3.times do |i|
        manifest = DependencyProxy::Manifest.new(
          group: group2,
          size: 1234,
          digest: 'sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515',
          file_name: "alpine:latest#{SecureRandom.hex(4)}.json",
          content_type: 'application/vnd.docker.distribution.manifest.v2+json',
          status: :default
        )
      
        manifest.file = Rack::Test::UploadedFile.new('spec/fixtures/dependency_proxy/manifest')
        manifest.save!
      end
      
      puts 'Created 2 organizations with 3 dependency proxy manifests each'
  4. Create a personal access token

Primary Site Selective Checksumming by Organizations - Testing Steps

  1. In the primary GDK site: gdk switch 534194-org-mover-implement-selective-sync-scope-for-dependencyproxy-manifest

  2. In the secondary GDK site: gdk switch 534194-org-mover-implement-selective-sync-scope-for-dependencyproxy-manifest

  3. In the primary GDK site, open Rails console: bin/rails c

  4. Enable the FF: Feature.enable(:org_mover_extend_selective_sync_to_primary_checksumming)

  5. Enable the FF: Feature.enable(:geo_selective_sync_by_organizations)

  6. Get your current configuration:

    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    
    # Store the site ID of the primary node
    export SITE_ID=1  # Replace with your primary site ID
    
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    
    # Store an organization ID for testing
    export ORG_ID=1003  # Replace with your organization ID
  7. Enable selective checksumming by organization:

    # Enable selective checksumming by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
  8. Verify the configuration:

    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and 
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
  9. Wait a few minutes and verify the primary site status:

    # Get the updated site status and confirm that dependency_proxy_manifests_checksummed_count 
    # matches the number of dependency proxy manifests that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  10. Test with multiple organizations:

    export ORG_ID2=1004  # Replace with another organization ID
    
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  11. Disable selective sync:

    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": ""
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    
    # In the primary GDK site, disable the FF: 
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
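
The PUT requests in the steps above splice shell variables into a single-quoted JSON string, which is easy to get wrong when the list of organizations changes. As a minimal sketch, the same request bodies could be generated with Ruby's standard JSON library. The helper name `selective_sync_payload` is hypothetical; the field names are the ones used in the curl commands above.

```ruby
require 'json'

# Hypothetical helper (not part of GitLab): builds the JSON body for the
# PUT /api/v4/geo_sites/:id calls shown in the testing steps.
def selective_sync_payload(organization_ids)
  # Accept integers or numeric strings (for example, values read from ENV).
  ids = Array(organization_ids).map { |id| Integer(id) }

  # An empty list produces the body used to reset selective sync.
  return JSON.generate('selective_sync_type' => '') if ids.empty?

  JSON.generate(
    'selective_sync_type' => 'organizations',
    'selective_sync_organization_ids' => ids
  )
end

puts selective_sync_payload([1003, 1004])
# prints {"selective_sync_type":"organizations","selective_sync_organization_ids":[1003,1004]}
```

The printed string can be passed to curl with `--data`, in place of the hand-written body.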

Secondary Site Selective Sync by Organizations - Testing Steps

  1. In the primary GDK site: gdk switch 534194-org-mover-implement-selective-sync-scope-for-dependencyproxy-manifest

  2. In the secondary GDK site: gdk switch 534194-org-mover-implement-selective-sync-scope-for-dependencyproxy-manifest

  3. In the primary GDK site, open Rails console: bin/rails c

  4. Enable the FF: Feature.enable(:geo_selective_sync_by_organizations)

  5. Get your current configuration:

    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    
    # Store the site ID of the secondary node
    export SITE_ID=2  # Replace with your secondary site ID
    
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    
    # Store an organization ID for testing
    export ORG_ID=1003  # Replace with your organization ID
  6. Enable selective sync by organization:

    # Enable selective sync by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
  7. Verify the configuration:

    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and 
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
  8. Wait a few minutes and verify the secondary site status

    # Get the updated site status and confirm that dependency_proxy_manifests_checksummed_count 
    # matches the number of dependency proxy manifests that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  9. Test with multiple organizations:

    export ORG_ID2=1004  # Replace with another organization ID
    
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  10. Disable selective sync:

    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": ""
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    
    # In the primary GDK site, disable the FF: 
    bin/rails runner "pp Feature.disable(:org_mover_extend_selective_sync_to_primary_checksumming)"
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
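
The status checks in the testing steps are done by reading the jq output by eye. A rough sketch of an automated comparison follows, assuming the status response exposes `dependency_proxy_manifests_checksummed_count` as a top-level field, as the comments in the steps suggest. The helper name and the sample response body are illustrative only, not real API output.

```ruby
require 'json'

# Hypothetical helper: compares the checksummed count reported by the status
# endpoint with the number of manifests expected for the selected organizations.
def checksummed_count_matches?(status_json, expected_count)
  status = JSON.parse(status_json)
  status['dependency_proxy_manifests_checksummed_count'] == expected_count
end

# Illustrative response body, trimmed to the one field of interest.
sample_status = '{"dependency_proxy_manifests_checksummed_count": 3}'

# 3 is the number of manifests the seed script creates per organization.
puts checksummed_count_matches?(sample_status, 3)
# prints true
```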

Database Queries

  • DependencyProxy::Manifest.replicables_for_current_secondary(1..10000)

    • Raw SQL

      SELECT
          "dependency_proxy_manifests".*
      FROM
          "dependency_proxy_manifests"
          INNER JOIN "namespaces" ON "namespaces"."id" = "dependency_proxy_manifests"."group_id"
              AND "namespaces"."type" = 'Group'
      WHERE
          "dependency_proxy_manifests"."file_store" = 1
          AND "namespaces"."id" IN (
              SELECT
                  "namespaces"."id"
              FROM
                  "namespaces"
              WHERE
                  "namespaces"."organization_id" IN (
                      SELECT
                          "geo_node_organization_links"."organization_id"
                      FROM
                          "geo_node_organization_links"
                      WHERE
                          "geo_node_organization_links"."geo_node_id" = 2))
              AND "dependency_proxy_manifests"."id" BETWEEN 1 AND 10000;
    • Query Plan: https://explain.depesz.com/s/3uWP

  • DependencyProxy::Manifest.pluck_verifiable_ids_in_range(1..10000)

    • Raw SQL

      SELECT
          "dependency_proxy_manifests"."id"
      FROM
          "dependency_proxy_manifests"
          INNER JOIN "namespaces" ON "namespaces"."id" = "dependency_proxy_manifests"."group_id"
              AND "namespaces"."type" = 'Group'
      WHERE
          "dependency_proxy_manifests"."file_store" = 1
          AND "namespaces"."id" IN (
              SELECT
                  "namespaces"."id"
              FROM
                  "namespaces"
              WHERE
                  "namespaces"."organization_id" IN (
                      SELECT
                          "geo_node_organization_links"."organization_id"
                      FROM
                          "geo_node_organization_links"
                      WHERE
                          "geo_node_organization_links"."geo_node_id" = 2))
              AND "dependency_proxy_manifests"."id" BETWEEN 1 AND 10000;
    • Query Plan: https://explain.depesz.com/s/VemB
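
To make the filtering in the two queries above easier to follow, here is a small in-memory model of the same conditions in plain Ruby. It is only a sketch of the query semantics, not the scope implemented in this MR. The method name and the hash-based records are assumptions, and the sample IDs are made up.

```ruby
# Hypothetical in-memory equivalent of the SQL above. Records are plain hashes
# whose keys mirror the column names used in the queries.
def selected_manifest_ids(manifests, namespaces, links, geo_node_id:, id_range:)
  # Innermost subquery: organizations linked to the given Geo node.
  org_ids = links.select { |l| l[:geo_node_id] == geo_node_id }
                 .map { |l| l[:organization_id] }

  # Middle subquery plus join condition: groups that belong to those organizations.
  group_ids = namespaces.select { |n| n[:type] == 'Group' && org_ids.include?(n[:organization_id]) }
                        .map { |n| n[:id] }

  # Outer query: file_store = 1, group in the selected set, ID within the batch.
  manifests.select do |m|
    m[:file_store] == 1 && group_ids.include?(m[:group_id]) && id_range.cover?(m[:id])
  end.map { |m| m[:id] }
end

namespaces = [
  { id: 10, type: 'Group', organization_id: 1003 },
  { id: 20, type: 'Group', organization_id: 1004 }
]
links = [{ geo_node_id: 2, organization_id: 1003 }]
manifests = [
  { id: 1, group_id: 10, file_store: 1 },
  { id: 2, group_id: 10, file_store: 1 },
  { id: 3, group_id: 20, file_store: 1 }
]

p selected_manifest_ids(manifests, namespaces, links, geo_node_id: 2, id_range: 1..10_000)
# prints [1, 2]
```

Only the manifests in the group owned by organization 1003 are returned, because that is the only organization linked to Geo node 2 in the sample data.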

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Douglas Barbosa Alexandre
