
Geo: Allow selective sync by organizations for Group Wikis

What does this MR do and why?

This MR implements organization-based selective synchronization for Geo. Previously, selecting by organizations hit a placeholder that returned no data. The scope now retrieves all namespaces that belong to the selected organizations, including their child namespaces. The group wiki repository synchronization logic is also updated to handle organization-based selection the same way it handles namespace-based selection, so that wiki repositories are synced when organizations are selected for replication.
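
As a rough illustration of the namespace selection described above, the following plain-Ruby sketch collects the namespaces of the selected organizations and then follows `parent_id` to pick up their children. It is not the code from this MR: `NamespaceRow` and `namespace_ids_for_organizations` are hypothetical names, the data is in memory rather than in ActiveRecord, and the row with a `nil` organization is contrived only to show that the traversal reaches child namespaces.

```ruby
require 'set'

# Hypothetical in-memory stand-in for a namespace row; not a GitLab model.
NamespaceRow = Struct.new(:id, :parent_id, :organization_id, keyword_init: true)

# Returns the sorted IDs of every namespace that belongs to one of the
# selected organizations, plus all descendants reached through parent_id.
def namespace_ids_for_organizations(namespaces, organization_ids)
  selected_orgs = organization_ids.to_set
  children = namespaces.group_by(&:parent_id)

  # Start from namespaces that belong directly to a selected organization.
  queue = namespaces.select { |ns| selected_orgs.include?(ns.organization_id) }
  seen = Set.new

  until queue.empty?
    ns = queue.shift
    next unless seen.add?(ns.id) # add? returns nil when the ID was already seen

    queue.concat(children.fetch(ns.id, []))
  end

  seen.to_a.sort
end

namespaces = [
  NamespaceRow.new(id: 1, parent_id: nil, organization_id: 10),
  NamespaceRow.new(id: 2, parent_id: 1,   organization_id: 10),
  # Contrived: no organization of its own, so it is only found via its parent.
  NamespaceRow.new(id: 3, parent_id: 2,   organization_id: nil),
  NamespaceRow.new(id: 4, parent_id: nil, organization_id: 20)
]

p namespace_ids_for_organizations(namespaces, [10]) # => [1, 2, 3]
```

Passing an empty list of organization IDs returns an empty array, which matches the idea that nothing is selected until an organization is linked to the site.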

References

How to set up and validate locally

Prerequisites

  1. Set up Geo with GDK

    • Follow the GDK Geo setup guide to configure a primary and secondary Geo instance
    • Ensure both instances are running properly
  2. Enable organization features

    • Run these Rails commands on your primary GDK instance in Rails console:

      Feature.enable_percentage_of_time(:allow_organization_creation, 100)
      Feature.enable_percentage_of_time(:organization_switching, 100)
      Feature.enable_percentage_of_time(:ui_for_organizations, 100)
  3. Create test organizations

    • Run these Rails commands to create test organizations with projects:

      # Create first organization with owner
      org1 = Organizations::Organization.create!(name: 'Test Org 1', path: 'test-org-1', visibility_level: Organizations::Organization::PUBLIC)
      Organizations::OrganizationUser.create_organization_record_for(User.first.id, org1.id)
      
      # Create second organization with owner
      org2 = Organizations::Organization.create!(name: 'Test Org 2', path: 'test-org-2', visibility_level: Organizations::Organization::PUBLIC)
      Organizations::OrganizationUser.create_organization_record_for(User.first.id, org2.id)
      
      # Create 3 group wikis in first organization
      3.times do |i|
        group = Group.create!(name: "Group #{i+1}", path: "org-1-group-#{i+1}", organization: org1)
        group.add_owner(User.first)
        group.create_wiki
      end
      
      # Create 3 group wikis in second organization
      3.times do |i|
        group = Group.create!(name: "Group #{i+1}", path: "org-2-group-#{i+1}", organization: org2)
        group.add_owner(User.first)
        group.create_wiki
      end
      
      puts 'Created 2 organizations with 3 group wiki repositories each'
  4. Create a personal access token

Primary Site Selective Checksumming by Organizations - Testing Steps

  1. In the primary GDK site: gdk switch 534201-org-mover-implement-selective-sync-scope-for-project-repository

  2. In the secondary GDK site: gdk switch 534201-org-mover-implement-selective-sync-scope-for-project-repository

  3. In the primary GDK site, open Rails console: bin/rails c

  4. Enable the FF: Feature.enable(:org_mover_extend_selective_sync_to_primary_checksumming)

  5. Enable the FF: Feature.enable(:geo_selective_sync_by_organizations)

  6. Get your current configuration:

    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    
    # Store the site ID of the primary node
    export SITE_ID=1  # Replace with your primary site ID
    
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    
    # Store an organization ID for testing
    export ORG_ID=1003  # Replace with your organization ID
  7. Enable selective checksumming by organization:

    # Enable selective checksumming by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
  8. Verify the configuration:

    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and 
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
  9. Wait a few minutes and verify the primary site status

    # Get the updated site status and confirm that group_wiki_repositories_checksummed_count 
    # matches the number of group wiki repositories that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  10. Test with multiple organizations:

    export ORG_ID2=1004  # Replace with another organization ID
    
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  11. Disable selective sync:

    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": ""
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    
    # In the primary GDK site, disable the FF: 
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
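
For anyone who prefers to script the PUT calls from the steps above rather than paste curl commands, here is a minimal Ruby sketch that builds the same request with the standard library. The helper name `build_selective_sync_request` is hypothetical, and the URL, site ID, token, and organization IDs are the same placeholders used in the curl examples. The request is only constructed here, not sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds (but does not send) the PUT request that selects organizations
# for a Geo site. All arguments are placeholders supplied by the caller.
def build_selective_sync_request(base_url:, site_id:, token:, organization_ids:)
  uri = URI.join(base_url, "/api/v4/geo_sites/#{site_id}")
  request = Net::HTTP::Put.new(uri)
  request['PRIVATE-TOKEN'] = token
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(
    selective_sync_type: 'organizations',
    # Integer() accepts both numbers and numeric strings such as ENV values.
    selective_sync_organization_ids: organization_ids.map { |id| Integer(id) }
  )
  request
end

request = build_selective_sync_request(
  base_url: 'http://localhost:3000',
  site_id: 1,
  token: 'your_personal_access_token',
  organization_ids: [1003, 1004]
)

puts request.path # => /api/v4/geo_sites/1
puts request.body

# To send it against a running GDK (not executed in this sketch):
#   uri = request.uri
#   Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
```

Building the body with `JSON.generate` avoids the shell quoting needed to splice `$ORG_ID` into a single-quoted JSON string.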

Secondary Site Selective Sync by Organizations - Testing Steps

  1. In the primary GDK site: gdk switch 534201-org-mover-implement-selective-sync-scope-for-project-repository

  2. In the secondary GDK site: gdk switch 534201-org-mover-implement-selective-sync-scope-for-project-repository

  3. In the primary GDK site, open Rails console: bin/rails c

  4. Enable the FF: Feature.enable(:geo_selective_sync_by_organizations)

  5. Get your current configuration:

    # Get your personal access token
    export PRIVATE_TOKEN="your_personal_access_token"
    
    # List all Geo sites to get the site ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites" | jq
    
    # Store the site ID of the secondary node
    export SITE_ID=2  # Replace with your secondary site ID
    
    # Output organization objects for their IDs
    bin/rails runner "pp Organizations::Organization.all"
    
    # Store an organization ID for testing
    export ORG_ID=1003  # Replace with your organization ID
  6. Enable selective sync by organization:

    # Enable selective sync by organization and select the specific organization
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
  7. Verify the configuration:

    # Get the updated Geo site configuration and confirm that selective_sync_type is "organizations" and 
    # organization_ids contains your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
  8. Wait a few minutes and verify the secondary site status

    # Get the updated site status and confirm that group_wiki_repositories_checksummed_count 
    # matches the number of group wiki repositories that belong to your organization ID
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  9. Test with multiple organizations:

    export ORG_ID2=1004  # Replace with another organization ID
    
    # Update to include multiple organizations
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": "organizations",
        "selective_sync_organization_ids": ['$ORG_ID','$ORG_ID2']
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
  10. Disable selective sync:

    # Reset back to no selective sync
    curl --request PUT \
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "selective_sync_type": ""
      }' \
      "http://localhost:3000/api/v4/geo_sites/$SITE_ID"
    
    # Verify the update
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID" | jq
    
    # Wait a few minutes and verify the Geo site status
    curl --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" "http://localhost:3000/api/v4/geo_sites/$SITE_ID/status" | jq
    
    # In the primary GDK site, disable the FFs:
    bin/rails runner "pp Feature.disable(:org_mover_extend_selective_sync_to_primary_checksumming)"
    bin/rails runner "pp Feature.disable(:geo_selective_sync_by_organizations)"
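
The "wait a few minutes and verify" steps above could be replaced by a small polling loop. The sketch below is one possible shape, not part of this MR: `wait_for_count` is a hypothetical helper, and the block passed to it is assumed to return the parsed JSON from the `/status` endpoint. Stubbed responses are used so the example runs without a Geo setup.

```ruby
# Polls a status source until the named counter equals the expected value.
# The block must return a Hash shaped like the parsed /status JSON response.
def wait_for_count(key, expected, attempts: 10, interval: 30)
  attempts.times do |attempt|
    status = yield
    return true if status[key] == expected

    # Sleep between attempts, but not after the final one.
    sleep(interval) if attempt < attempts - 1
  end
  false
end

# Stubbed responses stand in for real API calls so the sketch runs offline.
responses = [
  { 'group_wiki_repositories_checksummed_count' => 0 },
  { 'group_wiki_repositories_checksummed_count' => 2 },
  { 'group_wiki_repositories_checksummed_count' => 3 }
].each

done = wait_for_count('group_wiki_repositories_checksummed_count', 3,
                      attempts: 5, interval: 0) do
  responses.next
end

puts done # => true
```

In a real run the block would fetch `/api/v4/geo_sites/$SITE_ID/status` and parse it with `JSON.parse`, and `expected` would be the number of group wiki repositories in the selected organizations.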

Database Queries

  • GroupWikiRepository.replicables_for_current_secondary(1..10000)

    • Raw SQL

      SELECT
          "group_wiki_repositories".*
      FROM
          "group_wiki_repositories"
          INNER JOIN "namespaces" ON "namespaces"."id" = "group_wiki_repositories"."group_id"
              AND "namespaces"."type" = 'Group'
      WHERE
          "group_wiki_repositories"."group_id" IN (
              SELECT
                  "namespaces"."id"
              FROM
                  "namespaces"
              WHERE
                  "namespaces"."organization_id" IN (
                      SELECT
                          "geo_node_organization_links"."organization_id"
                      FROM
                          "geo_node_organization_links"
                      WHERE
                          "geo_node_organization_links"."geo_node_id" = 2))
              AND "group_wiki_repositories"."group_id" BETWEEN 1 AND 10000;
    • Query Plan: https://explain.depesz.com/s/bcsj

  • GroupWikiRepository.pluck_verifiable_ids_in_range(1..10000)

    • Raw SQL

      SELECT
          "group_wiki_repositories"."group_id"
      FROM
          "group_wiki_repositories"
          INNER JOIN "namespaces" ON "namespaces"."id" = "group_wiki_repositories"."group_id"
              AND "namespaces"."type" = 'Group'
      WHERE
          "group_wiki_repositories"."group_id" IN (
              SELECT
                  "namespaces"."id"
              FROM
                  "namespaces"
              WHERE
                  "namespaces"."organization_id" IN (
                      SELECT
                          "geo_node_organization_links"."organization_id"
                      FROM
                          "geo_node_organization_links"
                      WHERE
                          "geo_node_organization_links"."geo_node_id" = 2))
              AND "group_wiki_repositories"."group_id" BETWEEN 1 AND 10000;
    • Query Plan: https://explain.depesz.com/s/Xrlx
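
To make the nesting of the two queries above easier to follow, here is an in-memory Ruby sketch that applies the same filters step by step. It is only a reading aid under stated assumptions: `replicable_group_ids` is a hypothetical helper, the hashes stand in for table rows, and the sample IDs are invented.

```ruby
require 'set'

# Mirrors the nested IN subqueries of the SQL with plain Ruby collections.
def replicable_group_ids(wiki_group_ids:, namespaces:, org_links:, geo_node_id:, range:)
  # Innermost subquery: organizations linked to the Geo node.
  org_ids = org_links.select { |link| link[:geo_node_id] == geo_node_id }
                     .map { |link| link[:organization_id] }.to_set

  # Middle subquery: namespaces that belong to those organizations.
  selected = namespaces.select { |ns| org_ids.include?(ns[:organization_id]) }
                       .map { |ns| ns[:id] }.to_set

  # INNER JOIN condition: the wiki's namespace must be of type 'Group'.
  groups = namespaces.select { |ns| ns[:type] == 'Group' }
                     .map { |ns| ns[:id] }.to_set

  # Outer WHERE: selected namespace and group_id inside the batch range.
  wiki_group_ids.select do |id|
    groups.include?(id) && selected.include?(id) && range.cover?(id)
  end
end

namespaces = [
  { id: 1,     type: 'Group', organization_id: 1003 },
  { id: 2,     type: 'Group', organization_id: 1004 },
  { id: 3,     type: 'Group', organization_id: 1003 },
  { id: 20000, type: 'Group', organization_id: 1003 }
]
org_links = [
  { geo_node_id: 2, organization_id: 1003 },
  { geo_node_id: 3, organization_id: 1004 }
]

p replicable_group_ids(
  wiki_group_ids: [1, 2, 3, 20000],
  namespaces: namespaces,
  org_links: org_links,
  geo_node_id: 2,
  range: 1..10_000
) # => [1, 3]
```

Group 2 is dropped because its organization is not linked to node 2, and group 20000 is dropped by the `BETWEEN 1 AND 10000` range.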

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Douglas Barbosa Alexandre
