Geo: Test project repository replication v2

Manual Test Plan: Geo Project Repository Replication V2

This test plan is AI-generated and pending manual review.

Summary

This issue tracks the manual test plan for Epic #17974, which improves how Geo replication handles projects without Git repositories.

Related MRs:

Feature Flags:

  • geo_project_repository_replication (existing)
  • geo_project_repository_replication_v2 (new)

Problem Being Solved

Previously, Geo would attempt to replicate Git repositories for projects that don't actually have repositories, causing:

  • "Project Repositories checksum failure" in the UI
  • Sync failures with "Error syncing repository: 13:creating repository: cloning repository: exit status 128"
  • False error reporting and wasted resources

Solution Overview

Two-pronged approach:

  1. MR !198308 (merged): Only create project_repository records when Git repositories actually exist
  2. MR !194051: Switch Geo replication to enumerate the project_repositories table instead of the projects table (V2 replication)
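
The difference between the two enumeration strategies can be illustrated with a toy model in plain Ruby. This is only a sketch, not GitLab's implementation: the hash layout and the helper names `v1_replication_candidates` and `v2_replication_candidates` are assumptions made for illustration.

```ruby
# Toy model only: plain hashes stand in for rows of the projects and
# project_repositories tables. Helper names are hypothetical.

# V1: every project is a replication candidate, repository or not.
def v1_replication_candidates(projects)
  projects.map { |project| project[:id] }
end

# V2: only rows of project_repositories are candidates, so projects
# without a Git repository are never enumerated.
def v2_replication_candidates(project_repositories)
  project_repositories.map { |record| record[:project_id] }
end

projects = [
  { id: 1, has_repo: true },
  { id: 2, has_repo: false }, # created without a Git repository
  { id: 3, has_repo: true }
]

# With !198308 applied, records exist only for projects that have a repository.
project_repositories = projects
  .select { |project| project[:has_repo] }
  .map { |project| { project_id: project[:id] } }

puts "V1 candidates: #{v1_replication_candidates(projects).inspect}"
puts "V2 candidates: #{v2_replication_candidates(project_repositories).inspect}"
```

In this model, project 2 shows up only in the V1 candidate list, which is the case that produced the false sync and checksum failures described above.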

Test Environment Requirements

  • GitLab Geo setup with primary and secondary nodes
  • Admin access to both nodes
  • Feature flags available for toggling
  • Access to Rails console on both nodes
  • Ability to create projects with and without repositories

Test Scenarios

Phase 1: Basic Functionality Tests

1.1 Projects Without Repositories (Core Issue)

V1 Behavior (Before Fix):

# On primary - Rails console
# Assumes `user` is already set to an existing User, for example:
#   user = User.find_by(username: "root")
project = Project.create!(name: "test-no-repo", path: "test-no-repo", namespace: user.namespace)
# Without the fix, this creates a project_repository record even though no Git repo exists
project.project_repository # Should be present (problematic)

V2 Behavior (After Fix in !198308, merged):

# On primary - Rails console
project = Project.create!(name: "test-no-repo-v2", path: "test-no-repo-v2", namespace: user.namespace)  
project.project_repository # Should be nil (correct)

Expected Results:

  • V1: project_repository record exists (legacy behavior)
  • V2: No project_repository record created
  • V2: No Geo replication attempt (no registry created)
  • V2: No errors in secondary logs
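
The record check for scenario 1.1 can be wrapped in a small helper so the outcome is easy to paste into the test report. A minimal sketch, assuming only that the object passed in responds to `project_repository` as in the console snippets above; the helper name `check_no_repo_project` and the `fix_applied:` keyword are hypothetical.

```ruby
# Hypothetical helper for scenario 1.1. `project` can be any object that
# responds to #project_repository, e.g. a Project loaded in the Rails console.
def check_no_repo_project(project, fix_applied:)
  record_present = !project.project_repository.nil?
  # Legacy behavior: a record exists. With the fix: no record is expected.
  expected_present = !fix_applied

  {
    record_present: record_present,
    expected_present: expected_present,
    pass: record_present == expected_present
  }
end

# Stand-in object so the sketch also runs outside Rails.
FakeProject = Struct.new(:project_repository)
p check_no_repo_project(FakeProject.new(nil), fix_applied: true)
```

The helper only covers the record check; the registry and secondary log expectations still need to be verified on the secondary.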

1.2 Projects With Repositories (Should Work Both Ways)

# On primary
project = Projects::CreateService.new(user, {
  name: "test-with-repo", 
  path: "test-with-repo",
  initialize_with_readme: true
}).execute

Expected Results:

  • Both V1 and V2: project_repository record created
  • Both V1 and V2: Successful replication to secondary
  • No errors in logs

Phase 2: Feature Flag Switching Tests

2.1 V1 → V2 Migration

# Start with V1 enabled, V2 disabled
Feature.enable(:geo_project_repository_replication)
Feature.disable(:geo_project_repository_replication_v2)

# Create test projects (mix with/without repos)
5.times do |i|
  if i.even?
    # With repository  
    Projects::CreateService.new(user, {
      name: "migration-test-#{i}", 
      path: "migration-test-#{i}",
      initialize_with_readme: true
    }).execute
  else
    # Without repository
    Project.create!(name: "migration-test-#{i}", path: "migration-test-#{i}", namespace: user.namespace)
  end
end

# Wait for V1 replication to complete
# Check registry state on secondary

# Enable V2 feature flag
Feature.enable(:geo_project_repository_replication_v2)

# Create more projects and verify behavior

Expected Results:

  • Existing V1 registries continue working
  • New projects use V2 logic
  • Projects without repos don't create registries in V2 mode
  • No duplicate replication
  • UI shows consistent counts
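
The "no duplicate replication" expectation can be checked by looking for project IDs that appear in more than one registry row. A minimal sketch in plain Ruby; the helper name `duplicated_ids` is hypothetical, the sample IDs are made up, and how the ID list is collected on the secondary is left to the tester.

```ruby
# Returns the IDs that occur more than once, in first-seen order.
# Requires Ruby 2.7+ for Enumerable#tally.
def duplicated_ids(ids)
  ids.tally.select { |_id, count| count > 1 }.keys
end

# Made-up registry project IDs captured before and after enabling V2.
before_switch = [101, 102, 103]
after_switch  = [101, 102, 103, 104, 102] # 102 was registered twice

puts duplicated_ids(before_switch).inspect
puts duplicated_ids(after_switch).inspect
```

In a Rails console on the secondary, the input might come from something like `Geo::ProjectRepositoryRegistry.pluck(:project_id)`, assuming the registry exposes a `project_id` column as the scenario 4.2 snippet suggests.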

2.2 V2 → V1 Rollback

# Start with V2 enabled
Feature.enable(:geo_project_repository_replication_v2)

# Create projects and verify replication
# ...

# Disable V2 feature flag  
Feature.disable(:geo_project_repository_replication_v2)

# Create new projects and verify fallback

Expected Results:

  • Existing V2 registries continue working via delegation
  • New projects use V1 logic
  • No replication interruption
  • UI remains functional

Phase 3: UI and API Tests

3.1 Admin Geo Status Page

Test Steps:

  1. Navigate to /admin/geo/sites
  2. Check "Project Repositories" section in replication status
  3. Verify counts and status indicators
  4. Test with both feature flag states

Expected Results:

  • Accurate counts displayed for both V1 and V2
  • Status indicators work correctly (synced/failed/pending)
  • No GraphQL errors in browser console
  • Performance acceptable with large datasets

3.2 GraphQL API Compatibility

query {
  geoNode {
    projectRepositoryRegistries {
      nodes {
        id
        projectId          # Should always be present
        projectRepositoryId # Should be present only in V2
        state
        lastSyncedAt
      }
    }
  }
}

Expected Results:

  • projectId field always present (backward compatibility)
  • projectRepositoryId field present only when V2 enabled
  • No breaking changes for existing API consumers
  • Proper error handling for edge cases
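
The field expectations above can be checked mechanically against a saved GraphQL response. A minimal sketch in plain Ruby, assuming the response follows the shape of the query above; the helper name `registry_node_problems` is hypothetical and the sample response is made up for illustration.

```ruby
require 'json'

# Returns a list of human-readable problems; an empty list means the
# response matches the expectations for the given flag state.
def registry_node_problems(response_json, v2_enabled:)
  nodes = JSON.parse(response_json)
              .dig('data', 'geoNode', 'projectRepositoryRegistries', 'nodes') || []

  nodes.each_with_object([]) do |node, problems|
    # projectId must always be present for backward compatibility.
    problems << "#{node['id']}: projectId is missing" if node['projectId'].nil?

    # projectRepositoryId is expected only when V2 is enabled.
    if !v2_enabled && !node['projectRepositoryId'].nil?
      problems << "#{node['id']}: projectRepositoryId present while V2 is disabled"
    end
  end
end

# Made-up sample response, for illustration only.
sample = <<~JSON
  {"data": {"geoNode": {"projectRepositoryRegistries": {"nodes": [
    {"id": "a", "projectId": "1", "projectRepositoryId": "10", "state": "SYNCED"},
    {"id": "b", "projectId": null, "projectRepositoryId": null, "state": "FAILED"}
  ]}}}}
JSON

puts registry_node_problems(sample, v2_enabled: false)
```

The check deliberately does not require `projectRepositoryId` when V2 is enabled, since the expectation only says the field is present "only when V2 enabled", not on every node.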

Phase 4: Error Scenarios and Edge Cases

4.1 Repository Deletion After Creation

# Create project with repository
project = Projects::CreateService.new(user, {..., initialize_with_readme: true}).execute

# Wait for replication
# Delete repository but keep project
project.repository.remove

# Trigger re-verification

Expected Results:

  • V1: May attempt to verify non-existent repo (current behavior)
  • V2: Should handle gracefully, possibly remove registry
  • No infinite retry loops
  • Proper error messages in logs

4.2 Corrupt Registry Data

# Create registry with invalid project_repository_id
registry = Geo::ProjectRepositoryRegistry.create!(project_repository_id: 999999, project_id: project.id)

# Trigger replication worker
Geo::ProjectRepositoryReplicator.new(model_record_id: 999999).execute

Expected Results:

  • Graceful error handling
  • No worker crashes
  • Proper error logging

Phase 5: Performance Tests

5.1 Large Dataset Migration

# Create 100 projects (roughly 70% with repos, 30% without; the split is random)
100.times do |i|
  if rand < 0.7
    # With repository
    Projects::CreateService.new(user, {
      name: "perf-test-#{i}",
      path: "perf-test-#{i}", 
      initialize_with_readme: true
    }).execute
  else
    # Without repository  
    Project.create!(name: "perf-test-#{i}", path: "perf-test-#{i}", namespace: user.namespace)
  end
end

# Enable V2 replication and monitor
Feature.enable(:geo_project_repository_replication_v2)

Performance Criteria:

  • Migration completes in under 10 minutes for 100 projects
  • Memory usage remains stable during migration
  • No significant increase in database load
  • Secondary site replication keeps up
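
Timing the migration by hand is error-prone, so a small polling helper can record how long a condition took to become true. A minimal sketch in plain Ruby; `wait_until` is a hypothetical helper, and the block passed to it (what counts as "migration complete") is left to the tester.

```ruby
# Polls the block until it returns a truthy value or `timeout` seconds pass.
# Returns the elapsed seconds on success, nil on timeout.
def wait_until(timeout:, interval: 5)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  loop do
    done = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    return elapsed if done
    return nil if elapsed >= timeout

    sleep(interval)
  end
end

# Example with a stand-in condition that becomes true on the third check.
# 600 seconds corresponds to the 10-minute budget above.
checks = 0
elapsed = wait_until(timeout: 600, interval: 0.01) { (checks += 1) >= 3 }
puts elapsed ? format('done after %.2fs', elapsed) : 'timed out'
```

A monotonic clock is used so that system clock adjustments during a long run do not distort the measurement.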

Success Criteria

Must Pass

  • All projects with repositories replicate successfully in both V1 and V2
  • Projects without repositories don't cause errors in V2 mode
  • Feature flag switching works seamlessly in both directions
  • UI shows accurate status and counts in all scenarios
  • No regression in existing Geo functionality
  • GraphQL API maintains backward compatibility

Performance

  • No significant performance degradation during normal operations
  • Memory usage remains stable during feature flag switches
  • Replication throughput maintained or improved

Error Handling

  • Graceful handling of edge cases (missing repos, corrupt data)
  • Proper error messages and logging (no cryptic failures)
  • No infinite retry loops or worker crashes
  • Clear recovery procedures for problematic states

Test Execution

Pre-test Checklist

  • Test environment prepared and verified
  • Feature flags configured and accessible
  • Baseline metrics captured (performance, error rates)
  • Rollback plan prepared and tested

During Testing

  • Document all test results (pass/fail with details)
  • Capture relevant log snippets for failures
  • Monitor system performance metrics
  • Screenshot UI states for documentation

Post-test

  • Compare performance metrics to baseline
  • Document any workarounds or manual steps needed
  • Verify rollback plan works if needed
  • Prepare summary report

Risk Mitigation

High Risk Scenarios

  1. Data Loss: Registry data corruption during feature flag switch
    • Mitigation: Database backup before testing, staged rollout
  2. Replication Backlog: Large queues during migration
    • Mitigation: Monitor queue sizes, pause if necessary
  3. UI Breakage: GraphQL schema changes break frontend
    • Mitigation: Thorough GraphQL compatibility testing

Rollback Triggers

  • Critical errors in replication
  • Significant performance degradation (>25% slower)
  • UI completely broken
  • Data corruption detected
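
The 25% threshold is easiest to apply consistently when the slowdown is computed the same way each time. A minimal sketch in plain Ruby, assuming the compared metric is a duration where larger means slower; the helper names and the sample numbers are hypothetical.

```ruby
# Percentage by which `measured` exceeds `baseline` (negative means faster).
def slowdown_percent(baseline, measured)
  raise ArgumentError, 'baseline must be positive' unless baseline.positive?

  (measured - baseline) * 100.0 / baseline
end

# True when the slowdown is strictly greater than the rollback threshold.
def performance_rollback?(baseline, measured, threshold_percent: 25.0)
  slowdown_percent(baseline, measured) > threshold_percent
end

# Illustrative numbers only: baseline sync of 120 s, measured 160 s.
puts slowdown_percent(120, 160).round(1)
puts performance_rollback?(120, 160)
```

With these sample numbers the slowdown is about 33.3%, which exceeds the 25% trigger. A run that is exactly 25% slower does not trigger a rollback, matching the ">25%" wording.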

Test Results Summary

Overall Status: 🟢 Pass / 🟡 Pass with Issues / 🔴 Fail

Key Findings:

Issues Found:

Performance Impact:

