Geo: Reduce SSF boilerplate for upload partition replicators

📋 Phase 1: Foundation & First Replicator (POC) | Risk: Low | View Epic &20933


Summary

This issue proposes improvements to the Geo Self-Service Framework (SSF) to reduce boilerplate code when adding new blob replicators, specifically for the 22 upload partition tables that need individual Geo replication support.

Background

Currently, adding a new blob type to Geo replication requires creating multiple files with nearly identical boilerplate code. For each new replicable, developers must create:

  1. Replicator class (ee/app/replicators/geo/*_replicator.rb)
  2. Registry model (ee/app/models/geo/*_registry.rb)
  3. State model (ee/app/models/geo/*_state.rb)
  4. Registry finder (ee/app/finders/geo/*_registry_finder.rb)
  5. GraphQL resolver (ee/app/graphql/resolvers/geo/*_registries_resolver.rb)
  6. GraphQL type (ee/app/graphql/types/geo/*_registry_type.rb)
  7. Database migrations (registry table in geo DB, state table in main DB)
  8. Database dictionary files
  9. Factory files for testing
  10. Spec files for each of the above
  11. Manual registration in REPLICATOR_CLASSES array in ee/lib/gitlab/geo.rb
  12. Manual registration in REGISTRY_CLASSES in registry_consistency_worker.rb
  13. Manual updates to GeoNodeType GraphQL type
  14. Manual updates to registrable_type.rb for resync/reverify support

With 22 upload partition tables to support, this means creating 200 or more files of mostly identical code.

Proposal: Dynamic Replicator Generation

Phase 1: Convention-based Auto-discovery

Replace manual registration with convention-based auto-discovery:

# ee/lib/gitlab/geo.rb
def self.replicator_classes
  @replicator_classes ||= discover_replicator_classes
end

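# Derives replicator classes from file names instead of a hand-maintained list.
# Note: the glob also matches abstract base classes kept in the same directory,
# so those would need to be excluded or placed elsewhere.
def self.discover_replicator_classes
  Dir[Rails.root.join('ee/app/replicators/geo/*_replicator.rb')].sort.filter_map do |file|
    klass = "Geo::#{File.basename(file, '.rb').camelize}".constantize
    klass if klass < Gitlab::Geo::Replicator
  end
end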
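
To see the convention at work outside Rails, the following minimal sketch mirrors the discovery logic in plain Ruby. The `Demo` namespace, the simplified `camelize` helper (a stand-in for ActiveSupport's), the `discover` method, and the `abstract?` hook are all hypothetical names used for illustration. The `abstract?` hook shows one possible way to skip a shared base class that the file glob would otherwise pick up.

```ruby
require 'tmpdir'
require 'fileutils'

module Demo
  # Stand-in for Gitlab::Geo::Replicator (hypothetical, for illustration only).
  class Replicator
    # Assumed hook: base classes override this to opt out of discovery.
    def self.abstract?
      false
    end
  end

  class BaseUploadPartitionReplicator < Replicator
    # Only the base class itself is abstract, not its subclasses.
    def self.abstract?
      self == BaseUploadPartitionReplicator
    end
  end

  class AbuseReportUploadReplicator < BaseUploadPartitionReplicator; end
  class AchievementUploadReplicator < BaseUploadPartitionReplicator; end

  # Simplified camelize; ActiveSupport's version handles many more cases.
  def self.camelize(snake)
    snake.split('_').map(&:capitalize).join
  end

  # Map each *_replicator.rb file name to a constant under Demo, then keep
  # only concrete Replicator subclasses.
  def self.discover(dir)
    Dir.glob(File.join(dir, '*_replicator.rb')).sort.filter_map do |file|
      klass = const_get(camelize(File.basename(file, '.rb')))
      klass if klass < Replicator && !klass.abstract?
    end
  end
end

# Usage: create empty files named by convention and run discovery.
discovered = Dir.mktmpdir do |dir|
  %w[abuse_report_upload achievement_upload base_upload_partition].each do |name|
    FileUtils.touch(File.join(dir, "#{name}_replicator.rb"))
  end
  Demo.discover(dir)
end

puts discovered.map(&:name)
# prints Demo::AbuseReportUploadReplicator and Demo::AchievementUploadReplicator
```

Sorting the glob result keeps the class order deterministic across filesystems, which matters if anything downstream depends on the order of the list.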

Phase 2: Dynamic GraphQL Registration

Auto-generate GraphQL types, resolvers, and fields based on registered replicators:

# ee/app/graphql/types/geo/geo_node_type.rb
Gitlab::Geo.replicator_classes.each do |replicator_class|
  field replicator_class.graphql_field_name,
        replicator_class.graphql_registry_type.connection_type,
        null: true,
        resolver: replicator_class.graphql_resolver_class
end
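
The loop above assumes each replicator exposes `graphql_field_name`, `graphql_registry_type`, and `graphql_resolver_class`. One way these could be derived by convention is sketched below in plain Ruby. The `GraphqlNaming` module and its methods are hypothetical, and the generated class names simply follow the file naming listed in the Background section; they are not a confirmed API.

```ruby
# Hypothetical naming helper; none of these names are confirmed GitLab APIs.
module GraphqlNaming
  # 'abuse_report_upload' => 'AbuseReportUpload'
  def self.camelize(snake)
    snake.split('_').map(&:capitalize).join
  end

  # Field name in snake_case, in the form passed to `field`.
  def self.field_name(replicable_name)
    "#{replicable_name}_registries"
  end

  # Class name matching the *_registry_type.rb file convention.
  def self.registry_type_name(replicable_name)
    "Types::Geo::#{camelize(replicable_name)}RegistryType"
  end

  # Class name matching the *_registries_resolver.rb file convention.
  def self.resolver_name(replicable_name)
    "Resolvers::Geo::#{camelize(replicable_name)}RegistriesResolver"
  end
end

puts GraphqlNaming.field_name('abuse_report_upload')
puts GraphqlNaming.registry_type_name('abuse_report_upload')
puts GraphqlNaming.resolver_name('abuse_report_upload')
```

Deriving all three names from a single replicable name means a replicator only has to declare that one value for the field loop to work.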

Phase 3: Base Upload Replicator for Partitioned Tables

Create a base class specifically for upload partition replicators:

# ee/app/replicators/geo/base_upload_partition_replicator.rb
module Geo
  class BaseUploadPartitionReplicator < Gitlab::Geo::Replicator
    include ::Geo::BlobReplicatorStrategy

    class << self
      # Subclasses only need to define:
      # - model (the upload model class)
      # - replicable_title / replicable_title_plural
      
      def registry_class
        # Auto-generate or use shared registry with partition key
        @registry_class ||= generate_registry_class
      end
    end

    def carrierwave_uploader
      model_record.retrieve_uploader
    end
  end
end
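
`generate_registry_class` is left undefined above. A minimal sketch of one possible approach follows, using `Class.new` and `const_set` to build a named registry class on first access. `SketchGeo`, `BaseRegistry`, and `registry_class_name` are hypothetical stand-ins rather than existing GitLab classes, and a real registry would also need a backing table.

```ruby
module SketchGeo
  # Stand-in for a shared registry superclass (hypothetical).
  class BaseRegistry; end

  class BaseUploadPartitionReplicator
    class << self
      # Memoized per replicator class via a class-level instance variable.
      def registry_class
        @registry_class ||= generate_registry_class
      end

      private

      # 'SketchGeo::AbuseReportUploadReplicator' => 'AbuseReportUploadRegistry'
      def registry_class_name
        name.split('::').last.sub(/Replicator\z/, 'Registry')
      end

      # Reuse the constant if it is already defined; otherwise create an
      # anonymous subclass and name it by assigning it to a constant.
      def generate_registry_class
        const = registry_class_name
        return SketchGeo.const_get(const) if SketchGeo.const_defined?(const, false)

        SketchGeo.const_set(const, Class.new(BaseRegistry))
      end
    end
  end

  class AbuseReportUploadReplicator < BaseUploadPartitionReplicator; end
end

registry = SketchGeo::AbuseReportUploadReplicator.registry_class
puts registry.name # prints "SketchGeo::AbuseReportUploadRegistry"
```

Checking `const_defined?` first lets a hand-written registry class take precedence over the generated one, which would allow existing replicators to keep their explicit classes during a gradual migration.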

Phase 4: Shared Registry with Partition Discrimination

Instead of 22 separate registry tables, consider a shared registry approach:

# Single registry table with upload_type discriminator
create_table :geo_upload_partition_registries do |t|
  t.string :upload_type, null: false  # e.g., 'abuse_report', 'achievement', etc.
  t.bigint :upload_id, null: false
  # ... standard registry columns
  t.index [:upload_type, :upload_id], unique: true
end

This would allow a single Geo::UploadPartitionRegistry model, using either single-table inheritance (STI) keyed on upload_type or a plain discriminator column.
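
With a discriminator column, each registry row identifies its source record by the pair (`upload_type`, `upload_id`) rather than by `upload_id` alone. The sketch below illustrates that lookup in plain Ruby, with an in-memory hash standing in for the unique index. `UploadPartitionRegistryIndex` and its methods are hypothetical and only model the behavior of the index, not an ActiveRecord model.

```ruby
# In-memory stand-in for a table with a unique (upload_type, upload_id) index.
class UploadPartitionRegistryIndex
  DuplicateEntry = Class.new(StandardError)

  def initialize
    @rows = {}
  end

  # Mirrors the unique index: a second insert for the same pair is rejected.
  def insert(upload_type:, upload_id:, state: :pending)
    key = [upload_type, upload_id]
    raise DuplicateEntry, "#{upload_type}/#{upload_id} already tracked" if @rows.key?(key)

    @rows[key] = { upload_type: upload_type, upload_id: upload_id, state: state }
  end

  # Lookup by the full composite key.
  def find(upload_type:, upload_id:)
    @rows[[upload_type, upload_id]]
  end

  # Equivalent of filtering by the discriminator column.
  def for_type(upload_type)
    @rows.values.select { |row| row[:upload_type] == upload_type }
  end
end

index = UploadPartitionRegistryIndex.new
index.insert(upload_type: 'abuse_report', upload_id: 1)
index.insert(upload_type: 'achievement', upload_id: 1) # same id, different type
puts index.for_type('abuse_report').size
```

The same `upload_id` can appear once per upload type, which is why the unique index has to cover both columns.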

Phase 5: Generator Script

Create a Rails generator for new upload partition replicators:

bin/rails generate geo:upload_partition_replicator AbuseReport \
  --table=abuse_report_uploads \
  --sharding_key=organization_id

This would generate all necessary files with correct naming and configuration.
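
At its core, such a generator renders templates from a few inputs. The sketch below uses ERB from the Ruby standard library to render a replicator file for a given name. The template body, the `render_replicator` helper, and the `::AbuseReportUpload` model constant it produces are assumptions for illustration; a real generator would more likely build on `Rails::Generators::NamedBase` and write several files.

```ruby
require 'erb'

# Hypothetical template for a generated replicator file.
REPLICATOR_TEMPLATE = <<~TEMPLATE
  # frozen_string_literal: true

  module Geo
    class <%= class_name %>UploadReplicator < BaseUploadPartitionReplicator
      def self.model
        ::<%= class_name %>Upload
      end
    end
  end
TEMPLATE

# Renders the template with `class_name` bound to the given name.
def render_replicator(class_name)
  ERB.new(REPLICATOR_TEMPLATE).result_with_hash(class_name: class_name)
end

puts render_replicator('AbuseReport')
```

Keeping the template as data makes it straightforward to add further templates (registry, factory, spec) that share the same inputs.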

Implementation Checklist

Auto-discovery and Registration

  • Implement convention-based replicator class discovery
  • Remove manual REPLICATOR_CLASSES array maintenance
  • Auto-register registry classes in consistency worker
  • Auto-generate GraphQL fields on GeoNodeType

Base Classes and Concerns

  • Create Geo::BaseUploadPartitionReplicator base class
  • Extract common upload replicator logic into shared concern
  • Create shared registry concern for upload partitions

Database Optimization

  • Evaluate shared registry table vs individual tables
  • Create migration generator for registry/state tables
  • Auto-generate database dictionary files

GraphQL Automation

  • Dynamic GraphQL type generation from replicator metadata
  • Dynamic resolver generation
  • Auto-registration in registrable_type.rb

Testing Infrastructure

  • Shared examples that work with minimal configuration
  • Factory generator for new replicators
  • Automated spec generation

Documentation and Tooling

  • Rails generator for new upload partition replicators
  • Update issue template to reflect reduced manual steps
  • Document the new streamlined process

Benefits

  1. Reduced code duplication: ~90% reduction in boilerplate files
  2. Faster implementation: Adding a new upload type takes minutes instead of hours
  3. Fewer errors: Less manual registration means fewer missed steps
  4. Easier maintenance: Changes to common behavior only need to be made once
  5. Better consistency: All upload replicators behave identically

Risks and Mitigations

Risk | Mitigation
Magic/implicit behavior harder to debug | Good logging, clear documentation
Performance of auto-discovery at boot | Cache discovered classes, lazy loading
Breaking existing replicators | Gradual migration, feature flags
Edited by 🤖 GitLab Bot 🤖