Geo: Reduce SSF boilerplate for upload partition replicators
📋 Phase 1: Foundation & First Replicator (POC) | Risk: Low | Epic: &20933
Summary
This issue proposes improvements to the Geo Self-Service Framework (SSF) to reduce boilerplate code when adding new blob replicators, specifically for the 22 upload partition tables that need individual Geo replication support.
Background
Currently, adding a new blob type to Geo replication requires creating or updating many files, most of them nearly identical boilerplate. For each new replicable, developers must create or update:
- Replicator class (`ee/app/replicators/geo/*_replicator.rb`)
- Registry model (`ee/app/models/geo/*_registry.rb`)
- State model (`ee/app/models/geo/*_state.rb`)
- Registry finder (`ee/app/finders/geo/*_registry_finder.rb`)
- GraphQL resolver (`ee/app/graphql/resolvers/geo/*_registries_resolver.rb`)
- GraphQL type (`ee/app/graphql/types/geo/*_registry_type.rb`)
- Database migrations (registry table in the Geo DB, state table in the main DB)
- Database dictionary files
- Factory files for testing
- Spec files for each of the above
- Manual registration in the `REPLICATOR_CLASSES` array in `ee/lib/gitlab/geo.rb`
- Manual registration in `REGISTRY_CLASSES` in `registry_consistency_worker.rb`
- Manual updates to the `GeoNodeType` GraphQL type
- Manual updates to `registrable_type.rb` for resync/reverify support
With 22 upload partition tables to support, this means creating 200 or more files of mostly identical code.
Proposal: Dynamic Replicator Generation
Phase 1: Convention-based Auto-discovery
Replace manual registration with convention-based auto-discovery:
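```ruby
# ee/lib/gitlab/geo.rb
module Gitlab
  module Geo
    # Memoized so the filesystem scan runs once per process.
    def self.replicator_classes
      @replicator_classes ||= discover_replicator_classes
    end

    # Maps each ee/app/replicators/geo/*_replicator.rb file to its constant
    # (e.g. abuse_report_upload_replicator.rb -> Geo::AbuseReportUploadReplicator)
    # and keeps only subclasses of Gitlab::Geo::Replicator.
    def self.discover_replicator_classes
      Dir[Rails.root.join('ee/app/replicators/geo/*_replicator.rb')].map do |file|
        "Geo::#{File.basename(file, '.rb').camelize}".constantize
      end.select { |klass| klass < Gitlab::Geo::Replicator }
    end
  end
end
```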
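The discovery step can be exercised outside Rails with a plain-Ruby sketch. `ReplicatorBase`, the example `Geo::*Replicator` classes, and the `camelize` helper below are hypothetical stand-ins for `Gitlab::Geo::Replicator`, real replicators, and ActiveSupport's `String#camelize`; the file paths are passed in as a list instead of being globbed from disk.

```ruby
# Hypothetical stand-in for Gitlab::Geo::Replicator.
class ReplicatorBase; end

module Geo
  class AbuseReportUploadReplicator < ReplicatorBase; end
  class AchievementUploadReplicator < ReplicatorBase; end
  # Matches the *_replicator.rb naming but does not inherit from the base.
  class UnrelatedReplicator; end
end

# Minimal stand-in for ActiveSupport's String#camelize (snake_case input only).
def camelize(snake)
  snake.split('_').map(&:capitalize).join
end

# Convention: file basename -> Geo:: constant, then keep only subclasses of base.
def discover_replicator_classes(paths, base: ReplicatorBase)
  paths
    .map { |path| Object.const_get("Geo::#{camelize(File.basename(path, '.rb'))}") }
    .select { |klass| klass < base }
end

PATHS = %w[
  ee/app/replicators/geo/abuse_report_upload_replicator.rb
  ee/app/replicators/geo/achievement_upload_replicator.rb
  ee/app/replicators/geo/unrelated_replicator.rb
].freeze

p discover_replicator_classes(PATHS)
# prints [Geo::AbuseReportUploadReplicator, Geo::AchievementUploadReplicator]
```

One consequence worth checking: an intermediate base class kept in the same directory, such as the `Geo::BaseUploadPartitionReplicator` proposed in Phase 3, also matches `*_replicator.rb` and is a subclass of `Gitlab::Geo::Replicator`, so the `klass < Gitlab::Geo::Replicator` filter alone would not exclude it.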
Phase 2: Dynamic GraphQL Registration
Auto-generate GraphQL types, resolvers, and fields based on registered replicators:
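```ruby
# ee/app/graphql/types/geo/geo_node_type.rb
# Inside the GeoNodeType class body: one field per discovered replicator.
# graphql_field_name, graphql_registry_type and graphql_resolver_class are
# proposed class methods on each replicator.
Gitlab::Geo.replicator_classes.each do |replicator_class|
  field replicator_class.graphql_field_name,
        replicator_class.graphql_registry_type.connection_type,
        null: true,
        resolver: replicator_class.graphql_resolver_class
end
```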
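The field loop assumes each replicator can report its GraphQL names. One way to derive them from a single name is sketched below in plain Ruby, following the file naming in the Background list (`*_registry_type.rb`, `*_registries_resolver.rb`). The `GraphqlNaming` module, the `replicable_name` class method, and returning strings instead of constants are assumptions made so the sketch runs without Rails or graphql-ruby; the proposal's `graphql_registry_type` and `graphql_resolver_class` would return the classes themselves.

```ruby
# Hypothetical mixin deriving GraphQL names from a replicator's replicable_name.
module GraphqlNaming
  def graphql_field_name
    :"#{replicable_name}_registries"
  end

  # String stand-in for the registry type class (Types::Geo::*RegistryType).
  def graphql_registry_type_name
    "Types::Geo::#{camelized_name}RegistryType"
  end

  # String stand-in for the resolver class (Resolvers::Geo::*RegistriesResolver).
  def graphql_resolver_class_name
    "Resolvers::Geo::#{camelized_name}RegistriesResolver"
  end

  private

  def camelized_name
    replicable_name.split('_').map(&:capitalize).join
  end
end

class AbuseReportUploadReplicator
  extend GraphqlNaming

  def self.replicable_name
    'abuse_report_upload'
  end
end

class AchievementUploadReplicator
  extend GraphqlNaming

  def self.replicable_name
    'achievement_upload'
  end
end

# Stand-in for the `field` calls: one entry per replicator.
FIELDS = [AbuseReportUploadReplicator, AchievementUploadReplicator].to_h do |replicator|
  [replicator.graphql_field_name,
   { type: replicator.graphql_registry_type_name,
     resolver: replicator.graphql_resolver_class_name }]
end
```

Deriving every name from one string keeps the field, type, and resolver in step, which is what lets the loop above stay generic.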
Phase 3: Base Upload Replicator for Partitioned Tables
Create a base class specifically for upload partition replicators:
```ruby
# ee/app/replicators/geo/base_upload_partition_replicator.rb
module Geo
  class BaseUploadPartitionReplicator < Gitlab::Geo::Replicator
    include ::Geo::BlobReplicatorStrategy

    class << self
      # Subclasses only need to define:
      # - model (the upload model class)
      # - replicable_title / replicable_title_plural
      def registry_class
        # Auto-generate or use a shared registry with a partition key.
        # generate_registry_class is still to be designed (see Phase 4).
        @registry_class ||= generate_registry_class
      end
    end

    def carrierwave_uploader
      model_record.retrieve_uploader
    end
  end
end
```
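A detail that makes `@registry_class ||= ...` inside `class << self` work here is that class-level instance variables are not shared with subclasses, so each concrete replicator memoizes its own registry class. The plain-Ruby sketch below shows that behaviour; `RegistryBase`, `replicated_model_name`, the `Struct` upload models, and the body of `generate_registry_class` are hypothetical, since the proposal leaves registry generation open.

```ruby
# Hypothetical stand-in for the registry model base class.
class RegistryBase
  class << self
    attr_accessor :replicated_model_name
  end
end

class BaseUploadPartitionReplicator
  class << self
    # Subclasses must override this with their upload model.
    def model
      raise NotImplementedError, "#{name} must define .model"
    end

    # @registry_class is an instance variable of the class object (self),
    # so every subclass caches its own generated registry class.
    def registry_class
      @registry_class ||= generate_registry_class
    end

    private

    # Hypothetical generator: an anonymous subclass of RegistryBase tied to the model.
    def generate_registry_class
      Class.new(RegistryBase).tap { |klass| klass.replicated_model_name = model.name }
    end
  end
end

# Stand-ins for the partitioned upload models.
AbuseReportUpload = Struct.new(:id)
AchievementUpload = Struct.new(:id)

class AbuseReportUploadReplicator < BaseUploadPartitionReplicator
  def self.model
    AbuseReportUpload
  end
end

class AchievementUploadReplicator < BaseUploadPartitionReplicator
  def self.model
    AchievementUpload
  end
end
```

In a real Rails app, an anonymous ActiveRecord subclass would likely also need a constant name (for example via `const_set`), because ActiveModel derives model names from the class name.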
Phase 4: Shared Registry with Partition Discrimination
Instead of 22 separate registry tables, consider a shared registry approach:
```ruby
# Single registry table with an upload_type discriminator
create_table :geo_upload_partition_registries do |t|
  t.string :upload_type, null: false # e.g. 'abuse_report', 'achievement', etc.
  t.bigint :upload_id, null: false
  # ... standard registry columns
  t.index [:upload_type, :upload_id], unique: true
end
```
This would allow a single `Geo::UploadPartitionRegistry` model, using STI or discrimination on the `upload_type` column.
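The unique index on `[:upload_type, :upload_id]` means a registry row is identified by the pair, and each replicator would read its own slice by filtering on `upload_type`. The in-memory sketch below mimics those two properties in plain Ruby; `SharedUploadRegistry` and its methods are hypothetical and stand in for the proposed `Geo::UploadPartitionRegistry` model and the database constraint.

```ruby
# Hypothetical in-memory stand-in for geo_upload_partition_registries.
class SharedUploadRegistry
  Row = Struct.new(:upload_type, :upload_id, :state)

  def initialize
    @rows = {} # keyed by [upload_type, upload_id], like the unique index
  end

  # Mimics the unique index: a second row for the same pair is rejected.
  def create!(upload_type:, upload_id:, state: :pending)
    key = [upload_type, upload_id]
    raise ArgumentError, "duplicate registry for #{key.inspect}" if @rows.key?(key)

    @rows[key] = Row.new(upload_type, upload_id, state)
  end

  # Per-replicator view: only rows for one upload type (the discriminator).
  def for_type(upload_type)
    @rows.values.select { |row| row.upload_type == upload_type }
  end
end

registry = SharedUploadRegistry.new
registry.create!(upload_type: 'abuse_report', upload_id: 1)
registry.create!(upload_type: 'achievement', upload_id: 1) # same id, different type: allowed
registry.for_type('abuse_report').map(&:upload_id) # => [1]
```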
Phase 5: Generator Script
Create a Rails generator for new upload partition replicators:
```shell
bin/rails generate geo:upload_partition_replicator AbuseReport \
  --table=abuse_report_uploads \
  --sharding_key=organization_id
```
This would generate all necessary files with correct naming and configuration.
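As a rough picture of what "all necessary files" means, the sketch below derives the Ruby file paths from the generator's arguments using the naming patterns in the Background list. The `_upload` suffix on the base name, the `Plan` struct, and `planned_files` are assumptions for illustration, not the generator's actual behaviour; migrations, dictionary files, factories, and specs are left out to keep it short.

```ruby
# Hypothetical output plan for a geo:upload_partition_replicator generator.
Plan = Struct.new(:table, :sharding_key, :files)

# Minimal stand-in for ActiveSupport's String#underscore (CamelCase input only).
def underscore(camel)
  camel.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

def planned_files(name, table:, sharding_key:)
  base = "#{underscore(name)}_upload" # assumed naming, e.g. abuse_report_upload
  files = {
    replicator: "ee/app/replicators/geo/#{base}_replicator.rb",
    registry: "ee/app/models/geo/#{base}_registry.rb",
    state: "ee/app/models/geo/#{base}_state.rb",
    finder: "ee/app/finders/geo/#{base}_registry_finder.rb",
    resolver: "ee/app/graphql/resolvers/geo/#{base}_registries_resolver.rb",
    graphql_type: "ee/app/graphql/types/geo/#{base}_registry_type.rb"
  }
  Plan.new(table, sharding_key, files)
end

plan = planned_files('AbuseReport', table: 'abuse_report_uploads', sharding_key: 'organization_id')
puts plan.files.values
```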
Implementation Checklist
Auto-discovery and Registration
- Implement convention-based replicator class discovery
- Remove manual `REPLICATOR_CLASSES` array maintenance
- Auto-register registry classes in the consistency worker
- Auto-generate GraphQL fields on `GeoNodeType`
Base Classes and Concerns
- Create `Geo::BaseUploadPartitionReplicator` base class
- Extract common upload replicator logic into a shared concern
- Create shared registry concern for upload partitions
Database Optimization
- Evaluate shared registry table vs individual tables
- Create migration generator for registry/state tables
- Auto-generate database dictionary files
GraphQL Automation
- Dynamic GraphQL type generation from replicator metadata
- Dynamic resolver generation
- Auto-registration in `registrable_type.rb`
Testing Infrastructure
- Shared examples that work with minimal configuration
- Factory generator for new replicators
- Automated spec generation
Documentation and Tooling
- Rails generator for new upload partition replicators
- Update issue template to reflect reduced manual steps
- Document the new streamlined process
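Several registration items in the checklist (removing the `REPLICATOR_CLASSES` array, auto-registering registry classes in the consistency worker, and auto-registration in `registrable_type.rb`) reduce to deriving each list from the one discovered list of replicators. A plain-Ruby sketch of that idea follows; `ReplicatorInfo`, `registry_classes`, and `registrable_types` are hypothetical names, and the shape of the `registrable_type.rb` mapping is an assumption.

```ruby
# Hypothetical stand-in for a discovered replicator and its registry class name.
ReplicatorInfo = Struct.new(:replicable_name, :registry_class_name)

DISCOVERED = [
  ReplicatorInfo.new('abuse_report_upload', 'Geo::AbuseReportUploadRegistry'),
  ReplicatorInfo.new('achievement_upload', 'Geo::AchievementUploadRegistry')
].freeze

# Would replace a hand-maintained REGISTRY_CLASSES list in the consistency worker.
def registry_classes(replicators = DISCOVERED)
  replicators.map(&:registry_class_name)
end

# Assumed lookup from a registrable type name to its registry, for resync/reverify.
def registrable_types(replicators = DISCOVERED)
  replicators.to_h { |r| [r.replicable_name.upcase, r.registry_class_name] }
end
```

Because every list is computed from the same source, adding a replicator cannot leave one of them out of date.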
Benefits
- Reduced code duplication: ~90% reduction in boilerplate files
- Faster implementation: Adding a new upload type takes minutes instead of hours
- Fewer errors: Less manual registration means fewer missed steps
- Easier maintenance: Changes to common behavior only need to be made once
- Better consistency: All upload replicators behave identically
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Magic/implicit behavior harder to debug | Good logging, clear documentation |
| Performance of auto-discovery at boot | Cache discovered classes, lazy loading |
| Breaking existing replicators | Gradual migration, feature flags |
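For the "gradual migration" mitigation, one low-risk step is to keep the existing `REPLICATOR_CLASSES` array for a while and check that discovery finds exactly the same classes before switching over. The plain-Ruby sketch below shows such a comparison; `discovery_drift` and the example constants are hypothetical, and class names are compared as strings so it runs standalone.

```ruby
# Hypothetical comparison of a hand-maintained list with discovered classes.
LEGACY_REPLICATOR_CLASSES = %w[
  Geo::AbuseReportUploadReplicator
  Geo::AchievementUploadReplicator
].freeze

# Returns what each side has that the other lacks; both empty means safe to switch.
def discovery_drift(legacy, discovered)
  { missing_from_discovery: legacy - discovered, not_in_legacy_list: discovered - legacy }
end

# An extra entry here would surface, for example, a base class picked up by the glob.
discovered = %w[
  Geo::AbuseReportUploadReplicator
  Geo::AchievementUploadReplicator
  Geo::BaseUploadPartitionReplicator
]

drift = discovery_drift(LEGACY_REPLICATOR_CLASSES, discovered)
# drift[:not_in_legacy_list] == ["Geo::BaseUploadPartitionReplicator"]
```

Run as a spec, a check like this would fail the build on any drift instead of silently changing which replicators are active.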
Related
- Parent epic: &20933
- MR: !221773 (closed)
- Related issue: #227693 (closed) (Avoid maintaining REPLICATOR_CLASSES list)