Geo: Migrate projects replication/verification to use SSF
Replicate Projects - Repository
This issue is for implementing Geo replication and verification of Projects.
For more background, see Geo self-service framework.
To implement and test this feature, you first need to set up Geo locally.
There are three main sections below. It is a good idea to structure your merge requests this way as well:
- Modify database schemas to prepare to add Geo support for Projects
- Implement Geo support of Projects behind a feature flag
- Release Geo support of Projects
It is also a good idea to first open a proof-of-concept merge request. It can be helpful for working out kinks and getting initial support and feedback from the Geo team. As an example, see the Proof of Concept to replicate Pipeline Artifacts.
You can look into the following example for implementing replication/verification for a new Git repository type:
Modify database schemas to prepare to add Geo support for Projects
Add the registry table to track replication and verification state
Geo secondary sites have a Geo tracking database independent of the main database. It is used to track the replication and verification state of all replicables. Every Model has a corresponding "registry" table in the Geo tracking database.
- Create the migration file in `ee/db/geo/migrate`:

  ```shell
  bin/rails generate migration CreateProjectRegistry --database geo
  ```

- Replace the contents of the migration file with the following. Note that we cannot add a foreign key constraint on `project_id` because the `projects` table is in a different database. The application code must handle logic such as propagating deletions.

  ```ruby
  # frozen_string_literal: true

  class CreateProjectRegistry < Gitlab::Database::Migration[2.1]
    def change
      create_table :project_registry, id: :bigserial, force: :cascade do |t|
        t.bigint :project_id, null: false
        t.datetime_with_timezone :created_at, null: false
        t.datetime_with_timezone :last_synced_at
        t.datetime_with_timezone :retry_at
        t.datetime_with_timezone :verified_at
        t.datetime_with_timezone :verification_started_at
        t.datetime_with_timezone :verification_retry_at
        t.integer :state, default: 0, null: false, limit: 2
        t.integer :verification_state, default: 0, null: false, limit: 2
        t.integer :retry_count, default: 0, limit: 2, null: false
        t.integer :verification_retry_count, default: 0, limit: 2, null: false
        t.boolean :checksum_mismatch, default: false, null: false
        t.boolean :force_to_redownload, default: false, null: false
        t.boolean :missing_on_primary, default: false, null: false
        t.binary :verification_checksum
        t.binary :verification_checksum_mismatched
        t.text :verification_failure, limit: 255
        t.text :last_sync_failure, limit: 255

        t.index :project_id, name: :index_project_registry_on_project_id, unique: true
        t.index :retry_at
        t.index :state
        # To optimize performance of ProjectRegistry.verification_failed_batch
        t.index :verification_retry_at,
          name: :project_registry_failed_verification,
          order: "NULLS FIRST",
          where: "((state = 2) AND (verification_state = 3))"
        # To optimize performance of ProjectRegistry.needs_verification_count
        t.index :verification_state,
          name: :project_registry_needs_verification,
          where: "((state = 2) AND (verification_state = ANY (ARRAY[0, 3])))"
        # To optimize performance of ProjectRegistry.verification_pending_batch
        t.index :verified_at,
          name: :project_registry_pending_verification,
          order: "NULLS FIRST",
          where: "((state = 2) AND (verification_state = 0))"
      end
    end
  end
  ```
- If deviating from the above example, then be sure to order columns according to our guidelines.
- Add the new table to the database dictionary defined in `ee/db/docs/`:

  ```yaml
  table_name: project_registry
  description: Description example
  introduced_by_url: Merge request link
  milestone: Milestone example
  feature_categories:
    - Feature category example
  classes:
    - Class example
  gitlab_schema: gitlab_geo
  ```

- Run Geo tracking database migrations:

  ```shell
  bin/rake db:migrate:geo
  ```

- Be sure to commit the relevant changes in `ee/db/geo/structure.sql` and the file under `ee/db/geo/schema_migrations`.
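The partial indexes in the registry migration hard-code numeric state values in their `where` clauses. The plain-Ruby sketch below decodes those values. It assumes the framework's usual mappings: `state` is pending 0, started 1, synced 2, failed 3, and `verification_state` is pending 0, started 1, succeeded 2, failed 3, disabled 4. The sketch only mirrors the predicates for illustration. Check the real constants in `Geo::ReplicableRegistry` and `Geo::VerificationState` before relying on them.

```ruby
# Assumed value mappings; verify them against Geo::ReplicableRegistry and
# Geo::VerificationState before relying on them.
STATE = { pending: 0, started: 1, synced: 2, failed: 3 }.freeze
VERIFICATION_STATE = {
  verification_pending: 0,
  verification_started: 1,
  verification_succeeded: 2,
  verification_failed: 3,
  verification_disabled: 4
}.freeze

# Each lambda mirrors the WHERE clause of one partial index on project_registry.
INDEX_PREDICATES = {
  project_registry_failed_verification: ->(row) {
    row[:state] == STATE[:synced] &&
      row[:verification_state] == VERIFICATION_STATE[:verification_failed]
  },
  project_registry_needs_verification: ->(row) {
    row[:state] == STATE[:synced] &&
      VERIFICATION_STATE.values_at(:verification_pending, :verification_failed).include?(row[:verification_state])
  },
  project_registry_pending_verification: ->(row) {
    row[:state] == STATE[:synced] &&
      row[:verification_state] == VERIFICATION_STATE[:verification_pending]
  }
}.freeze

# Returns the names of the partial indexes that would contain the given row.
def indexes_covering(row)
  INDEX_PREDICATES.select { |_name, predicate| predicate.call(row) }.keys
end

synced_unverified = { state: STATE[:synced], verification_state: VERIFICATION_STATE[:verification_pending] }
p indexes_covering(synced_unverified)
# => [:project_registry_needs_verification, :project_registry_pending_verification]
```

Every predicate requires `state = 2`. Under this mapping, only registries that have finished syncing are candidates for verification on the secondary.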
Add verification state to the Model
The Geo primary site needs to checksum every replicable so secondaries can verify their own checksums. To do this, Geo requires the Model to have an associated table to track verification state.
- Create the migration file in `db/migrate`:

  ```shell
  bin/rails generate migration CreateProjectStates
  ```

- Replace the contents of the migration file with:

  ```ruby
  # frozen_string_literal: true

  class CreateProjectStates < Gitlab::Database::Migration[2.1]
    VERIFICATION_STATE_INDEX_NAME = "index_project_states_on_verification_state"
    PENDING_VERIFICATION_INDEX_NAME = "index_project_states_pending_verification"
    FAILED_VERIFICATION_INDEX_NAME = "index_project_states_failed_verification"
    NEEDS_VERIFICATION_INDEX_NAME = "index_project_states_needs_verification"

    enable_lock_retries!

    def up
      create_table :project_states, id: false do |t|
        t.datetime_with_timezone :verification_started_at
        t.datetime_with_timezone :verification_retry_at
        t.datetime_with_timezone :verified_at
        t.references :project, primary_key: true, default: nil, index: false, foreign_key: { on_delete: :cascade }
        t.integer :verification_state, default: 0, limit: 2, null: false
        t.integer :verification_retry_count, default: 0, limit: 2, null: false
        t.binary :verification_checksum, using: 'verification_checksum::bytea'
        t.text :verification_failure, limit: 255

        t.index :verification_state, name: VERIFICATION_STATE_INDEX_NAME
        t.index :verified_at,
          where: "(verification_state = 0)",
          order: { verified_at: 'ASC NULLS FIRST' },
          name: PENDING_VERIFICATION_INDEX_NAME
        t.index :verification_retry_at,
          where: "(verification_state = 3)",
          order: { verification_retry_at: 'ASC NULLS FIRST' },
          name: FAILED_VERIFICATION_INDEX_NAME
        t.index :verification_state,
          where: "(verification_state = 0 OR verification_state = 3)",
          name: NEEDS_VERIFICATION_INDEX_NAME
      end
    end

    def down
      drop_table :project_states
    end
  end
  ```
- If deviating from the above example, then be sure to order columns according to our guidelines.
- If `projects` is a high-traffic table, follow the database documentation to use `with_lock_retries`.
- Add the new table to the database dictionary defined in `db/docs/`:

  ```yaml
  ---
  table_name: project_states
  description: Separate table for project verification states
  introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/XXXXX
  milestone: 'XX.Y'
  feature_categories:
    - geo_replication
  classes:
    - Geo::ProjectState
  gitlab_schema: gitlab_main
  ```
- Run database migrations:

  ```shell
  bin/rake db:migrate
  ```
- Be sure to commit the relevant changes in `db/structure.sql` and the file under `db/schema_migrations`.
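The partial indexes on `project_states` presumably serve the queries that pick verification work on the primary. The pending index, for example, is ordered `verified_at ASC NULLS FIRST`. That ordering puts never-verified rows ahead of rows that were verified longest ago. The plain-Ruby sketch below shows that ordering. The `ProjectStateRow` struct and the `pending_batch` helper are hypothetical, the value `0` for pending is assumed, and the real selection runs in SQL.

```ruby
# Hypothetical stand-in for a project_states row.
ProjectStateRow = Struct.new(:project_id, :verification_state, :verified_at, keyword_init: true)

VERIFICATION_PENDING = 0 # assumed value of verification_pending

# Mirrors: WHERE verification_state = 0 ORDER BY verified_at ASC NULLS FIRST LIMIT batch_size
def pending_batch(rows, batch_size)
  rows
    .select { |row| row.verification_state == VERIFICATION_PENDING }
    .sort_by { |row| [row.verified_at.nil? ? 0 : 1, row.verified_at || Time.at(0)] }
    .first(batch_size)
    .map(&:project_id)
end

rows = [
  ProjectStateRow.new(project_id: 1, verification_state: 0, verified_at: Time.utc(2024, 1, 2)),
  ProjectStateRow.new(project_id: 2, verification_state: 0, verified_at: nil),
  ProjectStateRow.new(project_id: 3, verification_state: 2, verified_at: nil),
  ProjectStateRow.new(project_id: 4, verification_state: 0, verified_at: Time.utc(2024, 1, 1))
]

p pending_batch(rows, 2) # => [2, 4]
```

Row 3 is skipped because it is not pending. Row 2 comes first because it has never been verified.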
That's all of the required database changes.
Implement Geo support of Projects behind a feature flag
Step 1. Implement replication and verification
- Add the following lines to the `Project` model to accomplish some important tasks:
  - Include `::Geo::ReplicableModel` in the `Project` class, and specify the Replicator class with `with_replicator Geo::ProjectReplicator`.
  - Include the `::Geo::VerifiableModel` concern.
  - Delegate verification-related methods to the `project_state` model.
  - For verification, override some scopes to use the `project_states` table instead of the model table.
  - Implement the `verification_state_object` method to return the object that holds the verification details.
  - Override some methods to use the `project_states` table in verification-related queries.
  Pay attention to the `pool_repository` method. Not every repository type uses repository pooling. Geo prefers to use repository snapshotting, and snapshotting a repository that uses an object pool can lead to data loss. For that reason, override `pool_repository` so it returns `nil` for repositories that do not have pools.

  At this point the `Project` class should look like this:

  ```ruby
  # frozen_string_literal: true

  class Project < ApplicationRecord
    ...
    include ::Geo::ReplicableModel
    include ::Geo::VerifiableModel

    delegate(*::Geo::VerificationState::VERIFICATION_METHODS, to: :project_state)

    with_replicator Geo::ProjectReplicator

    has_one :project_state, autosave: false, inverse_of: :project, class_name: 'Geo::ProjectState'

    after_save :save_verification_details

    # Override the `all` default if not all records can be replicated. For an
    # example of an existing Model that needs to do this, see
    # `EE::MergeRequestDiff`.
    # scope :available_replicables, -> { all }

    scope :available_verifiables, -> { joins(:project_state) }

    scope :checksummed, -> {
      joins(:project_state).where.not(project_states: { verification_checksum: nil })
    }

    scope :not_checksummed, -> {
      joins(:project_state).where(project_states: { verification_checksum: nil })
    }

    scope :with_verification_state, ->(state) {
      joins(:project_state)
        .where(project_states: { verification_state: verification_state_value(state) })
    }

    def verification_state_object
      project_state
    end
    ...

    class_methods do
      extend ::Gitlab::Utils::Override
      ...

      # @param primary_key_in [Range, Project] arg to pass to primary_key_in scope
      # @return [ActiveRecord::Relation<Project>] everything that should be synced
      #         to this node, restricted by primary key
      def replicables_for_current_secondary(primary_key_in)
        # This issue template does not help you write this method.
        #
        # This method is called only on Geo secondary sites. It is called when
        # we want to know which records to replicate. This is not easy to automate
        # because for example:
        #
        # * The "selective sync" feature allows admins to choose which namespaces
        #   to replicate, per secondary site. Most Models are scoped to a
        #   namespace, but the nature of the relationship to a namespace varies
        #   between Models.
        # * The "selective sync" feature allows admins to choose which shards to
        #   replicate, per secondary site. Repositories are associated with
        #   shards. Most blob types are not, but Project Uploads are.
        # * Remote stored replicables are not replicated, by default. But the
        #   setting `sync_object_storage` enables replication of remote stored
        #   replicables.
        #
        # Search the codebase for examples, and consult a Geo expert if needed.
      end

      override :verification_state_table_class
      def verification_state_table_class
        Geo::ProjectState
      end
    end

    # Geo checks this method in FrameworkRepositorySyncService to avoid
    # snapshotting repositories using object pools
    def pool_repository
      nil
    end

    def project_state
      super || build_project_state
    end

    ...
  end
  ```
- Implement `Project.replicables_for_current_secondary` above.
- Ensure `Project.replicables_for_current_secondary` is well-tested. Search the codebase for `replicables_for_current_secondary` to find examples of parameterized table specs. You may need to add more `FactoryBot` traits.
- Add the following shared examples to `ee/spec/models/ee/project_spec.rb`:

  ```ruby
  include_examples 'a replicable model with a separate table for verification state' do
    let(:verifiable_model_record) { build(:project) } # add extra params if needed to make sure the record is in `Geo::ReplicableModel.verifiables` scope
    let(:unverifiable_model_record) { build(:project) } # add extra params if needed to make sure the record is NOT included in `Geo::ReplicableModel.verifiables` scope
  end
  ```
- Create `ee/app/replicators/geo/project_replicator.rb`. Implement the `#repository` method, which should return a `<Repository>` instance, and implement the class method `.model` to return the `Project` class:

  ```ruby
  # frozen_string_literal: true

  module Geo
    class ProjectReplicator < Gitlab::Geo::Replicator
      include ::Geo::RepositoryReplicatorStrategy
      extend ::Gitlab::Utils::Override

      def self.model
        ::Project
      end

      def self.git_access_class
        ::Gitlab::GitAccessProject
      end

      def self.no_repo_message
        git_access_class.error_message(:no_repo)
      end

      override :verification_feature_flag_enabled?
      def self.verification_feature_flag_enabled?
        # We are adding verification at the same time as replication, so we
        # don't need to toggle verification separately from replication. When
        # the replication feature flag is off, then verification is also off
        # (see `VerifiableReplicator.verification_enabled?`)
        true
      end

      override :housekeeping_enabled?
      def self.housekeeping_enabled?
        # Remove this method if the new Git repository type supports git
        # repository housekeeping and the ::Project#git_garbage_collect_worker_klass
        # is implemented. If the data type requires any action to be performed
        # before running the housekeeping, override the `before_housekeeping` method
        # (see `RepositoryReplicatorStrategy#before_housekeeping`)
        false
      end

      def repository
        model_record.repository
      end
    end
  end
  ```
- Make sure Geo push events are created. This usually requires a change in the `app/workers/post_receive.rb` file. Example:

  ```ruby
  def replicate_project_changes(project)
    if ::Gitlab::Geo.primary?
      project.replicator.handle_after_update if project
    end
  end
  ```

  See `app/workers/post_receive.rb` for more examples.
- Make sure the repository removal is also handled. You may need to add something like the following in the destroy service of the repository:

  ```ruby
  project.replicator.handle_after_destroy if project.repository
  ```
- Make sure a Geo secondary site can request and download Projects on the Geo primary site. You may need to make some changes to `Gitlab::GitAccessProject`. For example, see this change for Group-level Wikis.
- Make sure a Geo secondary site can replicate Projects whose repository does not exist on the Geo primary site. The only way to know about this is to parse the error text. You may need to make some changes to `Geo::ProjectReplicator.no_repo_message` to return the proper error message. For example, see this change for Group-level Wikis.
- Generate the feature flag definition files by running the feature flag commands and following the command prompts:

  ```shell
  bin/feature-flag --ee geo_project_replication --type development --group 'group::geo'
  ```
- Add this replicator class to the `REPLICATOR_CLASSES` list in `ee/lib/gitlab/geo.rb`:

  ```ruby
  REPLICATOR_CLASSES = [
    ::Geo::PackageFileReplicator,
    ::Geo::ProjectReplicator
  ]
  ```
- Create `ee/spec/replicators/geo/project_replicator_spec.rb` and perform the necessary setup to define the `model_record` variable for the shared examples:

  ```ruby
  # frozen_string_literal: true

  require 'spec_helper'

  RSpec.describe Geo::ProjectReplicator, feature_category: :geo_replication do
    let(:model_record) { build(:project) }

    include_examples 'a repository replicator'
    include_examples 'a verifiable replicator'
  end
  ```
- Create `ee/app/models/geo/project_registry.rb`:

  ```ruby
  # frozen_string_literal: true

  module Geo
    class ProjectRegistry < Geo::BaseRegistry
      include ::Geo::ReplicableRegistry
      include ::Geo::VerifiableRegistry

      MODEL_CLASS = ::Project
      MODEL_FOREIGN_KEY = :project_id

      belongs_to :project, class_name: 'Project'
    end
  end
  ```
- Update `REGISTRY_CLASSES` in `ee/app/workers/geo/secondary/registry_consistency_worker.rb`.
- Add a custom factory name if needed in `def model_class_factory_name` in `ee/spec/support/helpers/ee/geo_helpers.rb`.
- Update `it 'creates missing registries for each registry class'` in `ee/spec/workers/geo/secondary/registry_consistency_worker_spec.rb`.
- Add `project_registry` to `ActiveSupport::Inflector.inflections` in `config/initializers_before_autoloader/000_inflections.rb`.
- Create `ee/spec/factories/geo/project_registry.rb`:

  ```ruby
  # frozen_string_literal: true

  FactoryBot.define do
    factory :geo_project_registry, class: 'Geo::ProjectRegistry' do
      project # This association should have data, like a file or repository
      state { Geo::ProjectRegistry.state_value(:pending) }

      trait :synced do
        state { Geo::ProjectRegistry.state_value(:synced) }
        last_synced_at { 5.days.ago }
      end

      trait :failed do
        state { Geo::ProjectRegistry.state_value(:failed) }
        last_synced_at { 1.day.ago }
        retry_count { 2 }
        retry_at { 2.hours.from_now }
        last_sync_failure { 'Random error' }
      end

      trait :started do
        state { Geo::ProjectRegistry.state_value(:started) }
        last_synced_at { 1.day.ago }
        retry_count { 0 }
      end

      trait :verification_succeeded do
        verification_checksum { 'e079a831cab27bcda7d81cd9b48296d0c3dd92ef' }
        verification_state { Geo::ProjectRegistry.verification_state_value(:verification_succeeded) }
        verified_at { 5.days.ago }
      end
    end
  end
  ```
- Create `ee/spec/models/geo/project_registry_spec.rb`:

  ```ruby
  # frozen_string_literal: true

  require 'spec_helper'

  RSpec.describe Geo::ProjectRegistry, :geo, type: :model, feature_category: :geo_replication do
    let_it_be(:registry) { create(:geo_project_registry) }

    specify 'factory is valid' do
      expect(registry).to be_valid
    end

    include_examples 'a Geo framework registry'
    include_examples 'a Geo verifiable registry'
  end
  ```
- Add the following to `ee/spec/factories/projects.rb`:

  ```ruby
  # frozen_string_literal: true

  FactoryBot.modify do
    factory :project do
      trait :verification_succeeded do
        with_file
        verification_checksum { 'abc' }
        verification_state { Project.verification_state_value(:verification_succeeded) }
      end

      trait :verification_failed do
        with_file
        verification_failure { 'Could not calculate the checksum' }
        verification_state { Project.verification_state_value(:verification_failed) }
      end
    end
  end
  ```

  If there is not an existing factory for the object in `spec/factories/projects.rb`, wrap the traits in `FactoryBot.define` instead of `FactoryBot.modify`.
- Make sure the factory also allows setting a `project` attribute. If the model does not have a direct relation to a project, you can use a `transient` attribute. Check out `spec/factories/merge_request_diffs.rb` for an example.
- Following the example of Merge Request Diffs, add a `Geo::ProjectState` model in `ee/app/models/geo/project_state.rb`:

  ```ruby
  # frozen_string_literal: true

  module Geo
    class ProjectState < ApplicationRecord
      include ::Geo::VerificationStateDefinition

      self.primary_key = :project_id

      belongs_to :project, inverse_of: :project_state

      validates :verification_failure, length: { maximum: 255 }
      validates :verification_state, :project, presence: true
    end
  end
  ```
- Add a factory for `project_state` in `ee/spec/factories/geo/project_states.rb`:

  ```ruby
  # frozen_string_literal: true

  FactoryBot.define do
    factory :geo_project_state, class: 'Geo::ProjectState' do
      project

      trait :checksummed do
        verification_checksum { 'abc' }
      end

      trait :checksum_failure do
        verification_failure { 'Could not calculate the checksum' }
      end
    end
  end
  ```
- Add `[:geo_project_state, any]` to `skipped` in `spec/models/factories_spec.rb`.
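The `Project` model changes in this step keep verification columns in a separate `project_states` row and delegate to it. The plain-Ruby sketch below illustrates that shape only. `FakeProject` and `FakeProjectState` are hypothetical stand-ins for the ActiveRecord models, and Ruby's `Forwardable` plays the role of Rails' `delegate`.

```ruby
require "forwardable"

# Hypothetical stand-in for Geo::ProjectState: one row of verification columns.
class FakeProjectState
  attr_accessor :verification_state, :verification_checksum

  def initialize
    @verification_state = 0 # assumed value of verification_pending
    @verification_checksum = nil
  end
end

# Hypothetical stand-in for Project: verification data lives on a separate
# state object that is built lazily, like `super || build_project_state`.
class FakeProject
  extend Forwardable

  # Hypothetical subset of ::Geo::VerificationState::VERIFICATION_METHODS.
  VERIFICATION_METHODS = %i[
    verification_state verification_state=
    verification_checksum verification_checksum=
  ].freeze

  # Plays the role of `delegate(*VERIFICATION_METHODS, to: :project_state)`.
  def_delegators :project_state, *VERIFICATION_METHODS

  def project_state
    @project_state ||= FakeProjectState.new
  end
end

project = FakeProject.new
project.verification_checksum = "abc"
p project.project_state.verification_checksum # => "abc"
```

In the real model, `has_one :project_state` plus the `super || build_project_state` override provides the same lazily built state object. The `after_save :save_verification_details` callback, judging by its name, is what writes it back.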
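`Project.replicables_for_current_secondary` is left for you to write. As a starting point only, here is a plain-Ruby sketch of the filtering it has to express. It uses hypothetical `SecondarySite` and `ProjectRow` structs and assumes selective sync types named `"namespaces"` and `"shards"`. It accepts only a `Range` for `primary_key_in`. It also does not expand subgroups of a selected namespace, which a real query likely needs to handle.

```ruby
# Hypothetical stand-ins for a secondary's selective sync settings and project rows.
SecondarySite = Struct.new(:selective_sync_type, :namespace_ids, :shards, keyword_init: true)
ProjectRow = Struct.new(:id, :namespace_id, :repository_storage, keyword_init: true)

# Sketch of the filtering replicables_for_current_secondary has to express in SQL:
# restrict to the primary key range, then apply selective sync by namespace or shard.
# Simplification: subgroups of a selected namespace are NOT expanded here.
def replicables_for_secondary(projects, secondary, primary_key_in)
  in_range = projects.select { |project| primary_key_in.cover?(project.id) }

  case secondary.selective_sync_type
  when "namespaces"
    in_range.select { |project| secondary.namespace_ids.include?(project.namespace_id) }
  when "shards"
    in_range.select { |project| secondary.shards.include?(project.repository_storage) }
  else
    in_range # selective sync disabled: every project in range is replicated
  end
end

projects = [
  ProjectRow.new(id: 1, namespace_id: 10, repository_storage: "default"),
  ProjectRow.new(id: 2, namespace_id: 20, repository_storage: "storage2"),
  ProjectRow.new(id: 3, namespace_id: 10, repository_storage: "storage2")
]

by_namespace = SecondarySite.new(selective_sync_type: "namespaces", namespace_ids: [10], shards: [])
p replicables_for_secondary(projects, by_namespace, 1..2).map(&:id) # => [1]
```

In the model, each branch has to become an ActiveRecord query rather than an in-memory filter. The parameterized table specs suggested in this step are a good place to cover each branch.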
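A missing repository on the primary can only be detected from the error text, so the check amounts to a string match. The minimal sketch below assumes a hypothetical `FetchError` class. Its message string is illustrative and stands in for the value returned by `Geo::ProjectReplicator.no_repo_message`.

```ruby
# Hypothetical message; in the replicator it comes from
# Gitlab::GitAccessProject.error_message(:no_repo).
NO_REPO_MESSAGE = "A repository for this project does not exist yet."

# Hypothetical error raised when a secondary fails to fetch from the primary.
class FetchError < StandardError; end

# Returns :missing_on_primary when the error text contains the no-repo
# message, and :failed for any other error.
def classify_fetch_error(error, no_repo_message = NO_REPO_MESSAGE)
  error.message.include?(no_repo_message) ? :missing_on_primary : :failed
end

error = FetchError.new("GitLab: A repository for this project does not exist yet.")
p classify_fetch_error(error) # => :missing_on_primary
```

This is why `no_repo_message` must return exactly the text the primary sends. If the two strings drift apart, a missing repository looks like an ordinary sync failure instead of being treated as missing on the primary.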
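Adding the registry class to `REGISTRY_CLASSES` lets the consistency worker create missing `project_registry` rows, as the spec name "creates missing registries for each registry class" suggests. Conceptually, each pass compares model IDs with registry IDs in a primary-key range. The plain-Ruby sketch below uses a hypothetical `registry_diff` helper. The reverse difference (registries with no matching project) is included to show where deletions that the database cannot cascade would surface. Whether the real worker handles those is an assumption to check.

```ruby
require "set"

# Sketch of one consistency pass for a batch of primary keys. The two ID lists
# stand in for queries against two databases (main and Geo tracking), which is
# why this cannot be a single SQL join.
# Returns [registry IDs to create, registry IDs to delete].
def registry_diff(model_ids, registry_ids, range)
  models = model_ids.select { |id| range.cover?(id) }.to_set
  registries = registry_ids.select { |id| range.cover?(id) }.to_set

  [(models - registries).sort, (registries - models).sort]
end

to_create, to_delete = registry_diff([1, 2, 3, 5], [2, 3, 4], 1..5)
p to_create # => [1, 5]
p to_delete # => [4]
```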
Step 2. Implement metrics gathering
Metrics are gathered by `Geo::MetricsUpdateWorker`, persisted in `GeoNodeStatus` for display in the UI, and sent to Prometheus:
- Add the following fields to Geo Node Status example responses in `doc/api/geo_nodes.md`:
  - `projects_count`
  - `projects_checksum_total_count`
  - `projects_checksummed_count`
  - `projects_checksum_failed_count`
  - `projects_synced_count`
  - `projects_failed_count`
  - `projects_registry_count`
  - `projects_verification_total_count`
  - `projects_verified_count`
  - `projects_verification_failed_count`
  - `projects_synced_in_percentage`
  - `projects_verified_in_percentage`
- Add the same fields to the `GET /geo_nodes/status` example response in `ee/spec/fixtures/api/schemas/public_api/v4/geo_node_status.json`.
- Add the following fields to the `Sidekiq metrics` table in `doc/administration/monitoring/prometheus/gitlab_metrics.md`:

  ```markdown
  | `geo_projects` | Gauge | XX.Y | Number of Projects on primary | `url` |
  | `geo_projects_checksum_total` | Gauge | XX.Y | Number of Projects to checksum on primary | `url` |
  | `geo_projects_checksummed` | Gauge | XX.Y | Number of Projects that successfully calculated the checksum on primary | `url` |
  | `geo_projects_checksum_failed` | Gauge | XX.Y | Number of Projects that failed to calculate the checksum on primary | `url` |
  | `geo_projects_synced` | Gauge | XX.Y | Number of syncable Projects synced on secondary | `url` |
  | `geo_projects_failed` | Gauge | XX.Y | Number of syncable Projects failed to sync on secondary | `url` |
  | `geo_projects_registry` | Gauge | XX.Y | Number of Projects in the registry | `url` |
  | `geo_projects_verification_total` | Gauge | XX.Y | Number of Projects to attempt to verify on secondary | `url` |
  | `geo_projects_verified` | Gauge | XX.Y | Number of Projects successfully verified on secondary | `url` |
  | `geo_projects_verification_failed` | Gauge | XX.Y | Number of Projects that failed verification on secondary | `url` |
  ```
Project replication and verification metrics should now be available in the API, the `Admin > Geo > Sites` view, and Prometheus.
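The two percentage fields are derived from the counts. The sketch below makes three assumptions: `projects_synced_in_percentage` divides `projects_synced_count` by `projects_registry_count`, `projects_verified_in_percentage` divides `projects_verified_count` by `projects_verification_total_count`, and an empty or missing total reports 0. Confirm the denominators in `GeoNodeStatus` before updating the example responses. The sample counts are hypothetical.

```ruby
# Assumed formula: count / total * 100, with 0 when the total is missing or zero.
def percentage(total, count)
  return 0.0 if total.nil? || total.zero?

  (count.to_f / total) * 100.0
end

# Hypothetical sample counts from one status update.
status = {
  projects_registry_count: 200,
  projects_synced_count: 150,
  projects_verification_total_count: 150,
  projects_verified_count: 75
}

p percentage(status[:projects_registry_count], status[:projects_synced_count])             # => 75.0
p percentage(status[:projects_verification_total_count], status[:projects_verified_count]) # => 50.0
```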
Step 3. Implement the GraphQL API
The GraphQL API is used by the `Admin > Geo > Replication Details` views, and is directly queryable by administrators.
- Add a new field to `GeoNodeType` in `ee/app/graphql/types/geo/geo_node_type.rb`:

  ```ruby
  field :project_registries, ::Types::Geo::ProjectRegistryType.connection_type,
        null: true,
        resolver: ::Resolvers::Geo::ProjectRegistriesResolver,
        description: 'Find Project registries on this Geo node. '\
                     'Ignored if `geo_project_replication` feature flag is disabled.',
        alpha: { milestone: '15.5' } # Update the milestone
  ```
- Add the new `project_registries` field name to the `expected_fields` array in `ee/spec/graphql/types/geo/geo_node_type_spec.rb`.
- Create `ee/app/graphql/resolvers/geo/project_registries_resolver.rb`:

  ```ruby
  # frozen_string_literal: true

  module Resolvers
    module Geo
      class ProjectRegistriesResolver < BaseResolver
        type ::Types::Geo::GeoNodeType.connection_type, null: true

        include RegistriesResolver
      end
    end
  end
  ```
- Create `ee/spec/graphql/resolvers/geo/project_registries_resolver_spec.rb`:

  ```ruby
  # frozen_string_literal: true

  require 'spec_helper'

  RSpec.describe Resolvers::Geo::ProjectRegistriesResolver, feature_category: :geo_replication do
    it_behaves_like 'a Geo registries resolver', :geo_project_registry
  end
  ```
- Create `ee/app/finders/geo/project_registry_finder.rb`:

  ```ruby
  # frozen_string_literal: true

  module Geo
    class ProjectRegistryFinder
      include FrameworkRegistryFinder
    end
  end
  ```
- Create `ee/spec/finders/geo/project_registry_finder_spec.rb`:

  ```ruby
  # frozen_string_literal: true

  require 'spec_helper'

  RSpec.describe Geo::ProjectRegistryFinder, feature_category: :geo_replication do
    it_behaves_like 'a framework registry finder', :geo_project_registry
  end
  ```
- Create `ee/app/graphql/types/geo/project_registry_type.rb`:

  ```ruby
  # frozen_string_literal: true

  module Types
    module Geo
      # rubocop:disable Graphql/AuthorizeTypes because it is included
      class ProjectRegistryType < BaseObject
        graphql_name 'ProjectRegistry'

        include ::Types::Geo::RegistryType

        description 'Represents the Geo replication and verification state of a project'

        field :project_id, GraphQL::Types::ID, null: false, description: 'ID of the Project.'
      end
      # rubocop:enable Graphql/AuthorizeTypes
    end
  end
  ```
- Create `ee/spec/graphql/types/geo/project_registry_type_spec.rb`:

  ```ruby
  # frozen_string_literal: true

  require 'spec_helper'

  RSpec.describe GitlabSchema.types['ProjectRegistry'], feature_category: :geo_replication do
    it_behaves_like 'a Geo registry type'

    it 'has the expected fields (other than those included in RegistryType)' do
      expected_fields = %i[project_id]

      expect(described_class).to have_graphql_fields(*expected_fields).at_least
    end
  end
  ```
- Add integration tests for providing Project registry data to the frontend via the GraphQL API, by duplicating and modifying the following shared examples in `ee/spec/requests/api/graphql/geo/registries_spec.rb`:

  ```ruby
  it_behaves_like 'gets registries for', {
    field_name: 'projectRegistries',
    registry_class_name: 'ProjectRegistry',
    registry_factory: :geo_project_registry,
    registry_foreign_key_field_name: 'projectId'
  }
  ```
- Update the GraphQL reference documentation:

  ```shell
  bundle exec rake gitlab:graphql:compile_docs
  ```
Individual Project replication and verification data should now be available via the GraphQL API.
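Administrators can query the new field directly. The sketch below only builds the JSON request body in Ruby. It assumes the existing `geoNode(name:)` query field, standard connection arguments such as `first`, and the `id`/`state` fields provided by `RegistryType`. The node name `secondary-1` is hypothetical. The request would go to the `/api/graphql` endpoint of the secondary site, where the tracking database lives.

```ruby
require "json"

# Hypothetical query: assumes the `geoNode(name:)` field and the `id`/`state`
# fields that RegistryType provides; `projectRegistries` is the new field.
PROJECT_REGISTRIES_QUERY = <<~GRAPHQL
  query($name: String!) {
    geoNode(name: $name) {
      projectRegistries(first: 20) {
        nodes {
          id
          projectId
          state
        }
      }
    }
  }
GRAPHQL

# Builds the JSON body to POST to /api/graphql on the secondary site.
def project_registries_request_body(node_name)
  JSON.generate({ query: PROJECT_REGISTRIES_QUERY, variables: { name: node_name } })
end

body = project_registries_request_body("secondary-1")
puts JSON.parse(body).dig("variables", "name") # prints secondary-1
```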
Step 4. Handle batch destroy
If batch destroy logic is implemented for a replicable, then that logic must be "replicated" by Geo secondaries. The easiest way to do this is to use `Geo::BatchEventCreateWorker` to bulk insert a delete event for each replicable.
For example, if `FastDestroyAll` is used, then you may be able to use `begin_fast_destroy` and `finalize_fast_destroy` hooks, like we did for uploads.
Or if a special service is used to batch delete records and their associated data, then you probably need to hook into that service, like we did for job artifacts.
As illustrated by the above two examples, batch destroy logic cannot be handled automatically by Geo secondaries without restricting the way other teams perform batch destroys. It is up to you to produce `Geo::BatchEventCreateWorker` attributes before the records are deleted, and then enqueue `Geo::BatchEventCreateWorker` after the records are deleted.
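The ordering constraint (build the attributes first, enqueue after the delete) can be sketched in plain Ruby. `Record`, `FakeBatchEventQueue` and `batch_destroy` are hypothetical. The event hash only mirrors the `replicable_name`, `event_name` and `model_record_id` fields checked by the spec in this step. The real argument format is whatever `Geo::BatchEventCreateWorker` expects.

```ruby
# Hypothetical stand-ins for a table of records and the batch event worker.
Record = Struct.new(:id, keyword_init: true)

class FakeBatchEventQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def perform_async(events)
    @jobs << events
  end
end

# Build the event attributes BEFORE deleting (the rows are gone afterwards),
# then enqueue one batch job AFTER the delete has happened.
def batch_destroy(table, ids, queue)
  events = table.select { |record| ids.include?(record.id) }.map { |record|
    { replicable_name: "project", event_name: "deleted", payload: { model_record_id: record.id } }
  }

  table.reject! { |record| ids.include?(record.id) }
  queue.perform_async(events) unless events.empty?
end

table = [Record.new(id: 1), Record.new(id: 2), Record.new(id: 3)]
queue = FakeBatchEventQueue.new
batch_destroy(table, [1, 3], queue)

p table.map(&:id)                                                    # => [2]
p queue.jobs.first.map { |event| event[:payload][:model_record_id] } # => [1, 3]
```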
- Ensure that any batch destroy of this replicable is replicated to secondary sites.
- Regardless of implementation details, please verify in specs that when a Project is removed, the new `Geo::Event` records are created:

  ```ruby
  describe '#destroy' do
    subject { create(:project) }

    context 'when running in a Geo primary node' do
      let_it_be(:primary) { create(:geo_node, :primary) }
      let_it_be(:secondary) { create(:geo_node) }

      it 'logs an event to the Geo event log when bulk removal is used', :sidekiq_inline do
        stub_current_geo_node(primary)

        expect { subject.destroy! }
          .to change(Geo::Event.where(replicable_name: :project, event_name: :deleted), :count).by(1)

        payload = Geo::Event.where(replicable_name: :project, event_name: :deleted).last.payload

        expect(payload['model_record_id']).to eq(subject.id)
        # Assert any other fields your replicator adds to the deleted-event payload here.
      end
    end
  end
  ```
Code Review
When requesting review from database reviewers:
- Include a comment mentioning that the change is based on a documented template.
- `replicables_for_current_secondary` and `available_replicables` may differ per Model. If their queries are new, then add query plans to the MR description. An easy place to gather SQL queries is your GDK's `log/test.log` when running tests of these methods.
Release Geo support of Projects
- In the rollout issue you created when creating the feature flag, modify the Roll Out Steps:
  - Cross out any steps related to testing on production GitLab.com, because Geo is not running on production GitLab.com at the moment.
  - Add a step to `Test replication and verification of Projects on a non-GDK-deployment. For example, using GitLab Environment Toolkit`.
  - Add a step to `Ping the Geo PM and EM to coordinate testing`. For example, you might add steps to generate Projects, and then a Geo engineer may take it from there.
- In `ee/config/feature_flags/development/geo_project_replication.yml`, set `default_enabled: true`.
- In `ee/app/graphql/types/geo/geo_node_type.rb`, remove the `alpha` option for the released type, so the field definition reads:

  ```ruby
  field :project_registries, ::Types::Geo::ProjectRegistryType.connection_type,
        null: true,
        resolver: ::Resolvers::Geo::ProjectRegistriesResolver,
        description: 'Find Project registries on this Geo node. '\
                     'Ignored if `geo_project_replication` feature flag is disabled.'
  ```
- Run `bundle exec rake gitlab:graphql:compile_docs` after the step above to regenerate the GraphQL docs.
- Add a row for Projects to the `Data types` table in Geo data types support.
- Add a row for Projects to the `Limitations on replication/verification` table in Geo data types support. If the row already exists, then update it to show that Replication and Verification are released in the current version.