Geo: Replicate Group Uploads

What does this MR do and why?

This MR adds support for tracking and replicating group file uploads in GitLab Geo, the feature that keeps multiple GitLab sites synchronized across different locations.

The changes create a new database table called "group_upload_states" that stores information about whether group uploads have been successfully copied and verified between different GitLab sites. This includes tracking when files were last checked, whether verification passed or failed, and retry counts for failed attempts.

The code also adds the necessary database migrations to create this new table with proper indexes for efficient querying, sets up foreign key relationships to link uploads with their parent groups, and includes sharding support for better performance in large deployments.

Additionally, it updates the GraphQL API to expose information about group upload replication status, adds new monitoring metrics so administrators can track how well group uploads are being synchronized, and updates documentation to reflect these new capabilities.

This enhancement extends GitLab's existing file replication system (which already handled project uploads) to also cover files uploaded at the group level, ensuring better data consistency and backup coverage across geographically distributed GitLab installations.

References

Related to #589910 (closed)

Regarding reviews and merge process for this series

This MR is one of many instances following !224245 (merged), which was produced by the same generator script. @dbalexandre has been improving the generator script with the MR feedback, and I expect he will continue to do so.

These are all behind a feature flag, so I propose that most release-blocking comments can be handled in a follow-up, which also addresses the generator script and any previous instances.

For more context, see !226569 (comment 3152345538).

How to set up and validate locally

Prerequisites

1. Run database migrations

rails db:migrate # on the primary
rails db:migrate:geo # on the secondary

2. Enable the feature flags on the primary

# In Rails console on the primary
Feature.enable(:geo_group_upload_replication)
Feature.enable(:geo_group_upload_force_primary_checksumming)

3. Create test data on the primary

Upload a file to a group (e.g., attach an image to a group-level issue or epic description). Alternatively, use the Rails console:

# In Rails console on the primary
group = Group.first
file = CarrierWaveStringFile.new_file(
  file_content: "Seeded upload file in group #{group.full_path}",
  filename: 'seeded_upload.txt',
  content_type: 'text/plain'
)

UploadService.new(group, file, NamespaceFileUploader).execute

Verify the upload exists in the namespace_uploads partition:

Geo::GroupUpload.count
# Should be > 0

4. Verify checksumming on the primary

Wait for the verification worker to process, or trigger it manually:

# In Rails console on the primary
upload = Geo::GroupUpload.first
# Run verification (checksumming) for this upload
upload.replicator.verify
# Reload the state record so the value reflects what was just persisted
upload.group_upload_state.reload.verification_state
# Should be 2 (verification_succeeded)

5. Verify replication on the secondary

Once the upload is created on the primary, Geo will automatically replicate it to the secondary. Check the sync status in the secondary Rails console:

# In Rails console on the secondary
Geo::GroupUploadRegistry.count
# Should be > 0

registry = Geo::GroupUploadRegistry.last
registry.state
# Should be 2 (synced)

If the registry is empty or not yet synced, you can manually trigger sync:

# In Rails console on the secondary
Geo::GroupUploadReplicator.new(model_record_id: Geo::GroupUpload.first.id).sync
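
Because replication is asynchronous, the registry may take a moment to reach the synced state. The sketch below is a small polling helper that can be pasted into the console to wait for that. `wait_until` is a hypothetical helper written for this description, not something GitLab provides, and the timeout and interval values are arbitrary.

```ruby
# Hypothetical helper (not part of GitLab): call the block repeatedly until it
# returns a truthy value or the timeout elapses. Returns the truthy value, or
# nil if the timeout was reached first.
def wait_until(timeout: 60, interval: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval)
  end
end

# Usage in the secondary Rails console (2 is the synced state, as in step 5):
# wait_until(timeout: 120) { Geo::GroupUploadRegistry.last&.state == 2 }
```

A nil return value means the condition was still false when the timeout expired, in which case triggering the sync manually as shown above is the next thing to try.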

6. Verify verification on the secondary

# In Rails console on the secondary
registry = Geo::GroupUploadRegistry.last
registry.reload
registry.verification_state
# Should be 2 (verification_succeeded)

7. Test GraphQL API on the secondary

Note: You must be logged in as an admin user. Non-admin users will get null for Geo-related queries.

Note: When querying from the secondary's GraphQL explorer, add a custom header REQUEST_PATH with the value `/api/v4/geo/node_proxy/{node_id}/graphql`.

Open the GraphQL explorer on the secondary instance (http://<secondary-url>/-/graphql-explorer) and run:

query {
  geoNode {
    name
    primary
    groupUploadRegistries {
      nodes {
        id
        state
        verificationState
        groupUploadId
        lastSyncedAt
        verifiedAt
      }
    }
  }
}

Expected result: you should see registry entries with state: "SYNCED" and verificationState: "SUCCEEDED".
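
If there are many registries, scanning the response by eye gets tedious. The following is a minimal sketch that takes the JSON body returned for the query above and lists the registry nodes that are not yet in the expected state. The helper name `registries_not_in_state` and the sample payload (including its ids) are made up for illustration. Only the field names come from the query.

```ruby
require 'json'

# Hypothetical helper: returns the registry nodes whose `state` differs from
# the expected one. `response_json` is the raw JSON body of the geoNode query.
# A null geoNode (for example, for a non-admin user) yields an empty list.
def registries_not_in_state(response_json, expected_state: 'SYNCED')
  nodes = JSON.parse(response_json)
              .dig('data', 'geoNode', 'groupUploadRegistries', 'nodes') || []
  nodes.reject { |node| node['state'] == expected_state }
end

# Made-up sample payload, for illustration only.
sample = <<~JSON
  {"data": {"geoNode": {"name": "secondary", "primary": false,
    "groupUploadRegistries": {"nodes": [
      {"id": "example-registry-1", "state": "SYNCED"},
      {"id": "example-registry-2", "state": "PENDING"}
    ]}}}}
JSON

registries_not_in_state(sample).each { |node| puts node['id'] }
```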

8. Verify Geo Sites API

Check the Geo Sites API includes the new group upload statistics:

curl --header "PRIVATE-TOKEN: <your-token>" "http://<primary-url>/api/v4/geo_sites/status"

Look for the new fields in the response:

  • group_uploads_count
  • group_uploads_checksummed_count
  • group_uploads_checksum_failed_count
  • group_uploads_synced_count
  • group_uploads_failed_count
  • group_uploads_registry_count
  • group_uploads_synced_in_percentage
  • group_uploads_verified_in_percentage
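
To avoid checking each field by hand, the response can be saved to a file and checked with a short script. This is a sketch under two assumptions: that the response is either a single status object or a JSON array with one object per site, and that the fields appear as top-level keys. The helper names and the sample values are hypothetical.

```ruby
require 'json'

# Field names copied from the list above.
EXPECTED_GROUP_UPLOAD_FIELDS = %w[
  group_uploads_count
  group_uploads_checksummed_count
  group_uploads_checksum_failed_count
  group_uploads_synced_count
  group_uploads_failed_count
  group_uploads_registry_count
  group_uploads_synced_in_percentage
  group_uploads_verified_in_percentage
].freeze

# Hypothetical helper: expected fields that are absent from one status hash.
def missing_group_upload_fields(site_status)
  EXPECTED_GROUP_UPLOAD_FIELDS.reject { |field| site_status.key?(field) }
end

# Hypothetical helper: accepts the raw response body and returns one list of
# missing fields per site. Assumes an array of objects or a single object.
def missing_fields_by_site(response_json)
  parsed = JSON.parse(response_json)
  sites = parsed.is_a?(Array) ? parsed : [parsed]
  sites.map { |site| missing_group_upload_fields(site) }
end

# Made-up sample with only two of the fields present, for illustration.
sample = '[{"group_uploads_count": 1, "group_uploads_synced_count": 1}]'
missing_fields_by_site(sample).each_with_index do |missing, index|
  puts "site #{index}: missing #{missing.join(', ')}" unless missing.empty?
end
```

An empty list for every site means all of the new fields are present.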

9. Verify Geo admin page

Visit /admin/geo/sites on the secondary and confirm that "Group Uploads" appears as a new data type with replication and verification progress.

Database Queries

  • Selective Sync Disabled:

    • Raw SQL

      SELECT
          "namespace_uploads".*
      FROM
          "namespace_uploads"
      WHERE
          "namespace_uploads"."id" BETWEEN 1 AND 10000;
    • Query Plan: https://explain.depesz.com/s/nH3V

  • Selective Sync by Groups:

    • Raw SQL

      SELECT
          "namespace_uploads".*
      FROM
          "namespace_uploads"
      WHERE
          "namespace_uploads"."id" BETWEEN 1 AND 10000
          AND "namespace_uploads"."namespace_id" IN (
              WITH RECURSIVE "base_and_descendants" AS (
                  (
                      SELECT
                          "geo_node_namespace_links"."namespace_id" AS id
                      FROM
                          "geo_node_namespace_links"
                      WHERE
                          "geo_node_namespace_links"."geo_node_id" = 2)
                  UNION (
                      SELECT
                          "namespaces"."id"
                      FROM
                          "namespaces",
                          "base_and_descendants"
                      WHERE
                          "namespaces"."parent_id" = "base_and_descendants"."id"))
              SELECT
                  "id"
              FROM
                  "base_and_descendants" AS "namespaces");
    • Query Plan: https://explain.depesz.com/s/62pU

  • Selective Sync by Organizations:

    • Raw SQL

      SELECT
          "namespace_uploads".*
      FROM
          "namespace_uploads"
      WHERE
          "namespace_uploads"."id" BETWEEN 1 AND 10000
          AND "namespace_uploads"."namespace_id" IN (
              SELECT
                  "namespaces"."id"
              FROM
                  "namespaces"
              WHERE
                  "namespaces"."organization_id" IN (
                      SELECT
                          "organizations"."id"
                      FROM
                          "organizations"
                          INNER JOIN "geo_node_organization_links" ON "organizations"."id" = "geo_node_organization_links"."organization_id"
                      WHERE
                          "geo_node_organization_links"."geo_node_id" = 2));
    • Query Plan: https://explain.depesz.com/s/nyun
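
For reviewers less familiar with recursive CTEs, the "Selective Sync by Groups" query above can be read as a graph traversal: start from the namespaces linked to the Geo node, then keep adding namespaces whose parent is already in the set. The plain Ruby sketch below illustrates that logic on in-memory data. It is not the code in this MR. The function name mirrors the CTE name, and the sample ids are made up.

```ruby
require 'set'

# Illustrative only: computes the ids of the linked namespaces plus all of
# their descendants, like the "base_and_descendants" CTE.
#   linked_ids   - namespace ids linked to the Geo node (the CTE's base case)
#   parent_by_id - hash mapping each namespace id to its parent_id (or nil)
def base_and_descendants(linked_ids, parent_by_id)
  children_by_parent = Hash.new { |hash, key| hash[key] = [] }
  parent_by_id.each { |id, parent_id| children_by_parent[parent_id] << id }

  seen = Set.new
  queue = linked_ids.dup
  until queue.empty?
    id = queue.shift
    # Set#add? returns nil for ids already visited, which plays the role of
    # UNION's de-duplication in the SQL.
    next unless seen.add?(id)

    queue.concat(children_by_parent.fetch(id, []))
  end
  seen
end

# Made-up hierarchy: 1 > 2 > 3 and 4 > 5. Only namespace 1 is linked.
parents = { 1 => nil, 2 => 1, 3 => 2, 4 => nil, 5 => 4 }
uploads = [{ id: 10, namespace_id: 3 }, { id: 11, namespace_id: 5 }]

selected_namespaces = base_and_descendants([1], parents)
selected_uploads = uploads.select { |u| selected_namespaces.include?(u[:namespace_id]) }
puts selected_uploads.map { |u| u[:id] }.inspect
```

The final `select` corresponds to the outer `"namespace_id" IN (...)` filter on namespace_uploads.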

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Douglas Barbosa Alexandre
