Skip to content

Geo: Shard file_registry by type

Right now our Geo tracking database has a file_registry table that has 7.6 million rows. Each row has a type column for the different types of attachments we want to transport.

This query is timing out on the secondary for me even with FDW enabled:

Geo::AttachmentRegistryFinder
  .new(current_node: Gitlab::Geo.current_node)
  .find_unsynced_attachments(batch_size: 1000)
  .count
SELECT COUNT(count_column)
FROM
  (SELECT 1 AS count_column
   FROM "gitlab_secondary"."uploads"
   LEFT OUTER JOIN file_registry ON file_registry.file_id = gitlab_secondary.uploads.id
   AND file_registry.file_type IN ('attachment',
                                   'avatar',
                                   'file',
                                   'namespace_file',
                                   'personal_file')
   WHERE ("gitlab_secondary"."uploads"."store" = 1
          OR "gitlab_secondary"."uploads"."store" IS NULL)
     AND "file_registry"."id" IS NULL
     AND (1=1)
   LIMIT 1000) subquery_for_count

Sentry issue: https://sentry.gitlap.com/gitlab/gitlabcom/issues/139911/

Edited by Toon Claes