Geo: Shard file_registry by type
Right now our Geo tracking database has a file_registry
table that has 7.6 million rows. Each row has a type
column for the different types of attachments we want to transport.
This query is timing out on the secondary for me even with FDW enabled:
Geo::AttachmentRegistryFinder
.new(current_node: Gitlab::Geo.current_node)
.find_unsynced_attachments(batch_size: 1000)
.count
SELECT COUNT(count_column)
FROM
(SELECT 1 AS count_column
FROM "gitlab_secondary"."uploads"
LEFT OUTER JOIN file_registry ON file_registry.file_id = gitlab_secondary.uploads.id
AND file_registry.file_type IN ('attachment',
'avatar',
'file',
'namespace_file',
'personal_file')
WHERE ("gitlab_secondary"."uploads"."store" = 1
OR "gitlab_secondary"."uploads"."store" IS NULL)
AND "file_registry"."id" IS NULL
AND (1=1)
LIMIT 1000) subquery_for_count
Sentry issue: https://sentry.gitlap.com/gitlab/gitlabcom/issues/139911/
Edited by Toon Claes