Geo: Make registry table SSOT for LFS objects
What does this MR do?
This MR depends on !22031 (merged). We should target master after it is merged.
Registry as SSOT for LFS draw.io source
Database queries
LoopingBatcher#get_finish
SELECT MAX(subset.id) AS finish,
EXISTS(SELECT id FROM lfs_objects WHERE id > MAX(subset.id)) AS last_id
FROM (SELECT id
FROM lfs_objects
WHERE (id >= 7000000)
ORDER BY id
LIMIT 1000) AS subset;
Query plan: https://explain.depesz.com/s/fzky
Time: 19.702 ms
- planning: 0.855 ms
- execution: 18.847 ms
- I/O read: 17.662 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 7 (~56.00 KiB) from the buffer pool
- reads: 19 (~152.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Geo::LfsObjectRegistryFinder#find_never_synced_registries
SELECT "lfs_object_registry"."lfs_object_id"
FROM "lfs_object_registry"
WHERE "lfs_object_registry"."success" = $1
AND "lfs_object_registry"."retry_count" IS NULL
AND 1=1
LIMIT $2;
Query plan: E_NO_PROD_LIKE_ENV_TO_TEST
Geo::LfsObjectRegistryFinder#find_registry_differences
The following queries assume !24891 (merged) is merged. So they do not have this condition which would cause the queries to be much worse: OR "lfs_objects"."file_store" IS NULL
.
No selective sync, and object storage is not synced
SELECT "lfs_objects"."id"
FROM "lfs_objects"
WHERE "lfs_objects"."file_store" = 1
AND "lfs_objects"."id" BETWEEN 1 AND 1000
Query plan: https://explain.depesz.com/s/O61q
Time: 0.239 ms
- planning: 0.192 ms
- execution: 0.047 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 4 (~32.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
With selective sync by shard, and object storage is not synced
WITH "restricted_lfs_objects" AS (
SELECT "lfs_objects_projects"."lfs_object_id"
FROM "lfs_objects_projects"
WHERE "lfs_objects_projects"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."repository_storage" = 'default'))
SELECT id
FROM "restricted_lfs_objects"
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id"
WHERE "lfs_objects"."file_store" = 1
AND "lfs_objects"."id" BETWEEN 1 AND 1000
Query plan: https://explain.depesz.com/s/EWLs
Time: 1.481 ms
- planning: 0.844 ms
- execution: 0.637 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 593 (~4.60 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
With selective sync by namespace, and object storage is not synced
I replaced the subquery for geo_node_namespace_links
with a query for the gitlab-org
namespace, since we don't have any geo_nodes
in production.
WITH "restricted_lfs_objects" AS (
SELECT "lfs_objects_projects"."lfs_object_id"
FROM "lfs_objects_projects"
WHERE "lfs_objects_projects"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT "geo_node_namespace_links"."namespace_id" AS id
FROM "geo_node_namespace_links"
WHERE "geo_node_namespace_links"."geo_node_id" = 4)
UNION
(
SELECT "namespaces"."id"
FROM "namespaces", "base_and_descendants"
WHERE "namespaces"."parent_id" = "base_and_descendants"."id"))
SELECT "id" FROM "base_and_descendants" AS "namespaces")))
SELECT id
FROM "restricted_lfs_objects"
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id"
WHERE "lfs_objects"."file_store" = 1
AND "lfs_objects"."id" BETWEEN 1 AND 1000
Query plan: https://explain.depesz.com/s/5Q1D
Time: 1.565 ms
- planning: 1.367 ms
- execution: 0.198 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 5 (~40.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
No selective sync, and object storage is synced
SELECT "lfs_objects"."id"
FROM "lfs_objects"
AND "lfs_objects"."id" BETWEEN 1 AND 1000
Query plan: https://explain.depesz.com/s/QC9s
Time: 7.231 ms
- planning: 0.126 ms
- execution: 7.105 ms
- I/O read: 6.759 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 101 (~808.00 KiB) from the buffer pool
- reads: 5 (~40.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
With selective sync by shard, and object storage is synced
WITH "restricted_lfs_objects" AS (
SELECT "lfs_objects_projects"."lfs_object_id"
FROM "lfs_objects_projects"
WHERE "lfs_objects_projects"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."repository_storage" = 'default'))
SELECT id
FROM "restricted_lfs_objects"
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id"
WHERE "lfs_objects"."id" BETWEEN 1 AND 1000
Query plan: https://explain.depesz.com/s/vaM5v
Time: 303.645 ms
- planning: 0.805 ms
- execution: 302.840 ms
- I/O read: 298.549 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 420 (~3.30 MiB) from the buffer pool
- reads: 173 (~1.40 MiB) from the OS file cache, including disk I/O
- dirtied: 3 (~24.00 KiB)
- writes: 0
With selective sync by namespace, and object storage is synced
I replaced the subquery for geo_node_namespace_links
with a query for the gitlab-org
namespace, since we don't have any geo_nodes
in production.
WITH "restricted_lfs_objects" AS (
SELECT "lfs_objects_projects"."lfs_object_id"
FROM "lfs_objects_projects"
WHERE "lfs_objects_projects"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT "geo_node_namespace_links"."namespace_id" AS id
FROM "geo_node_namespace_links"
WHERE "geo_node_namespace_links"."geo_node_id" = 4)
UNION
(
SELECT "namespaces"."id"
FROM "namespaces", "base_and_descendants"
WHERE "namespaces"."parent_id" = "base_and_descendants"."id"))
SELECT "id"
FROM "base_and_descendants" AS "namespaces")))
SELECT id
FROM "restricted_lfs_objects"
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id"
WHERE "lfs_objects"."id" BETWEEN 1 AND 1000
Query plan: https://explain.depesz.com/s/OedP
Time: 2.812 ms
- planning: 1.307 ms
- execution: 1.505 ms
- I/O read: 1.218 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 9 (~72.00 KiB) from the buffer pool
- reads: 1 (~8.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
SELECT "lfs_object_registry"."lfs_object_id"
FROM "lfs_object_registry"
WHERE "lfs_object_registry"."lfs_object_id" BETWEEN $1 AND $2
Query plan: E_NO_PROD_LIKE_ENV_TO_TEST
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team