Skip to content

Geo: Make registry table SSOT for LFS objects

Michael Kozono requested to merge mk/registry-is-ssot-for-lfs into master

What does this MR do?

This MR depends on !22031 (merged). We should target master after it is merged.

LFS_backfill

Registry as SSOT for LFS draw.io source

#34269 (closed)

Database queries

LoopingBatcher#get_finish

SELECT MAX(subset.id) AS finish,
       EXISTS(SELECT id FROM lfs_objects WHERE id > MAX(subset.id)) AS last_id
  FROM (SELECT id
          FROM lfs_objects
         WHERE (id >= 7000000)
      ORDER BY id
         LIMIT 1000) AS subset;

Query plan: https://explain.depesz.com/s/fzky

Time: 19.702 ms
  - planning: 0.855 ms
  - execution: 18.847 ms
    - I/O read: 17.662 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 7 (~56.00 KiB) from the buffer pool
  - reads: 19 (~152.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Geo::LfsObjectRegistryFinder#find_never_synced_registries

SELECT "lfs_object_registry"."lfs_object_id"
FROM "lfs_object_registry"
WHERE "lfs_object_registry"."success" = $1
  AND "lfs_object_registry"."retry_count" IS NULL
  AND 1=1
LIMIT $2;

Query plan: E_NO_PROD_LIKE_ENV_TO_TEST

Geo::LfsObjectRegistryFinder#find_registry_differences

The following queries assume !24891 (merged) is merged. So they do not have this condition which would cause the queries to be much worse: OR "lfs_objects"."file_store" IS NULL.

No selective sync, and object storage is not synced

SELECT "lfs_objects"."id"
FROM "lfs_objects"
WHERE "lfs_objects"."file_store" = 1
AND "lfs_objects"."id" BETWEEN 1 AND 1000

Query plan: https://explain.depesz.com/s/O61q

Time: 0.239 ms
  - planning: 0.192 ms
  - execution: 0.047 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4 (~32.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

With selective sync by shard, and object storage is not synced

WITH "restricted_lfs_objects" AS (
  SELECT "lfs_objects_projects"."lfs_object_id" 
  FROM "lfs_objects_projects" 
  WHERE "lfs_objects_projects"."project_id" IN (
    SELECT "projects"."id" 
    FROM "projects" 
    WHERE "projects"."repository_storage" = 'default')) 
SELECT id 
FROM "restricted_lfs_objects" 
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id" 
WHERE "lfs_objects"."file_store" = 1 
AND "lfs_objects"."id" BETWEEN 1 AND 1000

Query plan: https://explain.depesz.com/s/EWLs

Time: 1.481 ms
  - planning: 0.844 ms
  - execution: 0.637 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 593 (~4.60 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

With selective sync by namespace, and object storage is not synced

I replaced the subquery for geo_node_namespace_links with a query for the gitlab-org namespace, since we don't have any geo_nodes in production.

WITH "restricted_lfs_objects" AS (
  SELECT "lfs_objects_projects"."lfs_object_id" 
  FROM "lfs_objects_projects" 
  WHERE "lfs_objects_projects"."project_id" IN (
    SELECT "projects"."id" 
    FROM "projects" 
    WHERE "projects"."namespace_id" IN (
      WITH RECURSIVE "base_and_descendants" AS (
        (
          SELECT "geo_node_namespace_links"."namespace_id" AS id 
          FROM "geo_node_namespace_links" 
          WHERE "geo_node_namespace_links"."geo_node_id" = 4)
        UNION
        (
          SELECT "namespaces"."id" 
          FROM "namespaces", "base_and_descendants" 
          WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) 
      SELECT "id" FROM "base_and_descendants" AS "namespaces"))) 
SELECT id 
FROM "restricted_lfs_objects" 
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id" 
WHERE "lfs_objects"."file_store" = 1 
AND "lfs_objects"."id" BETWEEN 1 AND 1000

Query plan: https://explain.depesz.com/s/5Q1D

Time: 1.565 ms
  - planning: 1.367 ms
  - execution: 0.198 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 5 (~40.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

No selective sync, and object storage is synced

SELECT "lfs_objects"."id"
FROM "lfs_objects"
AND "lfs_objects"."id" BETWEEN 1 AND 1000

Query plan: https://explain.depesz.com/s/QC9s

Time: 7.231 ms
  - planning: 0.126 ms
  - execution: 7.105 ms
    - I/O read: 6.759 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 101 (~808.00 KiB) from the buffer pool
  - reads: 5 (~40.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

With selective sync by shard, and object storage is synced

WITH "restricted_lfs_objects" AS (
  SELECT "lfs_objects_projects"."lfs_object_id" 
  FROM "lfs_objects_projects" 
  WHERE "lfs_objects_projects"."project_id" IN (
    SELECT "projects"."id" 
    FROM "projects" 
    WHERE "projects"."repository_storage" = 'default')) 
SELECT id 
FROM "restricted_lfs_objects" 
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id" 
WHERE "lfs_objects"."id" BETWEEN 1 AND 1000

Query plan: https://explain.depesz.com/s/vaM5v

Time: 303.645 ms
  - planning: 0.805 ms
  - execution: 302.840 ms
    - I/O read: 298.549 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 420 (~3.30 MiB) from the buffer pool
  - reads: 173 (~1.40 MiB) from the OS file cache, including disk I/O
  - dirtied: 3 (~24.00 KiB)
  - writes: 0

With selective sync by namespace, and object storage is synced

I replaced the subquery for geo_node_namespace_links with a query for the gitlab-org namespace, since we don't have any geo_nodes in production.

WITH "restricted_lfs_objects" AS (
  SELECT "lfs_objects_projects"."lfs_object_id" 
  FROM "lfs_objects_projects" 
  WHERE "lfs_objects_projects"."project_id" IN (
    SELECT "projects"."id" 
    FROM "projects" 
    WHERE "projects"."namespace_id" IN (
      WITH RECURSIVE "base_and_descendants" AS (
        (
          SELECT "geo_node_namespace_links"."namespace_id" AS id 
          FROM "geo_node_namespace_links" 
          WHERE "geo_node_namespace_links"."geo_node_id" = 4)
        UNION
        (
          SELECT "namespaces"."id" 
          FROM "namespaces", "base_and_descendants" 
          WHERE "namespaces"."parent_id" = "base_and_descendants"."id"))
      SELECT "id" 
      FROM "base_and_descendants" AS "namespaces")))
SELECT id 
FROM "restricted_lfs_objects" 
INNER JOIN "lfs_objects" ON "restricted_lfs_objects"."lfs_object_id" = "lfs_objects"."id" 
WHERE "lfs_objects"."id" BETWEEN 1 AND 1000

Query plan: https://explain.depesz.com/s/OedP

Time: 2.812 ms
  - planning: 1.307 ms
  - execution: 1.505 ms
    - I/O read: 1.218 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 9 (~72.00 KiB) from the buffer pool
  - reads: 1 (~8.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
SELECT "lfs_object_registry"."lfs_object_id"
FROM "lfs_object_registry"
WHERE "lfs_object_registry"."lfs_object_id" BETWEEN $1 AND $2

Query plan: E_NO_PROD_LIKE_ENV_TO_TEST

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Michael Kozono

Merge request reports