Skip to content

Geo - Make registry table SSOT for Uploads

What does this MR do?

Make the file_registry table on the tracking database the SSOT to sync uploads.

Database queries

  • Geo::AttachmentRegistryFinder#find_never_synced_registries:

    SELECT
      "file_registry".*
    FROM
      "file_registry"
    WHERE
      "file_registry"."success" = FALSE
      AND "file_registry"."retry_count" IS NULL 
      AND 1 = 1
    LIMIT 1000;

    Query plan: https://explain.depesz.com/s/vPtH (Staging - There is no production environment)

  • Geo::AttachmentRegistryFinder#find_registry_differences - The following queries assume !24891 (merged) is merged. So they do not have this condition which would cause the queries to be much worse: OR "lfs_objects"."file_store" IS NULL:

    • No selective sync, and object storage is not synced

      SELECT
        "uploads"."id",
        "uploads"."uploader"
      FROM
        "uploads"
      WHERE
        "uploads"."store" = 1
        AND "uploads"."id" BETWEEN 1 AND 1000;
      Time: 0.258 ms
        - planning: 0.197 ms
        - execution: 0.061 ms
          - I/O read: 0.000 ms
          - I/O write: 0.000 ms
      
      Shared buffers:
        - hits: 3 (~24.00 KiB) from the buffer pool
        - reads: 0 from the OS file cache, including disk I/O
        - dirtied: 0
        - writes: 0

      Query plan: https://explain.depesz.com/s/W4J

      SELECT
        "file_registry"."file_id",
        "file_registry"."file_type"
      FROM
        "file_registry"
      WHERE
        "file_registry"."file_id" BETWEEN 1 AND 1000;

      Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)

    • With selective sync by shard, and object storage is not synced

      SELECT
        "uploads"."id",
        "uploads"."uploader"
      FROM
        "uploads"
      WHERE (("uploads"."model_type" = 'Namespace'
          AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_ancestors" AS ((
              SELECT
                "namespaces"."id",
                "namespaces"."parent_id" FROM "namespaces"
              WHERE
                "namespaces"."id" IN(
                  SELECT
                    "projects"."namespace_id" FROM "projects"
                  WHERE
                    "projects"."repository_storage" = 'default'))
              UNION (
                SELECT
                  "namespaces"."id",
                  "namespaces"."parent_id" FROM "namespaces",
                  "base_and_ancestors"
                WHERE
                  "namespaces"."id" = "base_and_ancestors"."parent_id"))
            SELECT
              "id" FROM "base_and_ancestors" AS "namespaces")
          OR "uploads"."model_type" = 'Project'
          AND "uploads"."model_id" IN(
            SELECT
              "projects"."id" FROM "projects"
            WHERE
              "projects"."repository_storage" = 'default'))
        OR "uploads"."model_type" NOT IN('Namespace',
          'Project')
      )
      AND "uploads"."store" = 1
      AND "uploads"."id" BETWEEN 1 AND 1000;
      Time: 1.720 ms
        - planning: 1.430 ms
        - execution: 0.290 ms
          - I/O read: 0.000 ms
          - I/O write: 0.000 ms
      
      Shared buffers:
        - hits: 3 (~24.00 KiB) from the buffer pool
        - reads: 0 from the OS file cache, including disk I/O
        - dirtied: 0
        - writes: 0

      Query plan: https://explain.depesz.com/s/CCGD

      SELECT
        "file_registry"."file_id",
        "file_registry"."file_type"
      FROM
        "file_registry"
      WHERE
        "file_registry"."file_id" BETWEEN 1 AND 1000;

      Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)

    • With selective sync by namespace, and object storage is not synced

      SELECT
        "uploads"."id",
        "uploads"."uploader"
      FROM
        "uploads"
      WHERE (("uploads"."model_type" = 'Namespace'
          AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_descendants" AS ((
              SELECT
                "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links"
              WHERE
                "geo_node_namespace_links"."geo_node_id" = 2)
            UNION (
              SELECT
                "namespaces"."id" FROM "namespaces",
                "base_and_descendants"
              WHERE
                "namespaces"."parent_id" = "base_and_descendants"."id"))
          SELECT
            "id" FROM "base_and_descendants" AS "namespaces")
          OR "uploads"."model_type" = 'Project'
          AND "uploads"."model_id" IN(
            SELECT
              "projects"."id" FROM "projects"
            WHERE
              "projects"."namespace_id" IN(WITH RECURSIVE "base_and_descendants" AS ((
                  SELECT
                    "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links"
                  WHERE
                    "geo_node_namespace_links"."geo_node_id" = 2)
                UNION (
                  SELECT
                    "namespaces"."id" FROM "namespaces",
                    "base_and_descendants"
                  WHERE
                    "namespaces"."parent_id" = "base_and_descendants"."id"))
              SELECT
                "id" FROM "base_and_descendants" AS "namespaces")))
        OR "uploads"."model_type" NOT IN('Namespace',
          'Project')
      )
      AND "uploads"."store" = 1
      AND "uploads"."id" BETWEEN 1 AND 1000;
      Time: 1.722 ms
        - planning: 1.378 ms
        - execution: 0.344 ms
          - I/O read: 0.000 ms
          - I/O write: 0.000 ms
      
      Shared buffers:
        - hits: 3 (~24.00 KiB) from the buffer pool
        - reads: 0 from the OS file cache, including disk I/O
        - dirtied: 0
        - writes: 0

      Query plan: https://explain.depesz.com/s/DlZg (The subquery for geo_node_namespace_links was replaced with a query for the gitlab-org namespace, since we don't have any geo_nodes in production)

      SELECT
        "file_registry"."file_id",
        "file_registry"."file_type"
      FROM
        "file_registry"
      WHERE
        "file_registry"."file_id" BETWEEN 1 AND 1000;

      Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)

    • No selective sync, and object storage is synced

      SELECT
        "uploads"."id",
        "uploads"."uploader"
      FROM
        "uploads"
      WHERE
        "uploads"."id" BETWEEN 1 AND 1000;
      Time: 0.181 ms
        - planning: 0.154 ms
        - execution: 0.027 ms
          - I/O read: 0.000 ms
          - I/O write: 0.000 ms
      
      Shared buffers:
        - hits: 3 (~24.00 KiB) from the buffer pool
        - reads: 0 from the OS file cache, including disk I/O
        - dirtied: 0
        - writes: 0

      Query plan: https://explain.depesz.com/s/7Kmc

    • With selective sync by shard, and object storage is synced

      SELECT
        "uploads"."id",
        "uploads"."uploader"
      FROM
        "uploads"
      WHERE (("uploads"."model_type" = 'Namespace'
          AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_ancestors" AS ((
              SELECT
                "namespaces"."id",
                "namespaces"."parent_id" FROM "namespaces"
              WHERE
                "namespaces"."id" IN(
                  SELECT
                    "projects"."namespace_id" FROM "projects"
                  WHERE
                    "projects"."repository_storage" = 'default'))
              UNION (
                SELECT
                  "namespaces"."id",
                  "namespaces"."parent_id" FROM "namespaces",
                  "base_and_ancestors"
                WHERE
                  "namespaces"."id" = "base_and_ancestors"."parent_id"))
            SELECT
              "id" FROM "base_and_ancestors" AS "namespaces")
          OR "uploads"."model_type" = 'Project'
          AND "uploads"."model_id" IN(
            SELECT
              "projects"."id" FROM "projects"
            WHERE
              "projects"."repository_storage" = 'default'))
        OR "uploads"."model_type" NOT IN('Namespace',
          'Project')
      )
      AND "uploads"."id" BETWEEN 1 AND 1000;
      Time: 1.639 ms
        - planning: 1.326 ms
        - execution: 0.313 ms
          - I/O read: 0.000 ms
          - I/O write: 0.000 ms
      
      Shared buffers:
        - hits: 3 (~24.00 KiB) from the buffer pool
        - reads: 0 from the OS file cache, including disk I/O
        - dirtied: 0
        - writes: 0

      Query plan: https://explain.depesz.com/s/iXmD

      SELECT
        "file_registry"."file_id",
        "file_registry"."file_type"
      FROM
        "file_registry"
      WHERE
        "file_registry"."file_id" BETWEEN 1 AND 1000;

      Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)

    • With selective sync by namespace, and object storage is synced

      SELECT
        "uploads"."id",
        "uploads"."uploader"
      FROM
        "uploads"
      WHERE (("uploads"."model_type" = 'Namespace'
          AND "uploads"."model_id" IN(WITH RECURSIVE "base_and_descendants" AS ((
              SELECT
                "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links"
              WHERE
                "geo_node_namespace_links"."geo_node_id" = 2)
            UNION (
              SELECT
                "namespaces"."id" FROM "namespaces",
                "base_and_descendants"
              WHERE
                "namespaces"."parent_id" = "base_and_descendants"."id"))
          SELECT
            "id" FROM "base_and_descendants" AS "namespaces")
          OR "uploads"."model_type" = 'Project'
          AND "uploads"."model_id" IN(
            SELECT
              "projects"."id" FROM "projects"
            WHERE
              "projects"."namespace_id" IN(WITH RECURSIVE "base_and_descendants" AS ((
                  SELECT
                    "geo_node_namespace_links"."namespace_id" AS id FROM "geo_node_namespace_links"
                  WHERE
                    "geo_node_namespace_links"."geo_node_id" = 2)
                UNION (
                  SELECT
                    "namespaces"."id" FROM "namespaces",
                    "base_and_descendants"
                  WHERE
                    "namespaces"."parent_id" = "base_and_descendants"."id"))
              SELECT
                "id" FROM "base_and_descendants" AS "namespaces")))
        OR "uploads"."model_type" NOT IN('Namespace',
          'Project')
      )
      AND "uploads"."id" BETWEEN 1 AND 1000;
      Time: 1.426 ms
        - planning: 1.223 ms
        - execution: 0.203 ms
          - I/O read: 0.000 ms
          - I/O write: 0.000 ms
      
      Shared buffers:
        - hits: 3 (~24.00 KiB) from the buffer pool
        - reads: 0 from the OS file cache, including disk I/O
        - dirtied: 0
        - writes: 0

      Query plan: https://explain.depesz.com/s/wrPH (The subquery for geo_node_namespace_links was replaced with a query for the gitlab-org namespace, since we don't have any geo_nodes in production)

      SELECT
        "file_registry"."file_id",
        "file_registry"."file_type"
      FROM
        "file_registry"
      WHERE
        "file_registry"."file_id" BETWEEN 1 AND 1000;

      Query plan: https://explain.depesz.com/s/vSKy (Staging - There is no production environment)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related issues

#206946 (closed)

Edited by Mayra Cabrera

Merge request reports