Skip to content

Improve pipelines field in package related GraphQL types

🐞 Context

Packages can be uploaded to the Package Registry through CI pipelines. When that happen, we will save the CI Pipeline id so that the backend knows that a given package has been published through a CI job.

To support the above, we use the following models:

flowchart LR
  pkg([Packages::Package])-- 1:n -->bi([Packages::BuildInfo])
  bi-- 1:1 -->pip([Ci::Pipeline])

Because for some formats, a package can be published multiple times (duplicates), we need to support that a package can be linked with many pipelines.

During the ci_* database decomposition for packages, we noticed that we had a few problems with this intermediary model in GraphQL.

🔥 The Problem

In GraphQL, we are using the has_many :pipelines, through: :build_infos shortcut that we have in the package model and this bring problems such as the amount of build_info objects pulled to get the pipelines.

This problem is deduplicated on several access levels. In other words, we can GraphQL query a package (and the related pipelines) at different levels:

  1. for the package itself
  2. for a project which can contain many packages
  3. for a group which can contain many packages

See the related SQL queries in #338861 (comment 701147512).

We can notice several problems:

  • In some cases, we load all build_info rows and then proceed to load all the pipeline rows.
  • We have a n+1 situation at the group level.

This MR will try to solve all of these at once.

🚒 The Solution

The solution we went for is to stop using a has_many_through association in GraphQL and simply use the build_info object directly by exposing it in GraphQL.

Because we could be loading:

  • many build_info objects
  • many pipeline objects

We want to trigger as fewer SQL queries as possible (ideally, 2, one for each object type).

This points us to use a batch loader.

Because we're trying to load a has_many association for potentially several packages, loading many build_info objects for each package led us to use a LATERAL style query. In other words, we're going to load several "sets" of build_info rows in a single query using LATERAL.

Lastly, we want to handle pagination parameters properly. For example, if the query asks for the single most recent pipeline for each package, why would you load all build_info rows for each packages? We can load the most recent one for each package.

The above is what issue #347220 (closed) is.

Lastly a word on a hard limit we're going to have. Since we could load sets of build_info for many packages, we need to hard limit the size of each build_info set and since a build_info row is mapped to a single pipeline that hard limit will get applied to the final pipelines set returned.

🔬 What does this MR do and why?

  • Update the PackagePipelinesResolver to have several ways to resolve the pipelines field:
    • object_field. This will simply call package.pipelines.
    • old_finder. This will use the existing Packages::BuildInfosFinder.
    • new_finder. This will use the newly introduced Packages::BuildInfosForManyPackagesFinder.
      • This finder has been introduced in !85073 (merged)
      • This finder will return a structure that will then use the ActiveRecordArrayConnection introduced in !84387 (merged).
  • Which resolve way is executed depends on the access level and the feature flag state:
    Access level feature flag disabled (master like) feature flag enabled
    group object_field new_finder
    project object_field new_finder
    package old_finder new_finder
  • Remove the pipelines override from the PackageDetailsType.
  • Link the PackagePipelinesResolver to the PackageType.
  • Updates the related documentation. Since this MR fixes the issues with the deprecated pipelines field. As such, that field is not marked as deprecated any more.

🖼 Screenshots or screen recordings

🔥 Using master

Package level

  • GraphQL query
    {
      package(id: "gid://gitlab/Packages::Package/40") {
        id
        pipelines {
          nodes {
            id
          }
        }
      }
    }
  • SQL triggered
    SELECT "packages_build_infos"."pipeline_id" FROM "packages_build_infos" WHERE "packages_build_infos"."package_id" = 40 AND "packages_build_infos"."pipeline_id" IS NOT NULL ORDER BY "packages_build_infos"."pipeline_id" DESC LIMIT 100
    
    SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" IN (312, 311, 310, 309, 308) ORDER BY "ci_pipelines"."id" DESC LIMIT 100
  • We see some limits being applied. This was part of !75672 (merged)

Project level

  • GraphQL query
    {
      project(fullPath: "many/packages-1") {
        packages {
          nodes {
            pipelines {
              nodes {
                id
              }
            }
          }
        }
      }
    }
  • SQL triggered
    SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."package_id" IN (40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26) 
    
    SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" IN (238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312)
  • We load all build_info rows.

Group level

  • GraphQL query
    {
      group(fullPath: "many") {
        packages {
          nodes {
            id
            pipelines {
              nodes {
                id
              }
            }
          }
        }
      }
    }
  • SQL triggered

    Many occurences of (n+1 situation):

    SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."package_id" IN (40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26)
    
    SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" IN (238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312)
  • We load all build_info rows.

  • We do this for each package. (n+1 situation)

🚒 Using this MR + feature flag enabled

Package level

  • GraphQL query
    {
      package(id: "gid://gitlab/Packages::Package/40") {
        id
        pipelines {
          nodes {
            id
          }
        }
      }
    }
  • SQL triggered
    SELECT build_infos.* FROM "packages_packages", LATERAL (SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."pipeline_id" IS NOT NULL AND "packages_build_infos"."package_id" = packages_packages.id ORDER BY "packages_build_infos"."id" DESC LIMIT 20) AS build_infos WHERE "packages_packages"."id" = 40 ORDER BY build_infos.id DESC
    
    SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" IN (312, 311, 310, 309, 308)
  • 2 SQL queries to load the build_infos and pipelines

  • Limits applied

  • (not shown) Pagination options (such as pipelines(first: 1)) properly applied.

    • GraphQL example
      {
        package(id: "gid://gitlab/Packages::Package/40") {
          id
          pipelines(first: 1) {
            nodes {
              id
            }
          }
        }
      }

Project level

  • GraphQL query
    {
      project(fullPath: "many/packages-1") {
        packages {
          nodes {
            pipelines {
              nodes {
                id
              }
            }
          }
        }
      }
    }
  • SQL triggered
    SELECT build_infos.* FROM "packages_packages", LATERAL (SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."pipeline_id" IS NOT NULL AND "packages_build_infos"."package_id" = packages_packages.id ORDER BY "packages_build_infos"."id" DESC LIMIT 20) AS build_infos WHERE "packages_packages"."id" IN (30, 29, 28, 27, 26) ORDER BY build_infos.id DESC
    
    SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" IN (262, 261, 260, 259, 258, 257, 256, 255, 254, 253, 252, 251, 250, 249, 248, 247, 246, 245, 244, 243, 242, 241, 240, 239, 238)
  • 2 SQL queries to load the build_infos and pipelines

  • Limits applied

  • (not shown) Pagination options (such as pipelines(first: 1)) properly applied.

    • GraphQL example
      {
        project(fullPath: "many/packages-1") {
          packages {
            nodes {
              pipelines(first: 1) {
                nodes {
                  id
                }
              }
            }
          }
        }
      }

Group level

  • GraphQL query
    {
      group(fullPath: "many") {
        packages {
          nodes {
            id
            pipelines {
              nodes {
                id
              }
            }
          }
        }
      }
    }
  • SQL triggered
    SELECT build_infos.* FROM "packages_packages", LATERAL (SELECT "packages_build_infos".* FROM "packages_build_infos" WHERE "packages_build_infos"."pipeline_id" IS NOT NULL AND "packages_build_infos"."package_id" = packages_packages.id ORDER BY "packages_build_infos"."id" DESC LIMIT 20) AS build_infos WHERE "packages_packages"."id" IN (40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26) ORDER BY build_infos.id DESC
    
    SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" IN (312, 311, 310, 309, 308, 307, 306, 305, 304, 303, 302, 301, 300, 299, 298, 297, 296, 295, 294, 293, 292, 291, 290, 289, 288, 287, 286, 285, 284, 283, 282, 281, 280, 279, 278, 277, 276, 275, 274, 273, 272, 271, 270, 269, 268, 267, 266, 265, 264, 263, 262, 261, 260, 259, 258, 257, 256, 255, 254, 253, 252, 251, 250, 249, 248, 247, 246, 245, 244, 243, 242, 241, 240, 239, 238)
  • 2 SQL queries to load the build_infos and pipelines

  • Limits applied

  • (not shown) Pagination options (such as pipelines(first: 1)) properly applied.

Conclusions

  • The nice thing with this MR is that it handles the 3 difference access levels in the same way. One LATERAL query for build_info rows and one SELECT query for pipelines.
  • We support the pagination parameters that the pipelines field can have for the 3 levels.

How to set up and validate locally

  1. Create a Group
  2. Create several Projects
  3. For each project, open a rails console and execute:
    def fixture_file_upload(*args, **kwargs)
      Rack::Test::UploadedFile.new(*args, **kwargs)
    end
    
    pkg = FactoryBot.create(:npm_package, project: Project.find(<project_id>))
    
    # creates 5 pipelines
    5.times { Packages::BuildInfo.create!(package_id: pkg.id, pipeline_id: FactoryBot.create(:ci_pipeline).id) }
  4. Open <host>/-/graphql-explorer and play around with queries shown above.

🚦 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez

Merge request reports