Skip to content

Remove package/ci table cross joins

Steve Abrams requested to merge 338861-pipelines-has-many-remove into master

👺 Context

Packages can be pushed to the GitLab Packages Registry from a CI job. When that happens, the registry will store a reference to the pipeline object for tracking and UI purposes.

Here are the model classes

flowchart LR
    pp[::Packages::Package] -- 1:n --> ppf[::Packages::PackageFile]
    pp -- 1:n --> bi[::Packages::BuildInfo]
    ppf -- 1:n --> pfbi[::Packages::PackageFileBuildInfo]
    bi -- n:1 --> pip[::Ci::Pipeline]
    pfbi -- n:1 --> pip
  • build_info objects are really simply: a package_id or package_file_id and a pipeline_id. That's it.
  • Both package models have a has_many :pipelines, through: :build_infos that allows us to access the pipelines directly without programmatically access the build_infos first
  • A package can have many (unbounded) build_info objects
  • A package file can have many too but in practice, it's always single one because a given package file is uploaded only once whereas a package can accept duplicated uploads and we could create a build_info on each upload.

Those has_many :pipelines, through: :build_infos statements can potentially lead to cross join queries with the pipelines table (ci_pipelines). Those queries need to be removed as part of the database decomposition.

The package effort in that aspect is tracked in #338861 (closed).

🚀 ~performance expectations

During the analysis of these cross joins queries, it has been discovered that many APIs have ~performance issues ranging from loading unbounded amounts of objects to n+1 queries.

The goal here is to remove the cross joins and follow up on adding limits to these queries where applicable in other issues: #289956 (closed), #342192, and #341950 (closed). For example, it is known that the Rest API GET /api/v4/projects/166/packages loads an unbounded amount of pipelines for each package found (20 packages returned max).

In other words, the scope of this MR will strictly be limited to: removing cross join queries with ci_tables.

🔀 Cross joins queries

As stated in the analysis, we have two cases that raise cross join queries with a ci_* table:

  1. Rest API GET /api/v4/projects/166/packages/567/package_files
    • This will load the pipelines of each package file found using a cross join query.
  2. GraphQL package -> pipelines
    • This will load the pipelines of the given package using a cross join query.

In addition, we noticed that ::Packages::Package objects are usually loaded through finders that will use .includes(pipelines: :user). Currently, that .includes ends up in a .preload which in turn will load each object type (build_info and pipeline) in its own query. There is a risk here that the .includes end up using .eager_load (see https://scoutapm.com/blog/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where) and that .eager_load will be a cross join query.

🔬 What does this MR do and why?

  • Replace .includes(pipelines: :user) with .preload(pipelines: :user) in the package and package file model
    • This way, we guarantee that we will never end up using .eager_load
  • Update the related finders / API to use that .preload
  • Use disable_joins: true option in the package and package file model
  • Gate all the changes behind a feature flag. Rollout issue: #342921 (closed)
  • Update all the related specs
    • We created several context to check things with the feature flag disabled. Those contexts will get removed along with the feature flag cleanup

🐘 Database

Putting the ~performance aspect to a side, most of the changes will trigger the same queries. That's mainly because we're replacing an .includes with .preload. This has no impact as .includes would end up calling .preload.

The disable_joins: true option has two impacts:

  • !70624 (comment 703158343)
    • Safe to do, due to the limits we have already in place
  • !70624 (comment 703158348)
    • Adds an additional query that can load an array of pipeline_id.
    • We assume that we can limit that array in a follow up because in a existing similar query we load a bigger array with not ids but a full row from the build_info table.

📷 Screenshots or screen recordings

N/A

💻 How to set up and validate locally

  1. Create a Group (whatever visibility) and a project
  2. Upload a few packages to the project. You can use https://gitlab.com/10io/gl_pru for that.
  3. In the rails console, create a few packages_build_infos records:
    Packages::BuildInfo.create(package: Packages::Package.first, pipeline: Ci::Pipeline.first)
    Packages::BuildInfo.create(package: Packages::Package.first, pipeline: Ci::Pipeline.last)
  4. Check the API calls examples of the analysis and see that they don't trigger a cross join query.

📝 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez

Merge request reports