Remove package/ci table cross joins
👺 Context
Packages can be pushed to the GitLab Packages Registry from a CI job. When that happens, the registry will store a reference to the pipeline object for tracking and UI purposes.
Here are the model classes
flowchart LR
pp[::Packages::Package] -- 1:n --> ppf[::Packages::PackageFile]
pp -- 1:n --> bi[::Packages::BuildInfo]
ppf -- 1:n --> pfbi[::Packages::PackageFileBuildInfo]
bi -- n:1 --> pip[::Ci::Pipeline]
pfbi -- n:1 --> pip
-
build_infoobjects are really simply: apackage_idorpackage_file_idand apipeline_id. That's it. - Both package models have a
has_many :pipelines, through: :build_infosthat allows us to access the pipelines directly without programmatically access thebuild_infosfirst - A package can have many (unbounded)
build_infoobjects - A package file can have many too but in practice, it's always single one because a given package file is uploaded only once whereas a package can accept duplicated uploads and we could create a
build_infoon each upload.
Those has_many :pipelines, through: :build_infos statements can potentially lead to cross join queries with the pipelines table (ci_pipelines). Those queries need to be removed as part of the database decomposition.
The package effort in that aspect is tracked in #338861 (closed).
🚀 ~performance expectations
During the analysis of these cross joins queries, it has been discovered that many APIs have ~performance issues ranging from loading unbounded amounts of objects to n+1 queries.
The goal here is to remove the cross joins and follow up on adding limits to these queries where applicable in other issues: #289956 (closed), #342192 (closed), and #341950 (closed). For example, it is known that the Rest API GET /api/v4/projects/166/packages loads an unbounded amount of pipelines for each package found (20 packages returned max).
In other words, the scope of this MR will strictly be limited to: removing cross join queries with ci_tables.
🔀 Cross joins queries
As stated in the analysis, we have two cases that raise cross join queries with a ci_* table:
- Rest API
GET /api/v4/projects/166/packages/567/package_files- This will load the
pipelinesof each package file found using a cross join query.
- This will load the
-
GraphQL
package -> pipelines- This will load the
pipelinesof the given package using a cross join query.
- This will load the
In addition, we noticed that ::Packages::Package objects are usually loaded through finders that will use .includes(pipelines: :user). Currently, that .includes ends up in a .preload which in turn will load each object type (build_info and pipeline) in its own query. There is a risk here that the .includes end up using .eager_load (see https://scoutapm.com/blog/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where) and that .eager_load will be a cross join query.
🔬 What does this MR do and why?
- Replace
.includes(pipelines: :user)with.preload(pipelines: :user)in the package and package file model- This way, we guarantee that we will never end up using
.eager_load
- This way, we guarantee that we will never end up using
- Update the related finders / API to use that
.preload - Use
disable_joins: trueoption in the package and package file model - Gate all the changes behind a feature flag. Rollout issue: #342921 (closed)
- Update all the related specs
- We created several context to check things with the feature flag disabled. Those contexts will get removed along with the feature flag cleanup
🐘 Database
Putting the ~performance aspect to a side, most of the changes will trigger the same queries. That's mainly because we're replacing an .includes with .preload. This has no impact as .includes would end up calling .preload.
The disable_joins: true option has two impacts:
-
!70624 (comment 703158343)
- Safe to do, due to the limits we have already in place
-
!70624 (comment 703158348)
- Adds an additional query that can load an array of
pipeline_id. - We assume that we can limit that array in a follow up because in a existing similar query we load a bigger array with not ids but a full row from the
build_infotable.
- Adds an additional query that can load an array of
📷 Screenshots or screen recordings
N/A
💻 How to set up and validate locally
- Create a Group (whatever visibility) and a project
- Upload a few packages to the project. You can use https://gitlab.com/10io/gl_pru for that.
- In the rails console, create a few
packages_build_infosrecords:Packages::BuildInfo.create(package: Packages::Package.first, pipeline: Ci::Pipeline.first) Packages::BuildInfo.create(package: Packages::Package.first, pipeline: Ci::Pipeline.last) - Check the API calls examples of the analysis and see that they don't trigger a cross join query.
📝 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.