Remove package/ci table cross joins
👺 Context
Packages can be pushed to the GitLab Packages Registry from a CI job. When that happens, the registry will store a reference to the pipeline object for tracking and UI purposes.
Here are the model classes
flowchart LR
pp[::Packages::Package] -- 1:n --> ppf[::Packages::PackageFile]
pp -- 1:n --> bi[::Packages::BuildInfo]
ppf -- 1:n --> pfbi[::Packages::PackageFileBuildInfo]
bi -- n:1 --> pip[::Ci::Pipeline]
pfbi -- n:1 --> pip
-
build_info
objects are really simply: apackage_id
orpackage_file_id
and apipeline_id
. That's it. - Both package models have a
has_many :pipelines, through: :build_infos
that allows us to access the pipelines directly without programmatically access thebuild_infos
first - A package can have many (unbounded)
build_info
objects - A package file can have many too but in practice, it's always single one because a given package file is uploaded only once whereas a package can accept duplicated uploads and we could create a
build_info
on each upload.
Those has_many :pipelines, through: :build_infos
statements can potentially lead to cross join queries with the pipelines table (ci_pipelines
). Those queries need to be removed as part of the database decomposition.
The package effort in that aspect is tracked in #338861 (closed).
🚀 ~performance expectations
During the analysis of these cross joins queries, it has been discovered that many APIs have ~performance issues ranging from loading unbounded amounts of objects to n+1
queries.
The goal here is to remove the cross joins and follow up on adding limits to these queries where applicable in other issues: #289956 (closed), #342192, and #341950 (closed). For example, it is known that the Rest API GET /api/v4/projects/166/packages
loads an unbounded amount of pipelines for each package found (20 packages returned max).
In other words, the scope of this MR will strictly be limited to: removing cross join queries with ci_tables
.
🔀 Cross joins queries
As stated in the analysis, we have two cases that raise cross join queries with a ci_*
table:
- Rest API
GET /api/v4/projects/166/packages/567/package_files
- This will load the
pipelines
of each package file found using a cross join query.
- This will load the
-
GraphQL
package -> pipelines
- This will load the
pipelines
of the given package using a cross join query.
- This will load the
In addition, we noticed that ::Packages::Package
objects are usually loaded through finders that will use .includes(pipelines: :user)
. Currently, that .includes
ends up in a .preload
which in turn will load each object type (build_info
and pipeline
) in its own query. There is a risk here that the .includes
end up using .eager_load
(see https://scoutapm.com/blog/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where) and that .eager_load
will be a cross join query.
🔬 What does this MR do and why?
- Replace
.includes(pipelines: :user)
with.preload(pipelines: :user)
in the package and package file model- This way, we guarantee that we will never end up using
.eager_load
- This way, we guarantee that we will never end up using
- Update the related finders / API to use that
.preload
- Use
disable_joins: true
option in the package and package file model - Gate all the changes behind a feature flag. Rollout issue: #342921 (closed)
- Update all the related specs
- We created several context to check things with the feature flag disabled. Those contexts will get removed along with the feature flag cleanup
🐘 Database
Putting the ~performance aspect to a side, most of the changes will trigger the same queries. That's mainly because we're replacing an .includes
with .preload
. This has no impact as .includes
would end up calling .preload
.
The disable_joins: true
option has two impacts:
-
!70624 (comment 703158343)
- Safe to do, due to the limits we have already in place
-
!70624 (comment 703158348)
- Adds an additional query that can load an array of
pipeline_id
. - We assume that we can limit that array in a follow up because in a existing similar query we load a bigger array with not ids but a full row from the
build_info
table.
- Adds an additional query that can load an array of
📷 Screenshots or screen recordings
N/A
💻 How to set up and validate locally
- Create a Group (whatever visibility) and a project
- Upload a few packages to the project. You can use https://gitlab.com/10io/gl_pru for that.
- In the rails console, create a few
packages_build_infos
records:Packages::BuildInfo.create(package: Packages::Package.first, pipeline: Ci::Pipeline.first) Packages::BuildInfo.create(package: Packages::Package.first, pipeline: Ci::Pipeline.last)
- Check the API calls examples of the analysis and see that they don't trigger a cross join query.
📝 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.