Fix Security features tables cross-joining `ci_builds` to find project
From !62092 (closed) we have seen examples where security feature related tables (eg. security_scans
) do not have a project_id
(due to full normalization) and thus they need to join through ci_builds
in order to find the project (eg. Security::Scan.has_dismissal_feedback
, Analytics::DevopsAdoption::SnapshotCalculator#security_scan_succeeded
, Security::Finding.undismissed
).
But this will not be possible once we move ci_builds
to a separate database.
Options
- De-normalizing
project_id
. In this case it would be simply adding aproject_id
column to the relevant tables (eg.security_scans
) and back-filling it. This is a reasonable de-normalization becauseproject_id
usually doesn't change so it isn't problematic to keep up to date. This is a very common pattern in GitLab. In the past it has been for performance reasons but now it will become necessary to avoid joins that would have had to cross databases but it will also improve performance. - If it's efficient you can sometimes remove joins and just write a separate queries to the different databases to find the project or join in memory with 2 datasets
- Remove the feature that needs to do these joins
Sharding team recommended approach
Add a project_id
column to security_scans
. This table is on the order of millions of rows. This is big but not huge and as such it should be fairly safe to add this extra column and backfill all rows then make it non-nullable. This will improve performance of all these queries and will solve the core problem and will require very little code changes.
A recent example which introduced a similar de-normalization and background migration can be found at !37713 (merged) and can be used as a template for how to do the same thing here.
Implementation plan:
-
Migration for adding new column -
Migration for adding index and foreign key -
Background migration for backfilling data from ci_builds
tosecurity_scans
-
Make sure we are inserting data for new column on save