Add a cache column for the number of changed files in an MR diff

Nick Thomas requested to merge 227570-track-merge-request-diff-files-count into master

What does this MR do?

Preparatory work for modifying the external storage migration to work at large scale.

When performing the external storage migration, we need to know whether a merge request diff has any merge request diff files. Right now, we do this with a subselect, roughly: SELECT id FROM merge_request_diffs WHERE EXISTS (SELECT 1 FROM merge_request_diff_files WHERE merge_request_diff_files.merge_request_diff_id = merge_request_diffs.id). This turns out to be expensive; it would be much better to have an indexable column to work against.
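To make the difference between the two query shapes concrete, here is a minimal sketch. It uses an in-memory SQLite database from Python rather than PostgreSQL, and the two-table schema and sample rows are simplified assumptions for illustration, not GitLab's real schema. It only shows that a populated files_count column can answer the same question as the correlated subselect; it says nothing about actual query cost.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns needed for the comparison.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE merge_request_diffs (id INTEGER PRIMARY KEY, files_count INTEGER);
CREATE TABLE merge_request_diff_files (
    merge_request_diff_id INTEGER NOT NULL,
    relative_order INTEGER NOT NULL
);
-- Diff 1 has two files, diff 2 has none, diff 3 has one.
INSERT INTO merge_request_diffs (id, files_count) VALUES (1, 2), (2, 0), (3, 1);
INSERT INTO merge_request_diff_files VALUES (1, 0), (1, 1), (3, 0);
""")

# Shape 1: correlated subselect against the files table.
via_subselect = [row[0] for row in conn.execute("""
    SELECT id FROM merge_request_diffs
    WHERE EXISTS (
        SELECT 1 FROM merge_request_diff_files
        WHERE merge_request_diff_files.merge_request_diff_id = merge_request_diffs.id
    )
    ORDER BY id
""")]

# Shape 2: plain predicate on the cached column, which can be indexed.
via_column = [row[0] for row in conn.execute(
    "SELECT id FROM merge_request_diffs WHERE files_count > 0 ORDER BY id"
)]
```

The second query touches only merge_request_diffs, which is what allows an index on the new column to serve it.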

This MR adds such a column along with a backfill migration to fill it in for the existing 72M merge request diff rows. A future MR will modify the partial index and change the scheduling queries to make use of it.

Update query

UPDATE "merge_request_diffs" SET files_count = (
  SELECT count(*)
  FROM merge_request_diff_files
  WHERE merge_request_diff_files.merge_request_diff_id = merge_request_diffs.id
)
WHERE "merge_request_diffs"."id" BETWEEN 1 AND 21262 AND "merge_request_diffs"."id" >= 18768 AND "merge_request_diffs"."id" < 20010
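The query above updates one id range at a time. A rough sketch of how such a batched backfill could be driven is shown below, again using in-memory SQLite from Python instead of PostgreSQL. The function name backfill_files_count, the batch_size parameter, the simplified schema, and the single-level half-open id ranges are all assumptions made for illustration; the MR's actual migration nests a sub-batch inside a larger batch and is not reproduced here.

```python
import sqlite3

def backfill_files_count(conn, batch_size):
    """Fill files_count for all rows, one half-open id range [start, end) at a time.

    Hypothetical helper: returns the number of batches executed.
    """
    max_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM merge_request_diffs"
    ).fetchone()[0]
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        conn.execute(
            """
            UPDATE merge_request_diffs SET files_count = (
                SELECT count(*)
                FROM merge_request_diff_files
                WHERE merge_request_diff_files.merge_request_diff_id = merge_request_diffs.id
            )
            WHERE merge_request_diffs.id >= ? AND merge_request_diffs.id < ?
            """,
            (start, start + batch_size),
        )
        conn.commit()  # commit per batch so each UPDATE stays a short transaction
        batches += 1
    return batches

# Hypothetical sample data: five diffs, with files attached to diffs 1, 3 and 5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE merge_request_diffs (id INTEGER PRIMARY KEY, files_count INTEGER);
CREATE TABLE merge_request_diff_files (
    merge_request_diff_id INTEGER NOT NULL,
    relative_order INTEGER NOT NULL
);
INSERT INTO merge_request_diffs (id) VALUES (1), (2), (3), (4), (5);
INSERT INTO merge_request_diff_files VALUES (1, 0), (1, 1), (1, 2), (3, 0), (5, 0), (5, 1);
""")

batches = backfill_files_count(conn, batch_size=2)
rows = conn.execute(
    "SELECT id, files_count FROM merge_request_diffs ORDER BY id"
).fetchall()
```

Keeping each UPDATE to a bounded id range limits how many rows a single statement locks and rewrites, which matters when the table holds tens of millions of rows.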

Note: uncached execution (DB lab)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #227570 (closed)

Edited by Nick Thomas
