Skip to content

Migrate the remediation data into their own records

Zamir Martins requested to merge migrate_remediations into master

What does this MR do and why?

Migrate the remediation data into their own records from the json attribute

Related issue: https://gitlab.com/gitlab-org/gitlab/-/issues/239183

DB migrate

$ bundle exec rails db:migrate
main: == 20230118144623 ScheduleMigrationForRemediation: migrating ==================
main: == 20230118144623 ScheduleMigrationForRemediation: migrated (0.0484s) =========

ci: == 20230118144623 ScheduleMigrationForRemediation: migrating ==================
ci: -- The migration is skipped since it modifies the schemas: [:gitlab_main].
ci: -- This database can only apply migrations in one of the following schemas: [:gitlab_ci, :gitlab_shared, :gitlab_internal].
ci: == 20230118144623 ScheduleMigrationForRemediation: migrated (0.0036s) =========
$ bundle exec rails db:rollback:main
main: == 20230118144623 ScheduleMigrationForRemediation: reverting ==================
main: == 20230118144623 ScheduleMigrationForRemediation: reverted (0.0317s) =========

Query plan

Loading batch/subbatch:
select * from vulnerability_occurrences where id between 0 and 400;.
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15126/commands/52537.
Time(worst case): 77,255 ms

Fetching existing checksum:
select checksum, id from vulnerability_remediations where project_id = 15080436 and checksum IN ('\x0002fa41d57a269f9706a432d1f91b56637900a3a8aa78806f062ceaa606e4af', '\x00278bb77c6e8455ed6e515f2b6f9d7a0578779a0b7e497862fff7b07bbb88f4', '\x012f281d61aa0edf5f147a768bd15b0dfe0172057b97554b1739ca8447a29c26', '\x018358199d2aa3a29039e74c0b4a2a347d2bc653b278b9ed0edeb41b403c31a2', '\x015861ea25890c67e165018b96249be7269d916365d52730e141de3b70410352', '\x01a4bb50805c7b2878b2f576e7ceb4e63764ea87791c1c43f9ca0699618566e6');.
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15126/commands/52530.
Time(worst case): 4,441 ms

Creating remediation record:
insert into vulnerability_remediations (project_id, checksum, created_at, updated_at, summary, file) values (16411240, '\x0002fa41d57a269f9706a432d1f91b56637900a3a8aa78806f062ceaa606e4af', now(), now(), 'test', 'file' );.
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15126/commands/52531.
Time(worst case): 42,096 ms

Creating finding/remediation relationship record:
insert into vulnerability_findings_remediations (vulnerability_occurrence_id, vulnerability_remediation_id, created_at, updated_at) values (1,48472,now(),now());.
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15126/commands/52532.
Time(worst case): 103,341 ms

With a subbatch size length set to 400 the total time per subbatch would be 59951,2 ms.

vulnerability_occurrences currently has 66264176 records. With subbatch set to 400 and delay set to 2 minutes, it would require the following amount of days for the migration to complete:

66264176 records / 400 => 165660,44 times x 2 minutes => 331320,88 min / 1440 => 230,083944444444444 days

edit (after the last commit): with the changes performed in the latest commit, these are the estimate of completion:

66264176 records / 4000 => 16566,044 times x 2 minutes => 33132,088 min / 1440 => 23,008394444444444 days

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Zamir Martins

Merge request reports