Skip to content

Add migration to fix nil location value in vulnerability_occurrences

Sashi Kumar Kumaresan requested to merge sk/342025-migrate-nil-location into master

What does this MR do and why?

Addresses #342025 (closed)

While verifying #338669 (closed), we found that there are many rows in vulnerability_occurrences table with nil value in location column. This makes filtering the vulnerabilities by image impossible.

!72224 (merged) introduced the fix to populate location value. This MR adds background migration to populate location from raw_metadata['location'].

Migrate Up

$ bundle exec rake db:migrate:up VERSION=20211102103127
== 20211102103127 AddTempIndexToVulnerabilityOccurrences: migrating ===========
-- transaction_open?()
   -> 0.0001s
-- index_exists?(:vulnerability_occurrences, [:id, :location], {:where=>"location IS NULL", :name=>"vulnerability_occurrences_location_temp_index", :algorithm=>:concurrently})
   -> 0.0077s
-- execute("SET statement_timeout TO 0")
   -> 0.0007s
-- add_index(:vulnerability_occurrences, [:id, :location], {:where=>"location IS NULL", :name=>"vulnerability_occurrences_location_temp_index", :algorithm=>:concurrently})
   -> 0.1342s
-- execute("RESET statement_timeout")
   -> 0.0007s
== 20211102103127 AddTempIndexToVulnerabilityOccurrences: migrated (0.1532s) ==

$ bundle exec rake db:migrate:up VERSION=20211021114401  
== 20211021114401 UpdateVulnerabilityOccurrencesLocation: migrating ===========
-- Scheduled 1 UpdateVulnerabilityOccurrencesLocation jobs with a maximum of 1000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-10-29 06:38:55 UTC."
== 20211021114401 UpdateVulnerabilityOccurrencesLocation: migrated (0.1457s) ==

Migrate Down

$ bundle exec rake db:migrate:down VERSION=20211102103127
== 20211102103127 AddTempIndexToVulnerabilityOccurrences: reverting ===========
-- transaction_open?()
   -> 0.0000s
-- indexes(:vulnerability_occurrences)
   -> 0.0061s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- remove_index(:vulnerability_occurrences, {:algorithm=>:concurrently, :name=>"vulnerability_occurrences_location_temp_index"})
   -> 0.0045s
-- execute("RESET statement_timeout")
   -> 0.0008s
== 20211102103127 AddTempIndexToVulnerabilityOccurrences: reverted (0.0214s) ==

$ bundle exec rake db:migrate:down VERSION=20211021114401
== 20211021114401 UpdateVulnerabilityOccurrencesLocation: reverting ===========
== 20211021114401 UpdateVulnerabilityOccurrencesLocation: reverted (0.0000s) ==

Queries

From migration:

SELECT "vulnerability_occurrences"."id" FROM "vulnerability_occurrences" WHERE "vulnerability_occurrences"."location" IS NULL AND "vulnerability_occurrences"."id" >= 22295 ORDER BY "vulnerability_occurrences"."id" ASC LIMIT 1 OFFSET 1000 

SELECT MIN("vulnerability_occurrences"."id"), MAX("vulnerability_occurrences"."id") FROM "vulnerability_occurrences" WHERE "vulnerability_occurrences"."location" IS NULL AND "vulnerability_occurrences"."id" >= 22295 

From Background migration:


WITH cte(cte_id, cte_location) AS MATERIALIZED (VALUES 
  (1, '{"file":"maven/src/main/java//App.java","start_line":0,"end_line":0,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (2, '{"file":"maven/src/main/java//App.java","start_line":1,"end_line":1,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (3, '{"file":"maven/src/main/java//App.java","start_line":2,"end_line":2,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (4, '{"file":"maven/src/main/java//App.java","start_line":3,"end_line":3,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (5, '{"file":"maven/src/main/java//App.java","start_line":4,"end_line":4,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (6, '{"file":"maven/src/main/java//App.java","start_line":5,"end_line":5,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (7, '{"file":"maven/src/main/java//App.java","start_line":6,"end_line":6,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (8, '{"file":"maven/src/main/java//App.java","start_line":7,"end_line":7,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (9, '{"file":"maven/src/main/java//App.java","start_line":8,"end_line":8,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb),
  (10, '{"file":"maven/src/main/java//App.java","start_line":9,"end_line":9,"class":"com.gitlab..App","method":"insecureCypher"}'::jsonb))
UPDATE vulnerability_occurrences
SET
  location = cte_location
FROM
  cte
WHERE
  cte_id = id

Total Duration

  • Batch size: 20,000
  • Delay: 3.minutes
  • No. of sub-batches: 500
  • Total time for each sub-batch: 841.542 ms + 100 ms sleep ~= 950 ms (as mentioned in !72788 (comment 721699679))
  • Total no. of sub-batches per batch: 20000/500 = 40
  • Total time per batch: 40 * 950 = 38000ms (38 seconds)
  • Total no. of rows with location nil: 12500000
  • Total no. of batches with delay of 3 mins: 12500000/20000 = 625
  • Total time for 625 batches: 625 * 3 = 1875 mins

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Sashi Kumar Kumaresan

Merge request reports