Skip to content

Fix bad data in projects.has_external_issue_tracker

What does this MR do?

This MR fixes historical bad data in projects.has_external_issue_tracker #273574 (closed).

It is very similar to the data migration added in !53790 (merged) to correct bad data in projects.has_external_wiki with differences to account for that a project can have multiple associated "external issue tracker" services, whereas only a single "external wiki" service.

Counts of affect projects on GitLab.com can be found in the related issue #273574 (closed).

Note: A PG Trigger added in !49916 (merged) will ensure future data is correct.

Migration output

Up

== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: migrating
-- execute("WITH project_ids_to_update (id) AS (\n  SELECT DISTINCT \"services\".\"project_id\" FROM \"services\" INNER JOIN \"projects\" ON \"projects\".\"id\" = \"services\".\"project_id\" WHERE \"services\".\"active\" = TRUE AND \"services\".\"category\" = 'issue_tracker' AND \"services\".\"project_id\" IS NOT NULL AND \"services\".\"id\" >= 15 AND \"projects\".\"has_external_issue_tracker\" = FALSE AND \"projects\".\"pending_delete\" = FALSE\n)\nUPDATE projects SET has_external_issue_tracker = true WHERE id IN (SELECT id FROM project_ids_to_update)\n")
   -> 0.0056s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:projects, :id, {:where=>"(\n  \"projects\".\"has_external_issue_tracker\" = TRUE\n)\nAND \"projects\".\"pending_delete\" = FALSE\n", :name=>"tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true", :algorithm=>:concurrently})
   -> 0.0174s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:projects, :id, {:where=>"(\n  \"projects\".\"has_external_issue_tracker\" = TRUE\n)\nAND \"projects\".\"pending_delete\" = FALSE\n", :name=>"tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true", :algorithm=>:concurrently})
   -> 0.0116s
-- execute("RESET ALL")
   -> 0.0007s
-- execute("WITH project_ids_to_update (id) AS (\n  SELECT \"projects\".\"id\" FROM \"projects\" WHERE ((\n  \"projects\".\"has_external_issue_tracker\" = TRUE\n)\nAND \"projects\".\"pending_delete\" = FALSE\n) AND \"projects\".\"id\" >= 25 AND (NOT EXISTS (SELECT 1 FROM \"services\" WHERE (services.project_id = projects.id) AND \"services\".\"category\" = 'issue_tracker' AND \"services\".\"active\" = TRUE))\n)\nUPDATE projects SET has_external_issue_tracker = false WHERE id IN (SELECT id FROM project_ids_to_update)\n")
   -> 0.0014s
-- transaction_open?()
   -> 0.0000s
-- indexes(:projects)
   -> 0.0157s
-- remove_index(:projects, {:algorithm=>:concurrently, :name=>"tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true"})
   -> 0.0043s
== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: migrated (0.1882s)

Down

Note this is a no-op:

== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: reverting
== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: reverted (0.0000s)

SQL queries in CleanupProjectsWithBadHasExternalIssueTrackerData (20210210221006)

Update projects.has_external_issue_tracker to be TRUE.

Note, query is operated on batches of 100 services.

WITH project_ids_to_update (id) AS 
(
   SELECT DISTINCT
      "services"."project_id" 
   FROM
      "services" 
      INNER JOIN
         "projects" 
         ON "projects"."id" = "services"."project_id" 
   WHERE
      "services"."active" = TRUE 
      AND "services"."category" = 'issue_tracker' 
      AND "services"."project_id" IS NOT NULL 
      AND "services"."id" >= 100
      AND "services"."id" < 200 
      AND "projects"."has_external_issue_tracker" = FALSE 
      AND "projects"."pending_delete" = FALSE 
)
UPDATE
   projects 
SET
   has_external_issue_tracker = true 
WHERE
   id IN 
   (
      SELECT
         id 
      FROM
         project_ids_to_update
   )

Explain:

 ModifyTable on public.projects  (cost=7.63..10.66 rows=1 width=819) (actual time=0.050..0.052 rows=0 loops=1)
   Buffers: shared hit=6
   CTE project_ids_to_update
     ->  Unique  (cost=7.04..7.05 rows=1 width=4) (actual time=0.047..0.048 rows=0 loops=1)
           Buffers: shared hit=6
           ->  Sort  (cost=7.04..7.04 rows=1 width=4) (actual time=0.047..0.048 rows=0 loops=1)
                 Sort Key: services.project_id
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=6
                 ->  Nested Loop  (cost=0.99..7.03 rows=1 width=4) (actual time=0.009..0.010 rows=0 loops=1)
                       Buffers: shared hit=3
                       ->  Index Scan using services_pkey on public.services  (cost=0.42..3.45 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=1)
                             Index Cond: ((services.id >= 100) AND (services.id < 200))
                             Filter: (services.active AND (services.project_id IS NOT NULL) AND ((services.category)::text = 'issue_tracker'::text))
                             Rows Removed by Filter: 0
                             Buffers: shared hit=3
                       ->  Index Scan using idx_projects_on_repository_storage_last_repository_updated_at on public.projects projects_1  (cost=0.56..3.58 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                             Index Cond: (projects_1.id = services.project_id)
                             Filter: ((NOT projects_1.has_external_issue_tracker) AND (NOT projects_1.pending_delete))
                             Rows Removed by Filter: 0
   ->  Nested Loop  (cost=0.58..3.61 rows=1 width=819) (actual time=0.049..0.050 rows=0 loops=1)
         Buffers: shared hit=6
         ->  HashAggregate  (cost=0.02..0.03 rows=1 width=32) (actual time=0.049..0.049 rows=0 loops=1)
               Group Key: project_ids_to_update.id
               Buffers: shared hit=6
               ->  CTE Scan on project_ids_to_update  (cost=0.00..0.02 rows=1 width=32) (actual time=0.048..0.048 rows=0 loops=1)
                     Buffers: shared hit=6
         ->  Index Scan using idx_projects_on_repository_storage_last_repository_updated_at on public.projects  (cost=0.56..3.58 rows=1 width=746) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (projects.id = project_ids_to_update.id)

Summary:

Time: 7.966 ms
  - planning: 7.535 ms
  - execution: 0.431 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 6 (~48.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Temporary index (gets removed during migration)

CREATE INDEX tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true 
ON projects USING btree (id) 
WHERE
   ((has_external_issue_tracker = true) AND (pending_delete = false))
;

Execing this on #database-lab resulted in an index creation time of:

Duration: 153.464 s (estimated* for prod: 8.086...153.464 s
% time      seconds wait_event
------ ------------ -----------------------------
94.68    145.299968 IO.DataFileRead
4.12       6.315605 Running
1.15       1.770629 IO.DataFileWrite
0.05       0.077465 IO.SLRURead
------ ------------ -----------------------------
100.00   153.463667

Update projects.has_external_issue_tracker to be FALSE

Note, query is operated on batches of 100 projects and relies on the above index.

WITH project_ids_to_update (id) AS 
(
   SELECT
      "projects"."id" 
   FROM
      "projects" 
   WHERE
      (
( "projects"."has_external_issue_tracker" = TRUE ) 
         AND "projects"."pending_delete" = FALSE 
      )
      AND "projects"."id" >= 100
      AND "projects"."id" < 200 
      AND 
      (
         NOT EXISTS 
         (
            SELECT
               1 
            FROM
               "services" 
            WHERE
               (
                  services.project_id = projects.id
               )
               AND "services"."category" = 'issue_tracker' 
               AND "services"."active" = TRUE
         )
      )
)
UPDATE
   projects 
SET
   has_external_issue_tracker = false 
WHERE
   id IN 
   (
      SELECT
         id 
      FROM
         project_ids_to_update
   )

Execution plan:

 ModifyTable on public.projects  (cost=10.49..13.52 rows=1 width=835) (actual time=0.163..0.165 rows=0 loops=1)
   Buffers: shared read=3
   I/O Timings: read=0.112
   CTE project_ids_to_update
     ->  Nested Loop Anti Join  (cost=0.84..9.91 rows=1 width=4) (actual time=0.159..0.160 rows=0 loops=1)
           Buffers: shared read=3
           I/O Timings: read=0.112
           ->  Index Only Scan using tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true on public.projects projects_1  (cost=0.42..3.44 rows=1 width=4) (actual time=0.158..0.158 rows=0 loops=1)
                 Index Cond: ((projects_1.id >= 100) AND (projects_1.id < 200))
                 Heap Fetches: 0
                 Buffers: shared read=3
                 I/O Timings: read=0.112
           ->  Index Scan using index_services_on_project_id_and_type_unique on public.services  (cost=0.42..3.45 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                 Index Cond: (services.project_id = projects_1.id)
                 Filter: (services.active AND ((services.category)::text = 'issue_tracker'::text))
                 Rows Removed by Filter: 0
   ->  Nested Loop  (cost=0.58..3.61 rows=1 width=835) (actual time=0.162..0.163 rows=0 loops=1)
         Buffers: shared read=3
         I/O Timings: read=0.112
         ->  HashAggregate  (cost=0.02..0.03 rows=1 width=32) (actual time=0.162..0.162 rows=0 loops=1)
               Group Key: project_ids_to_update.id
               Buffers: shared read=3
               I/O Timings: read=0.112
               ->  CTE Scan on project_ids_to_update  (cost=0.00..0.02 rows=1 width=32) (actual time=0.160..0.160 rows=0 loops=1)
                     Buffers: shared read=3
                     I/O Timings: read=0.112
         ->  Index Scan using idx_projects_on_repository_storage_last_repository_updated_at on public.projects  (cost=0.56..3.58 rows=1 width=762) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (projects.id = project_ids_to_update.id)

Summary:

Time: 7.073 ms
  - planning: 6.479 ms
  - execution: 0.594 ms
    - I/O read: 0.112 ms
    - I/O write: N/A

Shared buffers:
  - hits: 0 from the buffer pool
  - reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Related to #273574 (closed)

Edited by Luke Duncalfe

Merge request reports

Loading