Fix bad data in projects.has_external_issue_tracker
What does this MR do?
This MR fixes historical bad data in projects.has_external_issue_tracker
#273574 (closed).
It is very similar to the data migration added in !53790 (merged) to correct bad data in projects.has_external_wiki
with differences to account for that a project can have multiple associated "external issue tracker" services, whereas only a single "external wiki" service.
Counts of affect projects on GitLab.com can be found in the related issue #273574 (closed).
Note: A PG Trigger added in !49916 (merged) will ensure future data is correct.
Migration output
Up
== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: migrating
-- execute("WITH project_ids_to_update (id) AS (\n SELECT DISTINCT \"services\".\"project_id\" FROM \"services\" INNER JOIN \"projects\" ON \"projects\".\"id\" = \"services\".\"project_id\" WHERE \"services\".\"active\" = TRUE AND \"services\".\"category\" = 'issue_tracker' AND \"services\".\"project_id\" IS NOT NULL AND \"services\".\"id\" >= 15 AND \"projects\".\"has_external_issue_tracker\" = FALSE AND \"projects\".\"pending_delete\" = FALSE\n)\nUPDATE projects SET has_external_issue_tracker = true WHERE id IN (SELECT id FROM project_ids_to_update)\n")
-> 0.0056s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:projects, :id, {:where=>"(\n \"projects\".\"has_external_issue_tracker\" = TRUE\n)\nAND \"projects\".\"pending_delete\" = FALSE\n", :name=>"tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true", :algorithm=>:concurrently})
-> 0.0174s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:projects, :id, {:where=>"(\n \"projects\".\"has_external_issue_tracker\" = TRUE\n)\nAND \"projects\".\"pending_delete\" = FALSE\n", :name=>"tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true", :algorithm=>:concurrently})
-> 0.0116s
-- execute("RESET ALL")
-> 0.0007s
-- execute("WITH project_ids_to_update (id) AS (\n SELECT \"projects\".\"id\" FROM \"projects\" WHERE ((\n \"projects\".\"has_external_issue_tracker\" = TRUE\n)\nAND \"projects\".\"pending_delete\" = FALSE\n) AND \"projects\".\"id\" >= 25 AND (NOT EXISTS (SELECT 1 FROM \"services\" WHERE (services.project_id = projects.id) AND \"services\".\"category\" = 'issue_tracker' AND \"services\".\"active\" = TRUE))\n)\nUPDATE projects SET has_external_issue_tracker = false WHERE id IN (SELECT id FROM project_ids_to_update)\n")
-> 0.0014s
-- transaction_open?()
-> 0.0000s
-- indexes(:projects)
-> 0.0157s
-- remove_index(:projects, {:algorithm=>:concurrently, :name=>"tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true"})
-> 0.0043s
== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: migrated (0.1882s)
Down
Note this is a no-op
:
== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: reverting
== 20210210221006 CleanupProjectsWithBadHasExternalIssueTrackerData: reverted (0.0000s)
CleanupProjectsWithBadHasExternalIssueTrackerData
(20210210221006
)
SQL queries in
projects.has_external_issue_tracker
to be TRUE
.
Update Note, query is operated on batches of 100 services.
WITH project_ids_to_update (id) AS
(
SELECT DISTINCT
"services"."project_id"
FROM
"services"
INNER JOIN
"projects"
ON "projects"."id" = "services"."project_id"
WHERE
"services"."active" = TRUE
AND "services"."category" = 'issue_tracker'
AND "services"."project_id" IS NOT NULL
AND "services"."id" >= 100
AND "services"."id" < 200
AND "projects"."has_external_issue_tracker" = FALSE
AND "projects"."pending_delete" = FALSE
)
UPDATE
projects
SET
has_external_issue_tracker = true
WHERE
id IN
(
SELECT
id
FROM
project_ids_to_update
)
Explain:
ModifyTable on public.projects (cost=7.63..10.66 rows=1 width=819) (actual time=0.050..0.052 rows=0 loops=1)
Buffers: shared hit=6
CTE project_ids_to_update
-> Unique (cost=7.04..7.05 rows=1 width=4) (actual time=0.047..0.048 rows=0 loops=1)
Buffers: shared hit=6
-> Sort (cost=7.04..7.04 rows=1 width=4) (actual time=0.047..0.048 rows=0 loops=1)
Sort Key: services.project_id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=6
-> Nested Loop (cost=0.99..7.03 rows=1 width=4) (actual time=0.009..0.010 rows=0 loops=1)
Buffers: shared hit=3
-> Index Scan using services_pkey on public.services (cost=0.42..3.45 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: ((services.id >= 100) AND (services.id < 200))
Filter: (services.active AND (services.project_id IS NOT NULL) AND ((services.category)::text = 'issue_tracker'::text))
Rows Removed by Filter: 0
Buffers: shared hit=3
-> Index Scan using idx_projects_on_repository_storage_last_repository_updated_at on public.projects projects_1 (cost=0.56..3.58 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (projects_1.id = services.project_id)
Filter: ((NOT projects_1.has_external_issue_tracker) AND (NOT projects_1.pending_delete))
Rows Removed by Filter: 0
-> Nested Loop (cost=0.58..3.61 rows=1 width=819) (actual time=0.049..0.050 rows=0 loops=1)
Buffers: shared hit=6
-> HashAggregate (cost=0.02..0.03 rows=1 width=32) (actual time=0.049..0.049 rows=0 loops=1)
Group Key: project_ids_to_update.id
Buffers: shared hit=6
-> CTE Scan on project_ids_to_update (cost=0.00..0.02 rows=1 width=32) (actual time=0.048..0.048 rows=0 loops=1)
Buffers: shared hit=6
-> Index Scan using idx_projects_on_repository_storage_last_repository_updated_at on public.projects (cost=0.56..3.58 rows=1 width=746) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (projects.id = project_ids_to_update.id)
Summary:
Time: 7.966 ms
- planning: 7.535 ms
- execution: 0.431 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 6 (~48.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Temporary index (gets removed during migration)
CREATE INDEX tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true
ON projects USING btree (id)
WHERE
((has_external_issue_tracker = true) AND (pending_delete = false))
;
Exec
ing this on #database-lab
resulted in an index creation time of:
Duration: 153.464 s (estimated* for prod: 8.086...153.464 s
% time seconds wait_event
------ ------------ -----------------------------
94.68 145.299968 IO.DataFileRead
4.12 6.315605 Running
1.15 1.770629 IO.DataFileWrite
0.05 0.077465 IO.SLRURead
------ ------------ -----------------------------
100.00 153.463667
projects.has_external_issue_tracker
to be FALSE
Update Note, query is operated on batches of 100 projects and relies on the above index.
WITH project_ids_to_update (id) AS
(
SELECT
"projects"."id"
FROM
"projects"
WHERE
(
( "projects"."has_external_issue_tracker" = TRUE )
AND "projects"."pending_delete" = FALSE
)
AND "projects"."id" >= 100
AND "projects"."id" < 200
AND
(
NOT EXISTS
(
SELECT
1
FROM
"services"
WHERE
(
services.project_id = projects.id
)
AND "services"."category" = 'issue_tracker'
AND "services"."active" = TRUE
)
)
)
UPDATE
projects
SET
has_external_issue_tracker = false
WHERE
id IN
(
SELECT
id
FROM
project_ids_to_update
)
Execution plan:
ModifyTable on public.projects (cost=10.49..13.52 rows=1 width=835) (actual time=0.163..0.165 rows=0 loops=1)
Buffers: shared read=3
I/O Timings: read=0.112
CTE project_ids_to_update
-> Nested Loop Anti Join (cost=0.84..9.91 rows=1 width=4) (actual time=0.159..0.160 rows=0 loops=1)
Buffers: shared read=3
I/O Timings: read=0.112
-> Index Only Scan using tmp_idx_projects_on_id_where_has_external_issue_tracker_is_true on public.projects projects_1 (cost=0.42..3.44 rows=1 width=4) (actual time=0.158..0.158 rows=0 loops=1)
Index Cond: ((projects_1.id >= 100) AND (projects_1.id < 200))
Heap Fetches: 0
Buffers: shared read=3
I/O Timings: read=0.112
-> Index Scan using index_services_on_project_id_and_type_unique on public.services (cost=0.42..3.45 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (services.project_id = projects_1.id)
Filter: (services.active AND ((services.category)::text = 'issue_tracker'::text))
Rows Removed by Filter: 0
-> Nested Loop (cost=0.58..3.61 rows=1 width=835) (actual time=0.162..0.163 rows=0 loops=1)
Buffers: shared read=3
I/O Timings: read=0.112
-> HashAggregate (cost=0.02..0.03 rows=1 width=32) (actual time=0.162..0.162 rows=0 loops=1)
Group Key: project_ids_to_update.id
Buffers: shared read=3
I/O Timings: read=0.112
-> CTE Scan on project_ids_to_update (cost=0.00..0.02 rows=1 width=32) (actual time=0.160..0.160 rows=0 loops=1)
Buffers: shared read=3
I/O Timings: read=0.112
-> Index Scan using idx_projects_on_repository_storage_last_repository_updated_at on public.projects (cost=0.56..3.58 rows=1 width=762) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (projects.id = project_ids_to_update.id)
Summary:
Time: 7.073 ms
- planning: 6.479 ms
- execution: 0.594 ms
- I/O read: 0.112 ms
- I/O write: N/A
Shared buffers:
- hits: 0 from the buffer pool
- reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Related to #273574 (closed)