Resolve "Geo - Revisit the indexes on the project_repository_states table"
requested to merge 6014-geo-fix-index-for-outdated-projects-on-the-project-repository-states into master
What does this MR do?
Improve the query returned by Geo::RepositoryVerificationFinder#find_outdated_projects to find projects where the checksum needs to be recalculated. Follow the query plans were run locally against a database with ~400k rows in the projects, and project_repository_states tables.
EXPLAIN ANALYZE
SELECT
"projects"."id"
FROM
"projects"
INNER JOIN "project_repository_states" ON "project_repository_states"."project_id" = "projects"."id"
WHERE ("project_repository_states"."repository_verification_checksum" IS NULL
AND "project_repository_states"."last_repository_verification_failure" IS NULL
OR "project_repository_states"."wiki_verification_checksum" IS NULL
AND "project_repository_states"."last_wiki_verification_failure" IS NULL)
AND "projects"."repository_storage" = 'default'
LIMIT 1000;
- Before:
Limit (cost=3.58..302.13 rows=1000 width=4) (actual time=330.079..335.280 rows=1000 loops=1)
-> Merge Join (cost=3.58..85720.56 rows=287110 width=4) (actual time=330.078..335.220 rows=1000 loops=1)
Merge Cond: (projects.id = project_repository_states.project_id)
-> Index Scan using projects_pkey on projects (cost=0.42..34092.75 rows=400011 width=4) (actual time=0.006..94.979 rows=213383 loops=1)
Filter: ((repository_storage)::text = 'default'::text)
-> Index Scan using index_project_repository_states_on_project_id on project_repository_states (cost=0.42..48412.66 rows=287110 width=4) (actual time=224.091..226.711 rows=1000 loops=1)
Filter: (((repository_verification_checksum IS NULL) AND (last_repository_verification_failure IS NULL)) OR ((wiki_verification_checksum IS NULL) AND (last_wiki_verification_failure IS NULL)))
Rows Removed by Filter: 212383
Planning time: 0.485 ms
Execution time: 335.348 ms
- After:
Limit (cost=1.40..167.31 rows=1000 width=4) (actual time=135.072..136.107 rows=1000 loops=1)
-> Merge Join (cost=1.40..47635.61 rows=287110 width=4) (actual time=135.072..136.024 rows=1000 loops=1)
Merge Cond: (projects.id = project_repository_states.project_id)
-> Index Scan using projects_pkey on projects (cost=0.42..34092.75 rows=400011 width=4) (actual time=0.005..120.190 rows=213383 loops=1)
Filter: ((repository_storage)::text = 'default'::text)
-> Index Only Scan using idx_repository_states_outdated_checksums on project_repository_states (cost=0.42..9232.16 rows=287110 width=4) (actual time=0.031..0.182 rows=1000 loops=1)
Heap Fetches: 298
Planning time: 0.470 ms
Execution time: 136.184 ms
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary - Conform by the code review guidelines
-
Has been reviewed by a Backend maintainer -
Has been reviewed by a Database specialist
-
-
EE specific content should be in the top level /ee
folder -
Conform by the merge request performance guides -
Conform by the style guides -
If you have multiple commits, please combine them into a few logically organized commits by squashing them -
Internationalization required/considered -
If paid feature, have we considered GitLab.com plan and how it works for groups and is there a design for promoting it to users who aren't on the correct plan -
End-to-end tests pass ( package-qa
manual pipeline job)
What are the relevant issue numbers?
Closes #6014 (closed)
Edited by Douglas Barbosa Alexandre