Skip to content

Geo - Improve the query performance to find unsynced job artifacts

What does this MR do?

Add a missing index to artifact_id in the job_artifact_registry table.

  • Query
Geo::JobArtifactRegistryFinder.new.find_unsynced_job_artifacts(batch_size: 1000)
  • Before
=> EXPLAIN for: SELECT  "gitlab_secondary"."ci_job_artifacts".* FROM "gitlab_secondary"."ci_job_artifacts" LEFT OUTER JOIN job_artifact_registry
                               ON job_artifact_registry.artifact_id = gitlab_secondary.ci_job_artifacts.id WHERE ("gitlab_secondary"."ci_job_artifacts"."file_store" = 1 OR "gitlab_secondary"."ci_job_artifacts"."file_store" IS NULL) AND "job_artifact_registry"."artifact_id" IS NULL AND (1=1) LIMIT 1000
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=100.00..17524.61 rows=3 width=116)
   ->  Nested Loop Anti Join  (cost=100.00..17524.61 rows=3 width=116)
         Join Filter: (job_artifact_registry.artifact_id = ci_job_artifacts.id)
         ->  Foreign Scan on ci_job_artifacts  (cost=100.00..117.43 rows=6 width=116)
         ->  Materialize  (cost=0.00..17277.52 rows=1768 width=4)
               ->  Seq Scan on job_artifact_registry  (cost=0.00..17268.68 rows=1768 width=4)
  • After
=> EXPLAIN for: SELECT  "gitlab_secondary"."ci_job_artifacts".* FROM "gitlab_secondary"."ci_job_artifacts" LEFT OUTER JOIN job_artifact_registry
                               ON job_artifact_registry.artifact_id = gitlab_secondary.ci_job_artifacts.id WHERE ("gitlab_secondary"."ci_job_artifacts"."file_store" = 1 OR "gitlab_secondary"."ci_job_artifacts"."file_store" IS NULL) AND "job_artifact_registry"."artifact_id" IS NULL AND (1=1) LIMIT 1000
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.28..135.87 rows=3 width=116)
   ->  Nested Loop Anti Join  (cost=100.28..135.87 rows=3 width=116)
         ->  Foreign Scan on ci_job_artifacts  (cost=100.00..117.43 rows=6 width=116)
         ->  Index Only Scan using index_job_artifact_registry_on_artifact_id on job_artifact_registry  (cost=0.28..3.63 rows=1 width=4)
               Index Cond: (artifact_id = ci_job_artifacts.id)

Does this MR meet the acceptance criteria?

  • Changelog entry added, if necessary
  • Documentation created/updated
  • API support added
  • Tests added for this feature/bug
  • Review
    • Has been reviewed by UX
    • Has been reviewed by Frontend
    • Has been reviewed by Backend
    • Has been reviewed by Database
  • EE specific content should be in the top level /ee folder
  • Conform by the merge request performance guides
  • Conform by the style guides
  • Squashed related commits together
  • Internationalization required/considered
  • If paid feature, have we considered GitLab.com plan and how it works for groups and is there a design for promoting it to users who aren't on the correct plan
  • End-to-end tests pass (package-qa manual pipeline job)

What are the relevant issue numbers?

Related to https://gitlab.com/gitlab-org/gitlab-ee/issues/5513

Edited by Nick Thomas

Merge request reports