Skip to content

Resolve "Geo: Slow JOIN query with project_repository_states"

What does this MR do?

Use a CTE query to improve the performance of the query to find outdated projects that need to be reverified.

Why was this MR needed?

The current implementation of this query appears to be the slowest query in production now and sometimes hit statement timeout. Event without the index introduced by this MR, the new optimezed query does not timeout on GitLab.com:

Sort  (cost=9088.84..9091.32 rows=991 width=12) (actual time=843.245..843.362 rows=991 loops=1)
  Sort Key: outdated_projects.last_repository_updated_at
  Sort Method: quicksort  Memory: 71kB
  Buffers: shared hit=113602 read=42
  I/O Timings: read=0.668
  CTE outdated_projects
    ->  Limit  (cost=0.42..9019.70 rows=991 width=12) (actual time=690.910..842.460 rows=991 loops=1)
          Buffers: shared hit=113599 read=42
          I/O Timings: read=0.668
          ->  Nested Loop  (cost=0.42..399888.61 rows=43938 width=12) (actual time=690.908..842.306 rows=991 loops=1)
                Buffers: shared hit=113599 read=42
                I/O Timings: read=0.668
                ->  Seq Scan on projects  (cost=0.00..262649.86 rows=283855 width=12) (actual time=0.025..660.129 rows=28286 loops=1)
                      Filter: ((repository_storage)::text = 'nfs-file03'::text)
                      Rows Removed by Filter: 720477
                      Buffers: shared hit=25441
                ->  Index Only Scan using idx_repository_states_outdated_checksums on project_repository_states  (cost=0.42..0.47 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=28286)
                      Index Cond: (project_id = projects.id)
                      Heap Fetches: 1395
                      Buffers: shared hit=88158 read=42
                      I/O Timings: read=0.668
  ->  CTE Scan on outdated_projects  (cost=0.00..19.82 rows=991 width=12) (actual time=690.913..842.919 rows=991 loops=1)
        Buffers: shared hit=113599 read=42
        I/O Timings: read=0.668
Planning time: 4.626 ms
Execution time: 843.534 ms

Does this MR meet the acceptance criteria?

  • Changelog entry added, if necessary
  • Documentation created/updated
  • API support added
  • Tests added for this feature/bug
  • Conform by the code review guidelines
    • Has been reviewed by a UX Designer
    • Has been reviewed by a Frontend maintainer
    • Has been reviewed by a Backend maintainer
    • Has been reviewed by a Database specialist
  • EE specific content should be in the top level /ee folder
  • Conform by the merge request performance guides
  • Conform by the style guides
  • If you have multiple commits, please combine them into a few logically organized commits by squashing them
  • Internationalization required/considered
  • If paid feature, have we considered GitLab.com plan and how it works for groups and is there a design for promoting it to users who aren't on the correct plan
  • End-to-end tests pass (package-qa manual pipeline job)

What are the relevant issue numbers?

Closes #6053 (closed)

Edited by Douglas Barbosa Alexandre

Merge request reports