Add index on project_uploads(project_id, id)

What does this MR do and why?

File-based project export iterates a project's uploads with WHERE model_id = X AND model_type = 'Project' ORDER BY id LIMIT N. No existing index supports that pattern with the ORDER BY, so PostgreSQL exceeds the statement timeout for projects with millions of uploads.

This changes adds an index on the project_uploads sharding key (project_id) and id. A follow-up MR switches the export query to filter by project_id so the planner can serve the query from this index.

References

#554957 (closed)

Database review

Before — without the index

The current production query (WHERE model_id = X AND model_type = 'Project') falls back to project_uploads_pkey and walks the table in id order, filtering out every row that doesn't belong to the project.

Query plan: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51973/commands/153143

SELECT "uploads".*
  FROM "uploads"
  WHERE "uploads"."model_id" = 278964
    AND "uploads"."model_type" = 'Project'
    AND "uploads"."path" != 'avatar/dk.png'
  ORDER BY "uploads"."id" ASC
  LIMIT 100;
Limit  (cost=0.57..2156.13 rows=100 width=297) (actual time=19.842..19.883 rows=100 loops=1)
   Buffers: shared hit=52266
   ->  Index Scan using project_uploads_pkey on public.project_uploads uploads  (cost=0.57..10588010.86 rows=491194 width=297) (actual time=19.840..19.875 rows=100 loops=1)
         Index Cond: (uploads.model_type = 'Project'::text)
         Filter: ((uploads.path <> 'avatar/dk.png'::text) AND (uploads.model_id = 278964))
         Rows Removed by Filter: 57509
         Buffers: shared hit=52266
Settings: seq_page_cost = '4', effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5', work_mem = '230MB'
Time: 22.015 ms
  - planning: 2.069 ms
  - execution: 19.946 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 52266 (~408.30 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

After — with index_project_uploads_on_project_id_and_id

This index will started to be used in !237088 (merged)

Query plan: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51973/commands/153146

SELECT "uploads".*
FROM "uploads"
WHERE "uploads"."project_id" = 278964
  AND "uploads"."model_type" = 'Project'
ORDER BY "uploads"."id" ASC
LIMIT 100;
 Limit  (cost=0.57..120.63 rows=100 width=297) (actual time=0.043..0.099 rows=100 loops=1)
   Buffers: shared hit=96
   ->  Index Scan using index_project_uploads_on_project_id_and_id on public.project_uploads uploads  (cost=0.57..590129.30 rows=491526 width=297) (actual time=0.042..0.091 rows=100 loops=1)
         Index Cond: (uploads.project_id = 278964)
         Filter: (uploads.model_type = 'Project'::text)
         Rows Removed by Filter: 0
         Buffers: shared hit=96
Settings: random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4', effective_cache_size = '472585MB', jit = 'off'
Time: 2.263 ms
  - planning: 2.115 ms
  - execution: 0.148 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 96 (~768.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Screenshots or screen recordings

Before After

How to set up and validate locally

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rodrigo Tomonari

Merge request reports

Loading