Add index on project_uploads(project_id, id)
What does this MR do and why?
File-based project export iterates a project's uploads with
WHERE model_id = X AND model_type = 'Project' ORDER BY id LIMIT N.
No existing index supports that pattern with the ORDER BY, so
PostgreSQL exceeds the statement timeout for projects with millions of uploads.
This changes adds an index on the project_uploads sharding key (project_id) and
id. A follow-up MR switches the export query to filter by
project_id so the planner can serve the query from this index.
References
Database review
Before — without the index
The current production query (WHERE model_id = X AND model_type = 'Project') falls back to project_uploads_pkey and walks the table in id order, filtering out every row that doesn't belong to the project.
Query plan: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51973/commands/153143
SELECT "uploads".*
FROM "uploads"
WHERE "uploads"."model_id" = 278964
AND "uploads"."model_type" = 'Project'
AND "uploads"."path" != 'avatar/dk.png'
ORDER BY "uploads"."id" ASC
LIMIT 100;Limit (cost=0.57..2156.13 rows=100 width=297) (actual time=19.842..19.883 rows=100 loops=1)
Buffers: shared hit=52266
-> Index Scan using project_uploads_pkey on public.project_uploads uploads (cost=0.57..10588010.86 rows=491194 width=297) (actual time=19.840..19.875 rows=100 loops=1)
Index Cond: (uploads.model_type = 'Project'::text)
Filter: ((uploads.path <> 'avatar/dk.png'::text) AND (uploads.model_id = 278964))
Rows Removed by Filter: 57509
Buffers: shared hit=52266
Settings: seq_page_cost = '4', effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5', work_mem = '230MB'Time: 22.015 ms
- planning: 2.069 ms
- execution: 19.946 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 52266 (~408.30 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0After — with index_project_uploads_on_project_id_and_id
This index will started to be used in !237088 (merged)
Query plan: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/51973/commands/153146
SELECT "uploads".*
FROM "uploads"
WHERE "uploads"."project_id" = 278964
AND "uploads"."model_type" = 'Project'
ORDER BY "uploads"."id" ASC
LIMIT 100; Limit (cost=0.57..120.63 rows=100 width=297) (actual time=0.043..0.099 rows=100 loops=1)
Buffers: shared hit=96
-> Index Scan using index_project_uploads_on_project_id_and_id on public.project_uploads uploads (cost=0.57..590129.30 rows=491526 width=297) (actual time=0.042..0.091 rows=100 loops=1)
Index Cond: (uploads.project_id = 278964)
Filter: (uploads.model_type = 'Project'::text)
Rows Removed by Filter: 0
Buffers: shared hit=96
Settings: random_page_cost = '1.5', work_mem = '230MB', seq_page_cost = '4', effective_cache_size = '472585MB', jit = 'off'Time: 2.263 ms
- planning: 2.115 ms
- execution: 0.148 ms
- I/O read: N/A
- I/O write: N/A
Shared buffers:
- hits: 96 (~768.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0Screenshots or screen recordings
| Before | After |
|---|---|
How to set up and validate locally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.