Skip to content

Optimize issues query for large collections

Vasilii Iakliushin requested to merge optimize_issues_query into master

What does this MR do and why?

WHERE ... IN (... query is not performant for large collections. A faster alternative is a INNER JOIN over a VALUES clause.

A similar fix was implemented here: !71947 (closed)

Before After
Time: 149.472 ms Time: 1.070 ms

Database

Before

EXPLAIN SELECT
    "issues".*
FROM
    "issues"
WHERE
    "issues"."project_id" = 278964
    AND "issues"."iid" IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000)
 Index Scan using index_issues_on_project_id_and_iid on public.issues  (cost=0.57..55.61 rows=11 width=1334) (actual time=11.009..142.526 rows=20 loops=1)
   Index Cond: ((issues.project_id = 278964) AND (issues.iid = ANY ('{1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000,13000,14000,15000,16000,17000,18000,19000,20000}'::integer[])))
   Buffers: shared hit=60 read=43
   I/O Timings: read=141.744 write=0.000

https://explain-depesz.postgres.ai/s/Qm

After

EXPLAIN SELECT
    "issues".*
FROM
    "issues"
    INNER JOIN (
        VALUES 
           (1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000),(10000),(11000),(12000),(13000),(14000),(15000),(16000),(17000),(18000),(19000),(20000)) issue_ids (id) ON (issue_ids.id = issues.iid)
WHERE
    "issues"."project_id" = 278964
Nested Loop  (cost=0.57..72.00 rows=8 width=1334) (actual time=0.031..0.203 rows=20 loops=1)
   Buffers: shared hit=100
   I/O Timings: read=0.000 write=0.000
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.25 rows=20 width=4) (actual time=0.002..0.008 rows=20 loops=1)
         I/O Timings: read=0.000 write=0.000
   ->  Index Scan using index_issues_on_project_id_and_iid on public.issues  (cost=0.57..3.59 rows=1 width=1334) (actual time=0.009..0.009 rows=1 loops=20)
         Index Cond: ((issues.project_id = 278964) AND (issues.iid = "*VALUES*".column1))
         Buffers: shared hit=100
         I/O Timings: read=0.000 write=0.000

https://explain-depesz.postgres.ai/s/Ih

How to set up and validate locally

You need a repository with at least 2 issues.

  1. Create markdown file test.md with following content
First reference #1
Second reference #2
  1. Open test.md file and wait till it loads
  2. Verify that issue links work

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Merge request reports