Skip to content

Add index to package files on package_id and created_at DESC

🍀 Context

The large SaaS customer experience db timeouts or cancellation when downloading Maven files:

GET /api/:version/projects/:id/packages/maven/*path/:file_name

The log entries with 500 errors: link (internal)

The endpoint itself has ~2.5K violations over specified threshold in the last 7 days: link (internal)

The log entries with the long running SQL queries (> 3 seconds): link (internal)

🔨 What does this MR do and why?

Add a new index to the packages_package_files table for package_id and created_at columns.

📸 Screenshots or screen recordings

None

How to set up and validate locally

None

💾 Database review

Migration up

$ rake db:migration:main
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: migrating =====
main: -- transaction_open?()
main:    -> 0.0001s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0677s
main: -- index_exists?(:packages_package_files, "package_id, created_at DESC", {:name=>"index_packages_package_files_on_package_id_and_created_at_desc", :algorithm=>:concurrently})
main:    -> 0.0070s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0006s
main: -- add_index(:packages_package_files, "package_id, created_at DESC", {:name=>"index_packages_package_files_on_package_id_and_created_at_desc", :algorithm=>:concurrently})
main:    -> 0.0062s
main: -- execute("RESET statement_timeout")
main:    -> 0.0004s
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: migrated (0.1080s)

Migration down

$ rake db:rollback:main
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: reverting =====
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1010s
main: -- indexes(:packages_package_files)
main:    -> 0.0052s
main: -- current_schema()
main:    -> 0.0001s
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: reverted (0.1154s) 

🔬 Database analysis

🔥 Now

Query
EXPLAIN SELECT
    "packages_packages".*
FROM
    "packages_packages"
    INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id"
    INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE
    "packages_packages"."project_id" = XXXX
    AND "packages_packages"."status" IN (0, 1)
    AND "packages_maven_metadata"."path" = XXXX
ORDER BY
    packages_package_files.created_at DESC
LIMIT 1
Explain plan
 Limit  (cost=77.30..77.30 rows=1 width=100) (actual time=55294.393..55294.402 rows=1 loops=1)
   Buffers: shared hit=1468 read=31083 dirtied=503
   I/O Timings: read=54277.152 write=0.000
   ->  Sort  (cost=77.30..77.31 rows=6 width=100) (actual time=55294.389..55294.397 rows=1 loops=1)
         Sort Key: packages_package_files.created_at DESC
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=1468 read=31083 dirtied=503
         I/O Timings: read=54277.152 write=0.000
         ->  Nested Loop  (cost=1.56..77.27 rows=6 width=100) (actual time=51.976..55121.105 rows=33352 loops=1)
               Buffers: shared hit=1465 read=31083 dirtied=503
               I/O Timings: read=54277.152 write=0.000
               ->  Nested Loop  (cost=0.99..7.03 rows=1 width=100) (actual time=39.594..39.623 rows=1 loops=1)
                     Buffers: shared hit=1 read=13
                     I/O Timings: read=39.285 write=0.000
                     ->  Index Scan using index_packages_maven_metadata_on_path on public.packages_maven_metadata  (cost=0.56..3.58 rows=1 width=8) (actual time=16.355..20.822 rows=2 loops=1)
                           Index Cond: ((packages_maven_metadata.path)::text = 'com/tmobile/cdp/templates-gradle'::text)
                           Buffers: shared read=6
                           I/O Timings: read=20.688 write=0.000
                     ->  Index Scan using packages_packages_pkey on public.packages_packages  (cost=0.43..3.46 rows=1 width=92) (actual time=9.380..9.381 rows=0 loops=2)
                           Index Cond: (packages_packages.id = packages_maven_metadata.package_id)
                           Filter: ((packages_packages.status = ANY ('{0,1}'::integer[])) AND (packages_packages.project_id = 11160640))
                           Rows Removed by Filter: 0
                           Buffers: shared hit=1 read=7
                           I/O Timings: read=18.597 write=0.000
               ->  Index Scan using index_packages_package_files_on_package_id_id on public.packages_package_files  (cost=0.56..68.38 rows=148 width=16) (actual time=12.372..54979.124 rows=33352 loops=1)
                     Index Cond: (packages_package_files.package_id = packages_packages.id)
                     Buffers: shared hit=1464 read=31070 dirtied=503
                     I/O Timings: read=54237.867 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16720/commands/57046

🚒 After adding the index

pg.ai setup
reset
exec CREATE INDEX foobar ON packages_package_files USING btree (package_id, created_at DESC)
exec VACUUM ANALYZE packages_package_files
Query
SELECT
    "packages_packages".*
FROM
    "packages_packages"
    INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id"
    INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE
    "packages_packages"."project_id" = XXXX
    AND "packages_packages"."status" IN (0, 1)
    AND "packages_maven_metadata"."path" = XXXX
ORDER BY
    packages_package_files.created_at DESC
LIMIT 1
Execution plan
 Limit  (cost=13.17..13.17 rows=1 width=101) (actual time=62.117..62.120 rows=1 loops=1)
   Buffers: shared hit=1916 read=143
   I/O Timings: read=39.432 write=0.000
   ->  Sort  (cost=13.17..13.19 rows=6 width=101) (actual time=62.115..62.117 rows=1 loops=1)
         Sort Key: packages_package_files.created_at DESC
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=1916 read=143
         I/O Timings: read=39.432 write=0.000
         ->  Nested Loop  (cost=1.56..13.14 rows=6 width=101) (actual time=31.408..51.274 rows=33352 loops=1)
               Buffers: shared hit=1913 read=143
               I/O Timings: read=39.432 write=0.000
               ->  Nested Loop  (cost=0.99..7.03 rows=1 width=101) (actual time=31.204..31.208 rows=1 loops=1)
                     Buffers: shared hit=1 read=13
                     I/O Timings: read=30.968 write=0.000
                     ->  Index Scan using index_packages_maven_metadata_on_path on public.packages_maven_metadata  (cost=0.56..3.58 rows=1 width=8) (actual time=9.215..11.194 rows=2 loops=1)
                           Index Cond: ((packages_maven_metadata.path)::text = 'com/tmobile/cdp/templates-gradle'::text)
                           Buffers: shared read=6
                           I/O Timings: read=11.092 write=0.000
                     ->  Index Scan using packages_packages_pkey on public.packages_packages  (cost=0.43..3.46 rows=1 width=93) (actual time=9.996..9.996 rows=0 loops=2)
                           Index Cond: (packages_packages.id = packages_maven_metadata.package_id)
                           Filter: ((packages_packages.status = ANY ('{0,1}'::integer[])) AND (packages_packages.project_id = 11160640))
                           Rows Removed by Filter: 0
                           Buffers: shared hit=1 read=7
                           I/O Timings: read=19.876 write=0.000
               ->  Index Only Scan using foobar on public.packages_package_files  (cost=0.56..4.64 rows=147 width=16) (actual time=0.196..14.235 rows=33352 loops=1)
                     Index Cond: (packages_package_files.package_id = packages_maven_metadata.package_id)
                     Heap Fetches: 0
                     Buffers: shared hit=1912 read=130
                     I/O Timings: read=8.463 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16772/commands/57160

🔍 Observation

The usage Index-Only Scan answers the query from index only without accessing the packages_package_files table.

Shared buffers reads went down: ~242 MiB -> ~1.1 MiB.

The index size is 1785 MB. Well, the table has ~60M rows.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #396650 (closed)

Edited by Dzmitry (Dima) Meshcharakou

Merge request reports