Skip to content

Use PostgreSQL FDW for Geo downloads

Stan Hu requested to merge sh-rails-fdw-support into master

Part of gitlab-org/gitlab-ee#3382

With these custom queries using FDW, we see relatively fast queries on the Geo testbed (at least relative to what was there before):

LFS objects

gitlabhq_geo_production=# explain analyze SELECT  "gitlab_secondary"."lfs_objects"."id" FROM "gitlab_secondary"."lfs_objects" LEFT OUTER JOIN file_registry ON file_registry.file_id = gitlab_secondary.lfs_objects.id AND (gitlab_secondary.lfs_objects.file_store IS NULL OR gitlab_secondary.lfs_objects.file_store = 1) AND file_registry.file_type = 'lfs' WHERE (file_registry.file_id IS NULL) ORDER BY "gitlab_secondary"."lfs_objects"."created_at" DESC LIMIT 1000;
                                                                                   QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.42..2738.22 rows=1000 width=12) (actual time=1412.209..1470.219 rows=1000 loops=1)
   ->  Nested Loop Anti Join  (cost=100.42..5476.26 rows=2038 width=12) (actual time=1412.209..1470.143 rows=1000 loops=1)
         Join Filter: ((lfs_objects.file_store IS NULL) OR (lfs_objects.file_store = 1))
         ->  Foreign Scan on lfs_objects  (cost=100.00..177.54 rows=2048 width=16) (actual time=1411.078..1435.640 rows=14782 loops=1)
         ->  Index Only Scan using index_file_registry_on_file_type_and_file_id on file_registry  (cost=0.42..2.57 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=14782)
               Index Cond: ((file_type = 'lfs'::text) AND (file_id = lfs_objects.id))
               Heap Fetches: 7298
 Planning time: 0.176 ms
 Execution time: 1484.609 ms
(9 rows)

Uploads

gitlabhq_geo_production=# explain analyze SELECT  "gitlab_secondary"."uploads"."id", "gitlab_secondary"."uploads"."uploader" FROM "gitlab_secondary"."uploads" LEFT OUTER JOIN file_registry ON file_registry.file_id = gitlab_secondary.uploads.id AND file_registry.file_type IN ('attachment','avatar','file','personal_file') WHERE (file_registry.file_id IS NULL)  ORDER BY "gitlab_secondary"."uploads"."created_at" DESC LIMIT 1000;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5685.64..5687.15 rows=602 width=44) (actual time=1485.559..1485.786 rows=1000 loops=1)
   ->  Sort  (cost=5685.64..5687.15 rows=602 width=44) (actual time=1485.557..1485.702 rows=1000 loops=1)
         Sort Key: uploads.created_at DESC
         Sort Method: top-N heapsort  Memory: 127kB
         ->  Hash Anti Join  (cost=5594.35..5657.85 rows=602 width=44) (actual time=48.452..1399.048 rows=543554 loops=1)
               Hash Cond: (uploads.id = file_registry.file_id)
               ->  Foreign Scan on uploads  (cost=100.00..146.12 rows=1204 width=44) (actual time=0.404..1164.297 rows=633528 loops=1)
               ->  Hash  (cost=4365.64..4365.64 rows=90297 width=4) (actual time=47.434..47.434 rows=89974 loops=1)
                     Buckets: 131072  Batches: 1  Memory Usage: 4188kB
                     ->  Seq Scan on file_registry  (cost=0.00..4365.64 rows=90297 width=4) (actual time=0.018..33.584 rows=89974 loops=1)
                           Filter: ((file_type)::text = ANY ('{attachment,avatar,file,personal_file}'::text[]))
                           Rows Removed by Filter: 80714
 Planning time: 0.184 ms
 Execution time: 1486.416 ms
(14 rows)
Edited by Stan Hu

Merge request reports