Skip to content

Geo - Support filtering package files by keyword via GraphQL

What does this MR do and why?

Currently, for self-service replicables, we are unable to do any filtering on the Geo Replicable views.

This MR supports filtering package files registries by keyword (fuzzy_search) via GraphQL:

Registry. Searchable attributes
Package Files file_name

Database

Raw SQL and query plans

Using:

Packages::PackageFile.search('GitLab')

Then:

SELECT
    packages_package_files.id
FROM
    packages_package_files
WHERE
    packages_package_files.file_name ILIKE '%Gitlab%'
LIMIT 1000;

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/14970/commands/52119

Query plan (without index):

 Limit  (cost=1000.00..39545.23 rows=1000 width=8) (actual time=4.728..7204.836 rows=1000 loops=1)
   Buffers: shared read=29683 dirtied=623 written=622
   I/O Timings: read=19971.284 write=21.080
   ->  Gather  (cost=1000.00..6767730.51 rows=175553 width=8) (actual time=4.727..7204.516 rows=1000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=29683 dirtied=623 written=622
         I/O Timings: read=19971.284 write=21.080
         ->  Parallel Seq Scan on public.packages_package_files  (cost=0.00..6749175.21 rows=73147 width=8) (actual time=13.432..7190.436 rows=334 loops=3)
               Filter: ((packages_package_files.file_name)::text ~~* '%Gitlab%'::text)
               Rows Removed by Filter: 353748
               Buffers: shared read=29683 dirtied=623 written=622
               I/O Timings: read=19971.284 write=21.080

Statistics (without index):

Time: 7.209 s  
  - planning: 4.258 ms  
  - execution: 7.205 s  
    - I/O read: 19.971 s  
    - I/O write: 21.080 ms  
  
Shared buffers:  
  - hits: 0 from the buffer pool  
  - reads: 29683 (~231.90 MiB) from the OS file cache, including disk I/O  
  - dirtied: 623 (~4.90 MiB)  
  - writes: 622 (~4.90 MiB) 

Then: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/14970/commands/52124

Query plan (with index):

 Limit  (cost=1476.97..3673.71 rows=1000 width=8) (actual time=91.741..931.075 rows=1000 loops=1)
   Buffers: shared hit=341 read=903 dirtied=18
   I/O Timings: read=828.767 write=0.000
   ->  Bitmap Heap Scan on public.packages_package_files  (cost=1476.97..387387.11 rows=175674 width=8) (actual time=91.738..930.493 rows=1000 loops=1)
         Buffers: shared hit=341 read=903 dirtied=18
         I/O Timings: read=828.767 write=0.000
         ->  Bitmap Index Scan using index_packages_package_files_on_file_name  (cost=0.00..1433.05 rows=175674 width=0) (actual time=69.862..69.863 rows=132421 loops=1)
               Index Cond: ((packages_package_files.file_name)::text ~~* '%Gitlab%'::text)
               Buffers: shared hit=341 read=95
               I/O Timings: read=6.828 write=0.000

Statistics (with index):

Time: 940.499 ms  
  - planning: 8.136 ms  
  - execution: 932.363 ms  
    - I/O read: 828.767 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 341 (~2.70 MiB) from the buffer pool  
  - reads: 903 (~7.10 MiB) from the OS file cache, including disk I/O  
  - dirtied: 18 (~144.00 KiB)  
  - writes: 0  

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #364722 (closed) and #411770

Edited by Javiera Tapia

Merge request reports