Skip to content

Geo - Support filtering LFS objects registries by keyword via GraphQL

What does this MR do and why?

Currently, for self-service replicables, we are unable to do any filtering on the Geo Replicable views.

This MR provides the base of this functionality and supports filtering LFS objects registries by keyword (fuzzy_search) via GraphQL:

Registries Searchable attributes
LFS objects File

Database

Raw SQL and query plans

Using:

Geo::LfsObjectRegistry.with_search('Gitlab')

Then:

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15178/commands/52739

SELECT
    lfs_objects.*
FROM
    lfs_objects
WHERE
    lfs_objects.file = '8de917525f83104736f6c64d32f0e2a02f5bf2ee57843a54f222cba8c813'
LIMIT 1000;

Query plan (without index):

 Limit  (cost=1000.00..6461605.08 rows=1 width=158) (actual time=300078.069..300082.609 rows=1 loops=1)
   Buffers: shared read=1533885 dirtied=20701 written=20198
   I/O Timings: read=882767.897 write=642.039
   ->  Gather  (cost=1000.00..6461605.08 rows=1 width=158) (actual time=300078.059..300082.597 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=1533885 dirtied=20701 written=20198
         I/O Timings: read=882767.897 write=642.039
         ->  Parallel Seq Scan on public.lfs_objects  (cost=0.00..6460604.98 rows=1 width=158) (actual time=216713.183..300066.897 rows=0 loops=3)
               Filter: ((lfs_objects.file)::text = '8de917525f83104736f6c64d32f0e2a02f5bf2ee57843a54f222cba8c813'::text)
               Rows Removed by Filter: 20790679
               Buffers: shared read=1533885 dirtied=20701 written=20198
               I/O Timings: read=882767.897 write=642.039

Statistics (without index):

Time: 5.001 min
  - planning: 1.268 ms
  - execution: 5.001 min
    - I/O read: 14.713 min
    - I/O write: 642.039 ms

Shared buffers:
  - hits: 0 from the buffer pool
  - reads: 1533885 (~11.70 GiB) from the OS file cache, including disk I/O
  - dirtied: 20701 (~161.70 MiB)
  - writes: 20198 (~157.80 MiB)

Then:

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15178/commands/52741

Query plan (with index):

 Limit  (cost=0.69..3.71 rows=1 width=158) (actual time=1.394..1.397 rows=1 loops=1)
   Buffers: shared read=6
   I/O Timings: read=1.347 write=0.000
   ->  Index Scan using index_lfs_objects_on_file on public.lfs_objects  (cost=0.69..3.71 rows=1 width=158) (actual time=1.391..1.394 rows=1 loops=1)
         Index Cond: ((lfs_objects.file)::text = '8de917525f83104736f6c64d32f0e2a02f5bf2ee57843a54f222cba8c813'::text)
         Buffers: shared read=6
         I/O Timings: read=1.347 write=0.000

Statistics (with index):

Time: 2.862 ms  
  - planning: 1.430 ms  
  - execution: 1.432 ms  
    - I/O read: 1.347 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 0 from the buffer pool  
  - reads: 6 (~48.00 KiB) from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0 

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #364722 (closed) and #411770

Edited by Javiera Tapia

Merge request reports