Skip to content

Update Catalog resource search logic to use Postgres Full Text Search

Leaminn Ma requested to merge ci-catalog-full-text-search into master

What does this MR do and why?

This updates the search logic in Ci::Catalog::Resource to use PostgreSQL Full Text Search capability instead of ILIKE (fuzzy_search) and trigram indexes. We are doing this for performance reasons and to reduce the number of indices required to be used with visibility_level in #429056 (closed). See #430889 (closed) for details.

This MR:

  • Adds a migration for a new search_vector TSVector column that is automatically generated with data from name and description.
  • Removes the trigram indices on name and description as they are no longer needed.
  • Updates the search scope on Ci::Catalog::Resource to use the new Full Text Search functionality.

NOTE:

Resolves #430889 (closed).

How to set up and validate locally

We can validate that the GraphQl ciCatalogResources search functionality still works the same way as before.

  1. Under the same parent group, create 2 projects with README files named Orange Project and Apple Project. (We want to set up a situation where only one or both of these projects are returned by the search filter.)
  2. Update the description of Orange Project to Foo; update the description of Apple Project to Orange.
  3. In the console, mark both projects as catalog resources.
Ci::Catalog::Resource.create!(project: Project.find_by(name: 'Orange Project'))
Ci::Catalog::Resource.create!(project: Project.find_by(name: 'Apple Project'))
  1. Go to http://gdk.test:3000/ and test the following query (update the group name as necessary):
query {
  ciCatalogResources(projectPath: "group-a/orange-project", search: "orange") {
    nodes {
      id
      name
    }
  }
}

Observe that both projects from Step 1 are in the response.

Screenshot_2023-10-23_at_12.36.46_PM

  1. If you change the search term to "apple", then only Apple Project will show.

Screenshot_2023-10-23_at_12.37.26_PM

Query plan

Up

main: == [advisory_lock_connection] object_id: 184040, pg_backend_pid: 62227
main: == 20231107193342 AddSearchVectorToCatalogResources: migrating ================
main: -- execute("ALTER TABLE catalog_resources\n  ADD COLUMN search_vector tsvector\n    GENERATED ALWAYS AS\n      (setweight(to_tsvector('english', COALESCE(name, '')), 'A') ||\n       setweight(to_tsvector('english', COALESCE(description, '')), 'B')) STORED;\n")
main:    -> 0.0097s
main: == 20231107193342 AddSearchVectorToCatalogResources: migrated (0.0140s) =======

main: == [advisory_lock_connection] object_id: 184040, pg_backend_pid: 62227
ci: == [advisory_lock_connection] object_id: 184240, pg_backend_pid: 62229
ci: == 20231107193342 AddSearchVectorToCatalogResources: migrating ================
ci: -- execute("ALTER TABLE catalog_resources\n  ADD COLUMN search_vector tsvector\n    GENERATED ALWAYS AS\n      (setweight(to_tsvector('english', COALESCE(name, '')), 'A') ||\n       setweight(to_tsvector('english', COALESCE(description, '')), 'B')) STORED;\n")
ci:    -> 0.0082s
ci: == 20231107193342 AddSearchVectorToCatalogResources: migrated (0.0184s) =======

ci: == [advisory_lock_connection] object_id: 184240, pg_backend_pid: 62229
main: == [advisory_lock_connection] object_id: 184400, pg_backend_pid: 62232
main: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: migrating 
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0955s
main: -- indexes(:catalog_resources)
main:    -> 0.0035s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- remove_index(:catalog_resources, {:algorithm=>:concurrently, :name=>"index_catalog_resources_on_name_trigram"})
main:    -> 0.0014s
main: -- execute("RESET statement_timeout")
main:    -> 0.0004s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0007s
main: -- indexes(:catalog_resources)
main:    -> 0.0020s
main: -- remove_index(:catalog_resources, {:algorithm=>:concurrently, :name=>"index_catalog_resources_on_description_trigram"})
main:    -> 0.0012s
main: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: migrated (0.1218s) 

main: == [advisory_lock_connection] object_id: 184400, pg_backend_pid: 62232
ci: == [advisory_lock_connection] object_id: 184520, pg_backend_pid: 62234
ci: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: migrating 
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci:    -> 0.0007s
ci: -- indexes(:catalog_resources)
ci:    -> 0.0030s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0001s
ci: -- remove_index(:catalog_resources, {:algorithm=>:concurrently, :name=>"index_catalog_resources_on_name_trigram"})
ci:    -> 0.0008s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0001s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci:    -> 0.0004s
ci: -- indexes(:catalog_resources)
ci:    -> 0.0024s
ci: -- remove_index(:catalog_resources, {:algorithm=>:concurrently, :name=>"index_catalog_resources_on_description_trigram"})
ci:    -> 0.0019s
ci: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: migrated (0.0397s) 

ci: == [advisory_lock_connection] object_id: 184520, pg_backend_pid: 62234

Down

main: == [advisory_lock_connection] object_id: 183220, pg_backend_pid: 67116
main: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: reverting 
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0775s
main: -- index_exists?(:catalog_resources, :name, {:name=>"index_catalog_resources_on_name_trigram", :using=>:gin, :opclass=>{:name=>:gin_trgm_ops}, :algorithm=>:concurrently})
main:    -> 0.0023s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- add_index(:catalog_resources, :name, {:name=>"index_catalog_resources_on_name_trigram", :using=>:gin, :opclass=>{:name=>:gin_trgm_ops}, :algorithm=>:concurrently})
main:    -> 0.0026s
main: -- execute("RESET statement_timeout")
main:    -> 0.0001s
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0003s
main: -- index_exists?(:catalog_resources, :description, {:name=>"index_catalog_resources_on_description_trigram", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :algorithm=>:concurrently})
main:    -> 0.0015s
main: -- add_index(:catalog_resources, :description, {:name=>"index_catalog_resources_on_description_trigram", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :algorithm=>:concurrently})
main:    -> 0.0007s
main: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: reverted (0.0987s) 

main: == [advisory_lock_connection] object_id: 183220, pg_backend_pid: 67116
ci: == [advisory_lock_connection] object_id: 184460, pg_backend_pid: 67536
ci: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: reverting 
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci:    -> 0.0995s
ci: -- index_exists?(:catalog_resources, :name, {:name=>"index_catalog_resources_on_name_trigram", :using=>:gin, :opclass=>{:name=>:gin_trgm_ops}, :algorithm=>:concurrently})
ci:    -> 0.0027s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0001s
ci: -- add_index(:catalog_resources, :name, {:name=>"index_catalog_resources_on_name_trigram", :using=>:gin, :opclass=>{:name=>:gin_trgm_ops}, :algorithm=>:concurrently})
ci:    -> 0.0029s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0002s
ci: -- transaction_open?(nil)
ci:    -> 0.0000s
ci: -- view_exists?(:postgres_partitions)
ci:    -> 0.0004s
ci: -- index_exists?(:catalog_resources, :description, {:name=>"index_catalog_resources_on_description_trigram", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :algorithm=>:concurrently})
ci:    -> 0.0017s
ci: -- add_index(:catalog_resources, :description, {:name=>"index_catalog_resources_on_description_trigram", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :algorithm=>:concurrently})
ci:    -> 0.0009s
ci: == 20231115172623 RemoveNameDescriptionTrigramIndexesFromCatalogResources: reverted (0.1294s) 

ci: == [advisory_lock_connection] object_id: 184460, pg_backend_pid: 67536
main: == [advisory_lock_connection] object_id: 184880, pg_backend_pid: 67975
main: == 20231107193342 AddSearchVectorToCatalogResources: reverting ================
main: -- remove_column(:catalog_resources, :search_vector)
main:    -> 0.0015s
main: == 20231107193342 AddSearchVectorToCatalogResources: reverted (0.0050s) =======

main: == [advisory_lock_connection] object_id: 184880, pg_backend_pid: 67975
ci: == [advisory_lock_connection] object_id: 184180, pg_backend_pid: 68406
ci: == 20231107193342 AddSearchVectorToCatalogResources: reverting ================
ci: -- remove_column(:catalog_resources, :search_vector)
ci:    -> 0.0015s
ci: == 20231107193342 AddSearchVectorToCatalogResources: reverted (0.0098s) =======

ci: == [advisory_lock_connection] object_id: 184180, pg_backend_pid: 68406

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #430889 (closed)

Edited by Leaminn Ma

Merge request reports