Skip to content

Add partial GIN indexes for issues table

Heinrich Lee Yu requested to merge 364556-issues-partial-gin-index into master

What does this MR do and why?

This MR adds partial GIN indexes for the issues table so that we can drop the existing GIN indexes in the future. Right now, these indexes are only used in a very specific scenario: Non-english searches at the global scope. English searches are now using the full-text search indexes.

The goal of this is to fix the GIN pending list problem where we have a lot of pending updates to these GIN indexes causing timeouts when inserting / updating rows.

With these partial indexes, the index size is much smaller and they would only need to be updated when an issue that has a non-english character is updated. These types of issues are much rarer.

gitlabhq_dblab=# SELECT pg_size_pretty(pg_total_relation_size('index_issues_on_title_trigram')), pg_size_pretty(pg_total_relation_size('index_issues_on_title_trigram_non_latin'));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 9460 MB        | 595 MB
(1 row)

gitlabhq_dblab=# SELECT pg_size_pretty(pg_total_relation_size('index_issues_on_description_trigram')), pg_size_pretty(pg_total_relation_size('index_issues_on_description_trigram_non_latin'));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 52 GB          | 2885 MB
(1 row)

Old query:

SELECT "issues".*
FROM "issues"
  INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
WHERE "issues"."author_id" NOT IN (SELECT "banned_users"."user_id" FROM "banned_users")
  AND "issues"."project_id" IN (
    SELECT "projects"."id"
    FROM "projects"
      LEFT JOIN project_features ON projects.id = project_features.project_id WHERE (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 2880930 AND (project_authorizations.project_id = projects.id) AND (project_authorizations.access_level >= 10)) OR projects.visibility_level IN (10,20)) AND ("project_features"."issues_access_level" IS NULL OR "project_features"."issues_access_level" IN (20,30) OR ("project_features"."issues_access_level" = 10 AND EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 2880930 AND (project_authorizations.project_id = project_features.project_id) AND (project_authorizations.access_level >= 10)))))
  AND ("issues"."state_id" IN (1))
  AND "projects"."archived" = FALSE
  AND "issues"."issue_type" IN (0, 1)
  AND ("issues"."title" ILIKE '%中文語%' OR "issues"."description" ILIKE '%中文語%')
  AND "projects"."archived" = FALSE
ORDER BY "issues"."created_at" DESC, "issues"."id" DESC LIMIT 20 OFFSET 0;

https://explain.depesz.com/s/CYQO

New query (extra WHERE clause is the only difference):

SELECT "issues".*
FROM "issues"
  INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
WHERE "issues"."author_id" NOT IN (SELECT "banned_users"."user_id" FROM "banned_users")
  AND "issues"."project_id" IN (
    SELECT "projects"."id"
    FROM "projects"
      LEFT JOIN project_features ON projects.id = project_features.project_id WHERE (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 2880930 AND (project_authorizations.project_id = projects.id) AND (project_authorizations.access_level >= 10)) OR projects.visibility_level IN (10,20)) AND ("project_features"."issues_access_level" IS NULL OR "project_features"."issues_access_level" IN (20,30) OR ("project_features"."issues_access_level" = 10 AND EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 2880930 AND (project_authorizations.project_id = project_features.project_id) AND (project_authorizations.access_level >= 10)))))
  AND ("issues"."state_id" IN (1))
  AND "projects"."archived" = FALSE
  AND "issues"."issue_type" IN (0, 1)
  AND ("issues"."title" ILIKE '%中文語%' OR "issues"."description" ILIKE '%中文語%')
  AND ("issues"."title" NOT SIMILAR TO '[\u0000-\u218F]*' OR "issues"."description" NOT SIMILAR TO '[\u0000-\u218F]*')
  AND "projects"."archived" = FALSE
ORDER BY "issues"."created_at" DESC, "issues"."id" DESC LIMIT 20 OFFSET 0;

https://explain.depesz.com/s/ZCyL

New query is slightly faster due to the smaller index.

Migration output
main: == 20220714144713 AddPartialTrigramIndexForIssueTitle: migrating ==============
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:issues, :title, {:name=>"index_issues_on_title_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
main:    -> 0.0112s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- add_index(:issues, :title, {:name=>"index_issues_on_title_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
main:    -> 0.0101s
main: -- execute("RESET statement_timeout")
main:    -> 0.0002s
main: == 20220714144713 AddPartialTrigramIndexForIssueTitle: migrated (0.0290s) =====

ci: == 20220714144713 AddPartialTrigramIndexForIssueTitle: migrating ==============
ci: -- transaction_open?()
ci:    -> 0.0000s
ci: -- index_exists?(:issues, :title, {:name=>"index_issues_on_title_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
ci:    -> 0.0105s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0001s
ci: -- add_index(:issues, :title, {:name=>"index_issues_on_title_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
ci:    -> 0.0098s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0002s
ci: == 20220714144713 AddPartialTrigramIndexForIssueTitle: migrated (0.0242s) =====

main: == 20220714150028 AddPartialTrigramIndexForIssueDescription: migrating ========
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:issues, :description, {:name=>"index_issues_on_description_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
main:    -> 0.0113s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- add_index(:issues, :description, {:name=>"index_issues_on_description_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
main:    -> 0.0037s
main: -- execute("RESET statement_timeout")
main:    -> 0.0002s
main: == 20220714150028 AddPartialTrigramIndexForIssueDescription: migrated (0.0204s)

ci: == 20220714150028 AddPartialTrigramIndexForIssueDescription: migrating ========
ci: -- transaction_open?()
ci:    -> 0.0000s
ci: -- index_exists?(:issues, :description, {:name=>"index_issues_on_description_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
ci:    -> 0.0091s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0001s
ci: -- add_index(:issues, :description, {:name=>"index_issues_on_description_trigram_non_latin", :using=>:gin, :opclass=>{:description=>:gin_trgm_ops}, :where=>"title NOT SIMILAR TO '[\\u0000-\\u218F]*' OR description NOT SIMILAR TO '[\\u0000-\\u218F]*'", :algorithm=>:concurrently})
ci:    -> 0.0030s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0002s
ci: == 20220714150028 AddPartialTrigramIndexForIssueDescription: migrated (0.0139s)


main: == 20220714150028 AddPartialTrigramIndexForIssueDescription: reverting ========
main: -- transaction_open?()
main:    -> 0.0000s
main: -- indexes(:issues)
main:    -> 0.0126s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- remove_index(:issues, {:algorithm=>:concurrently, :name=>"index_issues_on_description_trigram_non_latin"})
main:    -> 0.0016s
main: -- execute("RESET statement_timeout")
main:    -> 0.0002s
main: == 20220714150028 AddPartialTrigramIndexForIssueDescription: reverted (0.0196s)

The index creation here would be a no-op on GitLab.com because they have already been created using the weekend async creation process. I have confirmed on dblab that the indexes exist:

Index "public.index_issues_on_title_trigram_non_latin"
 Column |  Type   | Key? | Definition 
--------+---------+------+------------
 title  | integer | yes  | title
gin, for table "public.issues", predicate (title::text !~ similar_escape('[\u0000-\u218F]*'::text, NULL::text) OR description !~ similar_escape('[\u0000-\u218F]*'::text, NULL::text))

Index "public.index_issues_on_description_trigram_non_latin"  
   Column    |  Type   | Key? | Definition    
-------------+---------+------+-------------  
 description | integer | yes  | description  
gin, for table "public.issues", predicate (title::text !~ similar_escape('[\u0000-\u218F]*'::text, NULL::text) OR description !~ similar_escape('[\u0000-\u218F]*'::text, NULL::text))  

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #364556 (closed)

Edited by Heinrich Lee Yu

Merge request reports