Skip to content

rework indexes on redirect_routes

Gregory Stark requested to merge fix-redirect-routes-schema into master

This replaces three indexes that were not arranged properly with two partial indexes that cover the same use cases.

The three indexes it removes are:

  1. A btree index on permanent which is a boolean column and which would never be selective enough to be useful.
  2. An expression index on lower(path) which was not using text_pattern_ops and wouldn't be useful for the LIKE clauses that are used on this column
  3. An index that did use text_pattern_ops but was on path rather than lower(path) and so was still not useful...

It replaces them with two indexes which are both partial indexes and adds a non-partial one to enforce the same constraint a MySQL:

  1. (lower(path) varchar_pattern_ops) WHERE permanent
  2. (lower(path) varchar_pattern_ops) WHERE NOT permanent OR permanent IS NULL
  3. a UNIQUE index on (lower(path) varchar_pattern_ops to enforce the uniqueness

For any query that does specify a clause on permanent it should handle both equality clauses and LIKE clauses (assuming they start with a long enough constant prefix). It can not handle regular collation-based inequalities (<=, <, >, >= etc) but it looks like we don't use these on path.

The old schema looked like:

Indexes:
    "redirect_routes_pkey" PRIMARY KEY, btree (id)
    "index_redirect_routes_on_path" UNIQUE, btree (path)
    "index_on_redirect_routes_lower_path" btree (lower(path::text))
    "index_redirect_routes_on_path_text_pattern_ops" btree (path varchar_pattern_ops)
    "index_redirect_routes_on_permanent" btree (permanent)
    "index_redirect_routes_on_source_type_and_source_id" btree (source_type, source_id)

The new schema looks like:

Indexes:
    "redirect_routes_pkey" PRIMARY KEY, btree (id)
    "index_redirect_routes_on_path" UNIQUE, btree (path)
    "index_redirect_routes_on_path_unique_text_pattern_ops" UNIQUE, btree (lower(path::text) varchar_pattern_ops)
    "index_redirect_routes_on_path_text_pattern_ops_where_permanent" btree (lower(path::text) varchar_pattern_ops) WHERE permanent
    "index_redirect_routes_on_path_text_pattern_ops_where_temporary" btree (lower(path::text) varchar_pattern_ops) WHERE NOT permanent OR permanent IS NULL
    "index_redirect_routes_on_source_type_and_source_id" btree (source_type, source_id)

The two partial indexes are already created in production under the same names.

The migration:

== 20180113220114 ReworkRedirectRoutesIndexes: migrating ======================
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- index_exists?(:redirect_routes, :permanent)
   -> 0.0093s
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0007s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- remove_index(:redirect_routes, {:algorithm=>:concurrently, :column=>:permanent})
   -> 0.0062s
-- execute("CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS index_redirect_routes_on_path_unique_text_pattern_ops ON redirect_routes (lower(path) varchar_pattern_ops);")
   -> 0.0321s
-- execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS index_redirect_routes_on_path_text_pattern_ops_where_permanent ON redirect_routes (lower(path) varchar_pattern_ops) where (permanent);")
   -> 0.0119s
-- execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS index_redirect_routes_on_path_text_pattern_ops_where_temporary ON redirect_routes (lower(path) varchar_pattern_ops) where (not permanent or permanent is null) ;")
   -> 0.0373s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_text_pattern_ops;")
   -> 0.0020s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_on_redirect_routes_lower_path;")
   -> 0.0023s
== 20180113220114 ReworkRedirectRoutesIndexes: migrated (0.1057s) =============

And the migrations rollback:

== 20180113220114 ReworkRedirectRoutesIndexes: reverting ======================
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0002s
-- add_index(:redirect_routes, :permanent, {:algorithm=>:concurrently})
   -> 1.7828s
-- execute("CREATE INDEX CONCURRENTLY index_redirect_routes_on_path_text_pattern_ops ON redirect_routes (path varchar_pattern_ops);")
   -> 5.8640s
-- execute("CREATE INDEX CONCURRENTLY index_on_redirect_routes_lower_path ON redirect_routes (LOWER(path));")
   -> 0.0431s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_unique_text_pattern_ops;")
   -> 0.0633s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_text_pattern_ops_where_permanent;")
   -> 0.0069s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_text_pattern_ops_where_temporary;")
   -> 0.0057s
== 20180113220114 ReworkRedirectRoutesIndexes: reverted (7.7675s) =============

The plan for the query that was consuming 800-1000ms/s of cpu time in production now looks like:

gitlabhq_production=# explain analyze SELECT  1 AS one FROM redirect_routes WHERE ("permanent" = false or permanent is null) AND (LOWER(path) = LOWER('foo/bar') OR LOWER(path) LIKE LOWER('foo/%')) LIMIT 1;
                                                                                               QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3.86..3.98 rows=1 width=4) (actual time=0.128..0.128 rows=0 loops=1)
   ->  Bitmap Heap Scan on redirect_routes  (cost=3.86..5.38 rows=13 width=4) (actual time=0.128..0.128 rows=0 loops=1)
         Recheck Cond: (((lower((path)::text) = 'foo/bar'::text) AND ((NOT permanent) OR (permanent IS NULL))) OR ((lower((path)::text) ~~ 'foo/%'::text) AND ((NOT permanent) OR (permanent IS NULL))))
         Filter: (((NOT permanent) OR (permanent IS NULL)) AND ((lower((path)::text) = 'foo/bar'::text) OR (lower((path)::text) ~~ 'foo/%'::text)))
         ->  BitmapOr  (cost=3.86..3.86 rows=1 width=0) (actual time=0.127..0.127 rows=0 loops=1)
               ->  Bitmap Index Scan on index_redirect_routes_on_path_text_pattern_ops_where_temporary  (cost=0.00..1.93 rows=1 width=0) (actual time=0.124..0.124 rows=0 loops=1)
                     Index Cond: (lower((path)::text) = 'foo/bar'::text)
               ->  Bitmap Index Scan on index_redirect_routes_on_path_text_pattern_ops_where_temporary  (cost=0.00..1.93 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
                     Index Cond: ((lower((path)::text) ~>=~ 'foo/'::text) AND (lower((path)::text) ~<~ 'foo0'::text))
 Planning time: 1.579 ms
 Execution time: 0.176 ms
(11 rows)

Database Checklist

When adding migrations:

  • Updated db/schema.rb
  • Added a down method so the migration can be reverted
  • Added the output of the migration(s) to the MR body
  • Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)

When adding or modifying queries to improve performance:

  • Included data that shows the performance improvement, preferably in the form of a benchmark
  • Included the output of EXPLAIN (ANALYZE, BUFFERS) of the relevant queries

When adding foreign keys to existing tables:

  • Included a migration to remove orphaned rows in the source table before adding the foreign key
  • Removed any instances of dependent: ... that may no longer be necessary

When adding tables:

  • Ordered columns based on the Ordering Table Columns guidelines
  • Added foreign keys to any columns pointing to data in other tables
  • Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs

When removing columns, tables, indexes or other structures:

  • Removed these in a post-deployment migration
  • Made sure the application no longer uses (or ignores) these structures

General Checklist

Edited by Yorick Peterse

Merge request reports