rework indexes on redirect_routes
This replaces three indexes that were not arranged properly with two partial indexes that cover the same use cases.
The three indexes it removes are:
- A btree index on
permanent
which is a boolean column and which would never be selective enough to be useful. - An expression index on
lower(path)
which was not usingtext_pattern_ops
and wouldn't be useful for the LIKE clauses that are used on this column - An index that did use
text_pattern_ops
but was onpath
rather thanlower(path)
and so was still not useful...
It replaces them with two indexes which are both partial indexes and adds a non-partial one to enforce the same constraint a MySQL:
(lower(path) varchar_pattern_ops) WHERE permanent
(lower(path) varchar_pattern_ops) WHERE NOT permanent OR permanent IS NULL
- a UNIQUE index on
(lower(path) varchar_pattern_ops
to enforce the uniqueness
For any query that does specify a clause on permanent
it should handle both equality clauses and LIKE clauses (assuming they start with a long enough constant prefix). It can not handle regular collation-based inequalities (<=, <, >, >= etc) but it looks like we don't use these on path.
The old schema looked like:
Indexes:
"redirect_routes_pkey" PRIMARY KEY, btree (id)
"index_redirect_routes_on_path" UNIQUE, btree (path)
"index_on_redirect_routes_lower_path" btree (lower(path::text))
"index_redirect_routes_on_path_text_pattern_ops" btree (path varchar_pattern_ops)
"index_redirect_routes_on_permanent" btree (permanent)
"index_redirect_routes_on_source_type_and_source_id" btree (source_type, source_id)
The new schema looks like:
Indexes:
"redirect_routes_pkey" PRIMARY KEY, btree (id)
"index_redirect_routes_on_path" UNIQUE, btree (path)
"index_redirect_routes_on_path_unique_text_pattern_ops" UNIQUE, btree (lower(path::text) varchar_pattern_ops)
"index_redirect_routes_on_path_text_pattern_ops_where_permanent" btree (lower(path::text) varchar_pattern_ops) WHERE permanent
"index_redirect_routes_on_path_text_pattern_ops_where_temporary" btree (lower(path::text) varchar_pattern_ops) WHERE NOT permanent OR permanent IS NULL
"index_redirect_routes_on_source_type_and_source_id" btree (source_type, source_id)
The two partial indexes are already created in production under the same names.
The migration:
== 20180113220114 ReworkRedirectRoutesIndexes: migrating ======================
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- index_exists?(:redirect_routes, :permanent)
-> 0.0093s
-- transaction_open?()
-> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
-> 0.0007s
-- execute("SET statement_timeout TO 0")
-> 0.0003s
-- remove_index(:redirect_routes, {:algorithm=>:concurrently, :column=>:permanent})
-> 0.0062s
-- execute("CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS index_redirect_routes_on_path_unique_text_pattern_ops ON redirect_routes (lower(path) varchar_pattern_ops);")
-> 0.0321s
-- execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS index_redirect_routes_on_path_text_pattern_ops_where_permanent ON redirect_routes (lower(path) varchar_pattern_ops) where (permanent);")
-> 0.0119s
-- execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS index_redirect_routes_on_path_text_pattern_ops_where_temporary ON redirect_routes (lower(path) varchar_pattern_ops) where (not permanent or permanent is null) ;")
-> 0.0373s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_text_pattern_ops;")
-> 0.0020s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_on_redirect_routes_lower_path;")
-> 0.0023s
== 20180113220114 ReworkRedirectRoutesIndexes: migrated (0.1057s) =============
And the migrations rollback:
== 20180113220114 ReworkRedirectRoutesIndexes: reverting ======================
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- transaction_open?()
-> 0.0000s
-- execute("SET statement_timeout TO 0")
-> 0.0002s
-- add_index(:redirect_routes, :permanent, {:algorithm=>:concurrently})
-> 1.7828s
-- execute("CREATE INDEX CONCURRENTLY index_redirect_routes_on_path_text_pattern_ops ON redirect_routes (path varchar_pattern_ops);")
-> 5.8640s
-- execute("CREATE INDEX CONCURRENTLY index_on_redirect_routes_lower_path ON redirect_routes (LOWER(path));")
-> 0.0431s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_unique_text_pattern_ops;")
-> 0.0633s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_text_pattern_ops_where_permanent;")
-> 0.0069s
-- execute("DROP INDEX CONCURRENTLY IF EXISTS index_redirect_routes_on_path_text_pattern_ops_where_temporary;")
-> 0.0057s
== 20180113220114 ReworkRedirectRoutesIndexes: reverted (7.7675s) =============
The plan for the query that was consuming 800-1000ms/s of cpu time in production now looks like:
gitlabhq_production=# explain analyze SELECT 1 AS one FROM redirect_routes WHERE ("permanent" = false or permanent is null) AND (LOWER(path) = LOWER('foo/bar') OR LOWER(path) LIKE LOWER('foo/%')) LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.86..3.98 rows=1 width=4) (actual time=0.128..0.128 rows=0 loops=1)
-> Bitmap Heap Scan on redirect_routes (cost=3.86..5.38 rows=13 width=4) (actual time=0.128..0.128 rows=0 loops=1)
Recheck Cond: (((lower((path)::text) = 'foo/bar'::text) AND ((NOT permanent) OR (permanent IS NULL))) OR ((lower((path)::text) ~~ 'foo/%'::text) AND ((NOT permanent) OR (permanent IS NULL))))
Filter: (((NOT permanent) OR (permanent IS NULL)) AND ((lower((path)::text) = 'foo/bar'::text) OR (lower((path)::text) ~~ 'foo/%'::text)))
-> BitmapOr (cost=3.86..3.86 rows=1 width=0) (actual time=0.127..0.127 rows=0 loops=1)
-> Bitmap Index Scan on index_redirect_routes_on_path_text_pattern_ops_where_temporary (cost=0.00..1.93 rows=1 width=0) (actual time=0.124..0.124 rows=0 loops=1)
Index Cond: (lower((path)::text) = 'foo/bar'::text)
-> Bitmap Index Scan on index_redirect_routes_on_path_text_pattern_ops_where_temporary (cost=0.00..1.93 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
Index Cond: ((lower((path)::text) ~>=~ 'foo/'::text) AND (lower((path)::text) ~<~ 'foo0'::text))
Planning time: 1.579 ms
Execution time: 0.176 ms
(11 rows)
Database Checklist
When adding migrations:
-
Updated db/schema.rb
-
Added a down
method so the migration can be reverted -
Added the output of the migration(s) to the MR body -
Added tests for the migration in spec/migrations
if necessary (e.g. when migrating data)
When adding or modifying queries to improve performance:
-
Included data that shows the performance improvement, preferably in the form of a benchmark -
Included the output of EXPLAIN (ANALYZE, BUFFERS)
of the relevant queries
When adding foreign keys to existing tables:
-
Included a migration to remove orphaned rows in the source table before adding the foreign key -
Removed any instances of dependent: ...
that may no longer be necessary
When adding tables:
-
Ordered columns based on the Ordering Table Columns guidelines -
Added foreign keys to any columns pointing to data in other tables -
Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs
When removing columns, tables, indexes or other structures:
-
Removed these in a post-deployment migration -
Made sure the application no longer uses (or ignores) these structures
General Checklist
-
Changelog entry added, if necessary -
Documentation created/updated -
API support added -
Tests added for this feature/bug - Review
-
Has been reviewed by Backend -
Has been reviewed by Database
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Squashed related commits together