Skip to content

Reindexing: Lock-based retries for DROP INDEX

Andreas Brandl requested to merge ab/reindexing-drop-retry into master

What does this MR do?

In context of reindexing, we need to drop indexes, too (the old index or when we clean up due to an error). That is, we perform DROP INDEX CONCURRENTLY.

In #329724 (closed), we've seen a situation where DROP INDEX CONCURRENTLY was blocked by auto-vacuum. That in itself is not harmful, but it leads to a long running transaction and an alert for the EOC for GitLab.com.

This change aims to avoid this by wrapping the drop index statement into "lock retries". That is, we tightly control the lock_timeout setting and retry upon timeout.

There is a caveat here: Our implementation only worked inside a transaction scope. That is for good reasons: We use SET lock_timeout and friends. However, DROP INDEX CONCURRENTLY cannot be wrapped into a transaction.

We generally have to assume that this goes through pgbouncer. Without a transaction scope, pgbouncer has the freedom to multiplex statements across different connections - hence SET might end up on a different connection than subsequent queries, which breaks the lock retry methodology.

For reindexing, we connect directly to PG - there is no pgbouncer. Hence, this change extends the "retry mechanic" with a method that does not use transactions, so we can wrap DROP INDEX CONCURRENTLY nicely.

Full query log for a reindexing operation
abrandl:gitlab/ (ab/reindexing-drop-retry) $ LOG_QUERIES_TO_CONSOLE=1 bundle exec rake "gitlab:db:reindex[public.code_owner_approval_required]"                                                                                                                                  [10:57:10]
WARNING: This version of GitLab depends on gitlab-shell 13.18.0, but you're running 13.17.0. Please update gitlab-shell.
D, [2021-05-12T10:57:21.619923 #858207] DEBUG -- :   Gitlab::Database::PostgresIndex Load (7.1ms)  SELECT "postgres_indexes".* FROM "postgres_indexes" WHERE "postgres_indexes"."unique" = FALSE AND "postgres_indexes"."partitioned" = FALSE AND "postgres_indexes"."exclusion" = FALSE AND (NOT expression) AND (name !~ '^tmp_reindex_') AND (name !~ '^old_reindex_') AND "postgres_indexes"."identifier" = 'public.code_owner_approval_required' AND (NOT EXISTS (SELECT "postgres_reindex_actions".* FROM "postgres_reindex_actions" WHERE "postgres_reindex_actions"."state" = 1 AND (action_end > '2021-05-05 08:57:21.612268') AND (index_identifier = identifier))) /*application:web,line:/lib/gitlab/database/reindexing/index_selection.rb:27:in `sort_by'*/
D, [2021-05-12T10:57:21.620202 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/index_selection.rb:27:in `sort_by'
D, [2021-05-12T10:57:21.647175 #858207] DEBUG -- :   Gitlab::Database::PostgresIndexBloatEstimate Load (15.2ms)  SELECT "postgres_index_bloat_estimates".* FROM "postgres_index_bloat_estimates" WHERE "postgres_index_bloat_estimates"."identifier" = 'public.code_owner_approval_required' LIMIT 1 /*application:web,line:/lib/gitlab/database/postgres_index.rb:34:in `block in bloat_size'*/
D, [2021-05-12T10:57:21.647473 #858207] DEBUG -- :   ↳ lib/gitlab/database/postgres_index.rb:34:in `block in bloat_size'
D, [2021-05-12T10:57:21.656947 #858207] DEBUG -- :    (0.1ms)  BEGIN /*application:web,line:/lib/gitlab/database.rb:381:in `block in transaction'*/
D, [2021-05-12T10:57:21.657451 #858207] DEBUG -- :   ↳ lib/gitlab/database.rb:381:in `block in transaction'
D, [2021-05-12T10:57:21.657792 #858207] DEBUG -- :   Gitlab::Database::Reindexing::ReindexAction Create (0.3ms)  INSERT INTO "postgres_reindex_actions" ("action_start", "ondisk_size_bytes_start", "index_identifier", "bloat_estimate_bytes_start") VALUES ('2021-05-12 08:57:21.650406', 8192, 'public.code_owner_approval_required', 0) RETURNING "id" /*application:web,line:/lib/gitlab/database.rb:381:in `block in transaction'*/
D, [2021-05-12T10:57:21.658183 #858207] DEBUG -- :   ↳ lib/gitlab/database.rb:381:in `block in transaction'
D, [2021-05-12T10:57:21.669921 #858207] DEBUG -- :    (11.4ms)  COMMIT /*application:web,line:/lib/gitlab/database.rb:381:in `block in transaction'*/
D, [2021-05-12T10:57:21.670315 #858207] DEBUG -- :   ↳ lib/gitlab/database.rb:381:in `block in transaction'
D, [2021-05-12T10:57:21.672550 #858207] DEBUG -- :   Gitlab::Database::PostgresIndex Load (1.1ms)  SELECT "postgres_indexes".* FROM "postgres_indexes" WHERE "postgres_indexes"."schema" = 'public' AND "postgres_indexes"."name" = 'tmp_reindex_1711426' LIMIT 1 /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:46:in `with_rebuilt_index'*/
D, [2021-05-12T10:57:21.672812 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:46:in `with_rebuilt_index'
D, [2021-05-12T10:57:21.673653 #858207] DEBUG -- :    (0.1ms)  SET statement_timeout TO '32400s' /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:147:in `execute'*/
D, [2021-05-12T10:57:21.673838 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:147:in `execute'
D, [2021-05-12T10:57:21.679688 #858207] DEBUG -- :    (5.7ms)  CREATE INDEX CONCURRENTLY tmp_reindex_1711426 ON public.protected_branches USING btree (project_id, code_owner_approval_required) WHERE (code_owner_approval_required = true) /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:58:in `block in with_rebuilt_index'*/
D, [2021-05-12T10:57:21.679880 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:58:in `block in with_rebuilt_index'
D, [2021-05-12T10:57:21.680162 #858207] DEBUG -- :    (0.1ms)  RESET statement_timeout /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:147:in `execute'*/
D, [2021-05-12T10:57:21.680344 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:147:in `execute'
D, [2021-05-12T10:57:21.682280 #858207] DEBUG -- :   Gitlab::Database::PostgresIndex Load (1.5ms)  SELECT "postgres_indexes".* FROM "postgres_indexes" WHERE "postgres_indexes"."schema" = 'public' AND "postgres_indexes"."name" = 'tmp_reindex_1711426' LIMIT 1 /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:61:in `with_rebuilt_index'*/
D, [2021-05-12T10:57:21.682746 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:61:in `with_rebuilt_index'
D, [2021-05-12T10:57:21.684546 #858207] DEBUG -- :    (0.2ms)  BEGIN /*application:web,line:/lib/gitlab/database/with_lock_retries.rb:168:in `execute'*/
D, [2021-05-12T10:57:21.684838 #858207] DEBUG -- :   ↳ lib/gitlab/database/with_lock_retries.rb:168:in `execute'
D, [2021-05-12T10:57:21.685057 #858207] DEBUG -- :    (0.1ms)  SET LOCAL lock_timeout TO '100ms' /*application:web,line:/lib/gitlab/database/with_lock_retries.rb:168:in `execute'*/
D, [2021-05-12T10:57:21.685247 #858207] DEBUG -- :   ↳ lib/gitlab/database/with_lock_retries.rb:168:in `execute'
D, [2021-05-12T10:57:21.685943 #858207] DEBUG -- :    (0.2ms)  ALTER INDEX "public"."code_owner_approval_required"
RENAME TO "old_reindex_1711426"
 /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:96:in `rename_index'*/
D, [2021-05-12T10:57:21.686159 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:96:in `rename_index'
D, [2021-05-12T10:57:21.686492 #858207] DEBUG -- :    (0.1ms)  ALTER INDEX "public"."tmp_reindex_1711426"
RENAME TO "code_owner_approval_required"
 /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:96:in `rename_index'*/
D, [2021-05-12T10:57:21.686682 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:96:in `rename_index'
D, [2021-05-12T10:57:21.687049 #858207] DEBUG -- :    (0.2ms)  ALTER INDEX "public"."old_reindex_1711426"
RENAME TO "tmp_reindex_1711426"
 /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:96:in `rename_index'*/
D, [2021-05-12T10:57:21.687238 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:96:in `rename_index'
D, [2021-05-12T10:57:21.690026 #858207] DEBUG -- :    (2.4ms)  COMMIT /*application:web,line:/lib/gitlab/database.rb:381:in `block in transaction'*/
D, [2021-05-12T10:57:21.690325 #858207] DEBUG -- :   ↳ lib/gitlab/database.rb:381:in `block in transaction'
D, [2021-05-12T10:57:21.690582 #858207] DEBUG -- :    (0.1ms)  RESET idle_in_transaction_session_timeout; RESET lock_timeout /*application:web,line:/lib/gitlab/database/with_lock_retries.rb:168:in `execute'*/
D, [2021-05-12T10:57:21.690754 #858207] DEBUG -- :   ↳ lib/gitlab/database/with_lock_retries.rb:168:in `execute'
D, [2021-05-12T10:57:21.691824 #858207] DEBUG -- :    (0.1ms)  SET lock_timeout TO '60000ms' /*application:web,line:/lib/gitlab/database/with_lock_retries.rb:168:in `execute'*/
D, [2021-05-12T10:57:21.692000 #858207] DEBUG -- :   ↳ lib/gitlab/database/with_lock_retries.rb:168:in `execute'
D, [2021-05-12T10:57:21.695122 #858207] DEBUG -- :    (2.7ms)  DROP INDEX CONCURRENTLY
IF EXISTS "public"."tmp_reindex_1711426"
 /*application:web,line:/lib/gitlab/database/reindexing/concurrent_reindex.rb:112:in `block in remove_index'*/
D, [2021-05-12T10:57:21.695299 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/concurrent_reindex.rb:112:in `block in remove_index'
D, [2021-05-12T10:57:21.695821 #858207] DEBUG -- :    (0.1ms)  RESET idle_in_transaction_session_timeout; RESET lock_timeout /*application:web,line:/lib/gitlab/database/with_lock_retries.rb:168:in `execute'*/
D, [2021-05-12T10:57:21.695998 #858207] DEBUG -- :   ↳ lib/gitlab/database/with_lock_retries.rb:168:in `execute'
D, [2021-05-12T10:57:21.698951 #858207] DEBUG -- :   Gitlab::Database::PostgresIndex Load (1.4ms)  SELECT "postgres_indexes".* FROM "postgres_indexes" WHERE "postgres_indexes"."identifier" = 'public.code_owner_approval_required' LIMIT 1 /*application:web,line:/lib/gitlab/database/reindexing/reindex_action.rb:27:in `finish'*/
D, [2021-05-12T10:57:21.699219 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/reindex_action.rb:27:in `finish'
D, [2021-05-12T10:57:21.701045 #858207] DEBUG -- :   Gitlab::Database::PostgresIndex Load (1.3ms)  SELECT "postgres_indexes".* FROM "postgres_indexes" WHERE "postgres_indexes"."identifier" = 'public.code_owner_approval_required' LIMIT 1 /*application:web,line:/lib/gitlab/database/reindexing/reindex_action.rb:27:in `finish'*/
D, [2021-05-12T10:57:21.701402 #858207] DEBUG -- :   ↳ lib/gitlab/database/reindexing/reindex_action.rb:27:in `finish'
D, [2021-05-12T10:57:21.702444 #858207] DEBUG -- :    (0.1ms)  BEGIN /*application:web,line:/lib/gitlab/database.rb:381:in `block in transaction'*/
D, [2021-05-12T10:57:21.702924 #858207] DEBUG -- :   ↳ lib/gitlab/database.rb:381:in `block in transaction'
D, [2021-05-12T10:57:21.703261 #858207] DEBUG -- :   Gitlab::Database::Reindexing::ReindexAction Update (0.3ms)  UPDATE "postgres_reindex_actions" SET "action_end" = '2021-05-12 08:57:21.701599', "ondisk_size_bytes_end" = 8192, "state" = 1 WHERE "postgres_reindex_actions"."id" = 6 /*application:web,line:/lib/gitlab/database.rb:381:in `block in transaction'*/
D, [2021-05-12T10:57:21.703651 #858207] DEBUG -- :   ↳ lib/gitlab/database.rb:381:in `block in transaction'
D, [2021-05-12T10:57:21.706113 #858207] DEBUG -- :    (2.2ms)  COMMIT /*application:web,line:/lib/gitlab/database.rb:381:in `block in transaction'*/
D, [2021-05-12T10:57:21.706410 #858207] DEBUG -- :   ↳ lib/gitlab/database.rb:381:in `block in transaction'

Edited by Andreas Brandl

Merge request reports