
Update gitlab:db:reindex to work with multiple databases

Krasimir Angelov requested to merge 340832-reindex-for-multiple-databases into master

What does this MR do and why?

This MR updates the gitlab:db:reindex Rake task to work with multiple databases.

To do this, we:

  • Iterate over all configured databases / connections
  • Convert used models to inherit from SharedModel
  • Inject and use the correct connection where needed
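
As a rough illustration of the three steps above, the following plain-Ruby sketch iterates over a set of connections and injects each one into a shared model for the duration of a block. `FakeConnection`, `SharedModelSketch`, `each_database`, and the `CONNECTIONS` hash are hypothetical stand-ins written for this example; they are not the classes changed in this MR and no Rails code is loaded.

```ruby
# Hypothetical stand-ins for illustration; none of these names come from the MR diff.
FakeConnection = Struct.new(:name)

# Minimal imitation of a shared model whose connection can be swapped per block.
class SharedModelSketch
  class << self
    attr_reader :current_connection

    # Run the block with `connection` injected, then restore the previous one.
    def using_connection(connection)
      previous = @current_connection
      @current_connection = connection
      yield
    ensure
      @current_connection = previous
    end
  end
end

# Visit every configured database and run the same work against each connection.
def each_database(connections)
  connections.map do |name, connection|
    SharedModelSketch.using_connection(connection) do
      yield name, SharedModelSketch.current_connection
    end
  end
end

CONNECTIONS = {
  main: FakeConnection.new("main"),
  ci: FakeConnection.new("ci")
}.freeze

VISITED = each_database(CONNECTIONS) do |name, connection|
  "reindexing on #{connection.name} (configured as #{name})"
end

puts VISITED
```

Restoring the previous connection in `ensure` means a failure while working on one database does not leave the wrong connection injected for later code.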

How to set up and validate locally

Single database

While watching the logs

tail -f log/application_json.log 

run

bundle exec rails gitlab:db:reindex

You should see it run against the main database only, that is, exactly one "message":"Switched database connection","connection_name":"main" log entry.

On a local database, this usually does not select any index for reindexing. To see reindexing in action, make the following change in lib/gitlab/database/reindexing/index_selection.rb:

diff --git a/lib/gitlab/database/reindexing/index_selection.rb b/lib/gitlab/database/reindexing/index_selection.rb
index 2186384e7d7..71e650deb94 100644
--- a/lib/gitlab/database/reindexing/index_selection.rb
+++ b/lib/gitlab/database/reindexing/index_selection.rb
@@ -10,7 +10,7 @@ class IndexSelection
         MINIMUM_RELATIVE_BLOAT = 0.2
 
         # Only consider indexes with a total ondisk size in this range (before reindexing)
-        INDEX_SIZE_RANGE = (1.gigabyte..100.gigabyte).freeze
+        INDEX_SIZE_RANGE = (1.byte..100.gigabyte).freeze
 
         delegate :each, to: :indexes

With this change, logs from the task should look like this:

{"severity":"DEBUG","time":"2021-10-22T01:28:24.456Z","correlation_id":null,"message":"Switched database connection","connection_name":"main"}
{"severity":"INFO","time":"2021-10-22T01:29:04.805Z","correlation_id":null,"message":"Starting reindex of index_resource_milestone_events_on_merge_request_id","index":"public.index_resource_milestone_events_on_merge_request_id","table":"resource_milestone_events","estimated_bloat_bytes":24576,"index_size_before_bytes":40960,"relative_bloat_level":0.6}
{"severity":"INFO","time":"2021-10-22T01:29:04.858Z","correlation_id":null,"message":"Finished reindex of index_resource_milestone_events_on_merge_request_id","index":"public.index_resource_milestone_events_on_merge_request_id","table":"resource_milestone_events","estimated_bloat_bytes":24576,"index_size_before_bytes":40960,"index_size_after_bytes":32768,"relative_bloat_level":0.5,"duration_s":0.02}
{"severity":"INFO","time":"2021-10-22T01:29:04.883Z","correlation_id":null,"message":"Starting reindex of index_merge_requests_on_title","index":"public.index_merge_requests_on_title","table":"merge_requests","estimated_bloat_bytes":24576,"index_size_before_bytes":40960,"relative_bloat_level":0.6}
{"severity":"INFO","time":"2021-10-22T01:29:04.917Z","correlation_id":null,"message":"Finished reindex of index_merge_requests_on_title","index":"public.index_merge_requests_on_title","table":"merge_requests","estimated_bloat_bytes":24576,"index_size_before_bytes":40960,"index_size_after_bytes":16384,"relative_bloat_level":0.0,"duration_s":0.01}
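
To see why the size-range change in the diff above is needed locally, here is a small sketch of the selection criteria. It assumes an index is a candidate only when its on-disk size falls within INDEX_SIZE_RANGE and its relative bloat (estimated bloat divided by size) is at least MINIMUM_RELATIVE_BLOAT; the real query in index_selection.rb may apply further conditions. `GIGABYTE`, `DEFAULT_SIZE_RANGE`, `PATCHED_SIZE_RANGE`, and `candidate?` are names made up for this example, and plain integers replace the ActiveSupport helpers.

```ruby
# Plain-Ruby equivalent of ActiveSupport's 1.gigabyte (1024**3 bytes).
GIGABYTE = 1024**3

MINIMUM_RELATIVE_BLOAT = 0.2
DEFAULT_SIZE_RANGE = (1 * GIGABYTE..100 * GIGABYTE) # range before the diff
PATCHED_SIZE_RANGE = (1..100 * GIGABYTE)            # range after the diff

# Assumed criteria: size within range and relative bloat above the threshold.
def candidate?(size_bytes, bloat_bytes, size_range)
  size_range.cover?(size_bytes) &&
    (bloat_bytes.to_f / size_bytes) >= MINIMUM_RELATIVE_BLOAT
end

# Values taken from the "Starting reindex of index_merge_requests_on_title" entry.
SIZE_BYTES = 40_960
BLOAT_BYTES = 24_576

puts candidate?(SIZE_BYTES, BLOAT_BYTES, DEFAULT_SIZE_RANGE) # false: index is far below 1 GB
puts candidate?(SIZE_BYTES, BLOAT_BYTES, PATCHED_SIZE_RANGE) # true: 24576 / 40960 = 0.6
```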

Multiple databases

Set up your environment for multiple databases as described in https://docs.gitlab.com/ee/development/database/multiple_databases.html#configdatabaseyml.

Load the schema, e.g. gdk psql -d my_ci_database < db/structure.sql.

To create an index with bloat, execute the following on the ci database:

TRUNCATE postgres_reindex_actions;
DROP TABLE IF EXISTS ci_test;
CREATE TABLE ci_test AS SELECT x, md5(random()::text) AS y FROM generate_series(1, 1000000) x;
CREATE INDEX ON ci_test (x);
DELETE FROM ci_test WHERE x % 3 = 0;
ANALYZE ci_test;

Now, if we run bundle exec rails gitlab:db:reindex, we should see something like this in the logs:

{"severity":"DEBUG","time":"2021-10-22T01:36:18.008Z","correlation_id":null,"message":"Switched database connection","connection_name":"main"}
{"severity":"INFO","time":"2021-10-22T01:36:57.454Z","correlation_id":null,"message":"Starting reindex of index_ci_builds_on_token_encrypted","index":"public.index_ci_builds_on_token_encrypted","table":"ci_builds","estimated_bloat_bytes":98304,"index_size_before_bytes":163840,"relative_bloat_level":0.6}
{"severity":"INFO","time":"2021-10-22T01:36:57.517Z","correlation_id":null,"message":"Finished reindex of index_ci_builds_on_token_encrypted","index":"public.index_ci_builds_on_token_encrypted","table":"ci_builds","estimated_bloat_bytes":98304,"index_size_before_bytes":163840,"index_size_after_bytes":81920,"relative_bloat_level":0.2,"duration_s":0.03}
{"severity":"INFO","time":"2021-10-22T01:36:57.543Z","correlation_id":null,"message":"Starting reindex of index_issues_on_author_id_and_id_and_created_at","index":"public.index_issues_on_author_id_and_id_and_created_at","table":"issues","estimated_bloat_bytes":32768,"index_size_before_bytes":57344,"relative_bloat_level":0.5714285714285714}
{"severity":"INFO","time":"2021-10-22T01:36:57.589Z","correlation_id":null,"message":"Finished reindex of index_issues_on_author_id_and_id_and_created_at","index":"public.index_issues_on_author_id_and_id_and_created_at","table":"issues","estimated_bloat_bytes":32768,"index_size_before_bytes":57344,"index_size_after_bytes":32768,"relative_bloat_level":0.25,"duration_s":0.01}
{"severity":"DEBUG","time":"2021-10-22T01:36:57.607Z","correlation_id":null,"message":"Switched database connection","connection_name":"ci"}
{"severity":"INFO","time":"2021-10-22T01:37:25.862Z","correlation_id":null,"message":"Starting reindex of ci_test_x_idx","index":"public.ci_test_x_idx","table":"ci_test","estimated_bloat_bytes":7553024,"index_size_before_bytes":22487040,"relative_bloat_level":0.33588342440801455}
{"severity":"INFO","time":"2021-10-22T01:37:26.266Z","correlation_id":null,"message":"Finished reindex of ci_test_x_idx","index":"public.ci_test_x_idx","table":"ci_test","estimated_bloat_bytes":7553024,"index_size_before_bytes":22487040,"index_size_after_bytes":14999552,"relative_bloat_level":0.004369197160021846,"duration_s":0.38}
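
When validating, it can help to total the reclaimed space per database from entries like the ones above. The sketch below is one possible way to do that, assuming every entry logged after a "Switched database connection" line belongs to that connection until the next switch. `reclaimed_bytes_by_connection` is a hypothetical helper, and the sample lines are the log entries above trimmed to the fields the helper reads.

```ruby
require "json"

# Sum (size before - size after) for "Finished reindex" entries,
# grouped by the connection that was active when they were logged.
def reclaimed_bytes_by_connection(lines)
  current = nil
  totals = Hash.new(0)

  lines.each do |line|
    entry = JSON.parse(line)
    message = entry["message"].to_s

    if message == "Switched database connection"
      current = entry["connection_name"]
    elsif message.start_with?("Finished reindex")
      totals[current] += entry["index_size_before_bytes"] - entry["index_size_after_bytes"]
    end
  end

  totals
end

# Trimmed copies of the log entries shown above.
SAMPLE = [
  '{"message":"Switched database connection","connection_name":"main"}',
  '{"message":"Finished reindex of index_ci_builds_on_token_encrypted","index_size_before_bytes":163840,"index_size_after_bytes":81920}',
  '{"message":"Finished reindex of index_issues_on_author_id_and_id_and_created_at","index_size_before_bytes":57344,"index_size_after_bytes":32768}',
  '{"message":"Switched database connection","connection_name":"ci"}',
  '{"message":"Finished reindex of ci_test_x_idx","index_size_before_bytes":22487040,"index_size_after_bytes":14999552}'
].freeze

TOTALS = reclaimed_bytes_by_connection(SAMPLE)
puts TOTALS
```

To run it against a real log, the same helper could be fed `File.foreach("log/application_json.log")`, assuming every line in that file is valid JSON.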

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #340832 (closed)

Edited by Krasimir Angelov
