
Add rake task for reindexing

Dylan Griffith requested to merge rake-task-for-reindexing into master

What does this MR do?

In order to roll out some index mapping changes, we need to reindex all the data in the cluster. There are several ways to do this, but our current plan (being documented in gitlab-com/runbooks!2017) uses the reindex from remote API. This should be more efficient than having GitLab itself load every record from the database, build the document payloads, and index everything again.
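For reference, a reindex from remote call is a `POST /_reindex` request sent to the destination cluster, whose `source.remote` block points at the old cluster. The Ruby sketch below only builds that request; the helper names, hosts, index names and credentials are hypothetical placeholders, not the Rake task's actual code. Passing `wait_for_completion=false` makes Elasticsearch return a task ID immediately instead of blocking until the copy finishes.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a reindex-from-remote request
# addressed to the destination cluster. All names and hosts are placeholders.
def build_reindex_from_remote_request(dest_url:, source_host:, source_index:, dest_index:,
                                      username: nil, password: nil, batch_size: 1000)
  remote = { "host" => source_host }
  # Only include credentials when the source cluster requires them.
  if username && password
    remote["username"] = username
    remote["password"] = password
  end

  body = {
    "source" => {
      "remote" => remote,
      "index" => source_index,
      "size" => batch_size # documents fetched from the source per batch
    },
    "dest" => { "index" => dest_index }
  }

  # wait_for_completion=false: respond with {"task": "<node>:<id>"} right away.
  uri = URI.join(dest_url, "/_reindex?wait_for_completion=false")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(body)
  request
end

# Hypothetical helper: sends a prepared request. Needs a reachable cluster.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end
```

Against a live cluster, `JSON.parse(send_request(request).body)["task"]` would hold the task ID used to track progress.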

This MR introduces a Rake task that automates some of the steps in gitlab-com/runbooks!2017, in particular:

  1. Create the index in the new cluster with the correct mappings and other index configuration taken from GitLab's settings
  2. Set refresh_interval to -1 on the destination cluster to tune for indexing speed: the cluster does not refresh the index at all while the data is being loaded
  3. Set number_of_replicas to 0 on the destination cluster to tune for indexing speed: no replica copies of the index need to be written while writes are coming in
  4. Trigger the reindex from remote API on the destination cluster so that it starts copying the data. The Rake task then returns an Elasticsearch task ID that we can use to track progress.
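Steps 2 and 3 map to a single `PUT /<index>/_settings` call, and the task ID from step 4 can be checked with the task management API (`GET /_tasks/<task_id>`). The Ruby sketch below builds both requests without sending them; the helper names, host and index name are hypothetical, and step 1 is omitted because it depends on GitLab's own mapping definitions.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper for steps 2 and 3: disable refresh and replicas on the
# destination index so that bulk loading is as fast as possible.
def build_indexing_speed_settings_request(dest_url:, index_name:)
  settings = {
    "index" => {
      "refresh_interval" => "-1", # -1 disables periodic refresh entirely
      "number_of_replicas" => 0   # no replica copies to write during the load
    }
  }
  uri = URI.join(dest_url, "/#{index_name}/_settings")
  request = Net::HTTP::Put.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(settings)
  request
end

# Hypothetical helper for tracking step 4: fetch the status of the reindex task.
def build_task_status_request(dest_url:, task_id:)
  Net::HTTP::Get.new(URI.join(dest_url, "/_tasks/#{task_id}"))
end
```

The task status response includes a `completed` flag plus counters such as `task.status.total` and `task.status.created`, which can be compared to estimate progress. Once the copy finishes, the same settings endpoint would be used to restore a normal refresh interval and replica count.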

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team