Change repository indexing to sorted sets algorithm

What does this MR do?

This MR changes the indexing strategy for repositories and wikis.

Prior to this change, each project was indexed separately using gitlab-elasticsearch-indexer. With this change, projects are processed in batches, which lets us make much fuller use of the Elasticsearch Bulk API.
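As a rough illustration of why batching helps, the Bulk API accepts many documents in one request as newline-delimited JSON: an action line followed by a source line for each document. The sketch below only builds such a request body. The index name, document ids, and fields are hypothetical, and this is not the code added by this MR.

```python
import json


def build_bulk_body(index_name, documents):
    """Serialize documents into newline-delimited JSON for a bulk request."""
    if not documents:
        return ""
    lines = []
    for doc in documents:
        # Action line: which index and document id to write to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        # Source line: the document body itself.
        lines.append(json.dumps(doc["source"]))
    # A bulk request body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical batch of two projects, sent in one request instead of two.
batch = [
    {"id": "project_1", "source": {"project_id": 1, "type": "repository"}},
    {"id": "project_2", "source": {"project_id": 2, "type": "repository"}},
]
body = build_bulk_body("hypothetical-index", batch)
```

One request body now carries both documents, so the per-request overhead is paid once per batch rather than once per project.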

To achieve this, we split the indexing operations for each project into separate queues, each drained by a single Cron worker.
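The queueing idea can be sketched with an in-memory stand-in for a Redis sorted set. Members are unique, so a project queued twice is only processed once, and the score preserves arrival order. The class and method names are hypothetical, and keeping the earliest score (similar to `ZADD` with the `NX` option) is an assumption of this sketch, not necessarily what the MR does.

```python
class SortedSetQueue:
    """Minimal in-memory model of a sorted-set-backed queue."""

    def __init__(self):
        self._scores = {}  # member -> score
        self._counter = 0  # monotonically increasing score source

    def enqueue(self, member):
        # Duplicate members keep their original score, so the queue
        # deduplicates repeated updates to the same project.
        self._counter += 1
        self._scores.setdefault(member, self._counter)

    def drain(self, limit):
        # Take up to `limit` members in score order and remove them,
        # as a periodic worker would do before sending one bulk request.
        batch = sorted(self._scores, key=self._scores.get)[:limit]
        for member in batch:
            del self._scores[member]
        return batch

    def __len__(self):
        return len(self._scores)


queue = SortedSetQueue()
for project_id in [42, 7, 42, 13]:
    queue.enqueue(f"Project {project_id}")

first_batch = queue.drain(limit=2)
```

Here `first_batch` holds the two oldest distinct entries, and the duplicate entry for project 42 has been collapsed into one.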

ElasticIndexBulkCronWorker is responsible for:

elastic:bulk:initial:0:zset
elastic:incremental:updates:0:zset (to be renamed)

ElasticIndexBulkBlobCronWorker is responsible for:

elastic:bulk:repository:initial:0:zset
elastic:bulk:repository:updates:0:zset
elastic:bulk:wiki:initial:0:zset
elastic:bulk:wiki:updates:0:zset

Does this MR meet the acceptance criteria?

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #205178
