ElasticCommitIndexerWorker break indexing into smaller incremental steps

Background

Related: gitlab-com/gl-infra/production#8391 (closed)

ElasticCommitIndexerWorker runtime depends heavily on the repository size and number of commits. However, having the timeout for the ElasticCommitIndexerWorker set to 24 hours is problematic for a few reasons:

Proposal

Some ideas from the team during debugging of the related incident. We need to get a solid technical plan in place before scheduling this work:

  • Break the indexing into smaller incremental steps, so that we can effectively paginate through the work
  • Schedule the steps in the future with Sidekiq, so that we don't clog the entire queue with a single job (this gives fairness in case a single repo has a ton of files)
  • I think we'd need to split it by files then: generate the diff and schedule batches of files for reindexing
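
To make the batching idea concrete, here is a minimal Ruby sketch of splitting a list of changed files into batches and staggering them over time. It is only an illustration under assumptions: `plan_batches`, `schedule_batches`, `BATCH_SIZE`, `BATCH_INTERVAL`, and the `scheduler` callable are hypothetical names, not part of the GitLab codebase, and the list of changed paths is assumed to have already been produced from the diff. In a real worker, the scheduler would presumably wrap Sidekiq's `perform_in` on a per-batch worker.

```ruby
# Hypothetical sketch: none of these names come from the GitLab codebase.
BATCH_SIZE = 3       # assumed for illustration; a real value would be much larger
BATCH_INTERVAL = 60  # assumed delay in seconds between consecutive batches

# Split the changed file paths into fixed-size batches and pair each batch
# with a delay, so that batches are spread out instead of running as one job.
def plan_batches(changed_paths, batch_size: BATCH_SIZE, interval: BATCH_INTERVAL)
  changed_paths.each_slice(batch_size).each_with_index.map do |paths, index|
    { delay: index * interval, paths: paths }
  end
end

# Hand each batch to a scheduler callable. With Sidekiq, the callable could be
# ->(delay, project_id, paths) { SomeBatchWorker.perform_in(delay, project_id, paths) }
# where SomeBatchWorker is a hypothetical per-batch indexing worker.
def schedule_batches(project_id, changed_paths, scheduler:)
  plan_batches(changed_paths).each do |batch|
    scheduler.call(batch[:delay], project_id, batch[:paths])
  end
end

# Usage with a recording scheduler in place of Sidekiq:
scheduled = []
recorder = ->(delay, project_id, paths) { scheduled << [delay, project_id, paths] }
schedule_batches(42, %w[a.rb b.rb c.rb d.rb e.rb], scheduler: recorder)
```

Passing the scheduler in as a callable keeps the batching logic testable without a running Sidekiq instance. Whether fixed delays or chained jobs (each batch enqueueing the next) would give better fairness is an open design question for the plan.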