
Introduce one additional thread into bin/elastic_repo_indexer

Repository indexing consists of two distinct jobs: indexing blobs and indexing commits. This commit introduces one additional thread so that the two jobs can run in parallel rather than in sequence.
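A minimal sketch of the "one extra thread" pattern, written in Python purely for illustration. It is not the script's actual code. `run_in_parallel`, `index_blobs` and `index_commits` are hypothetical names. One job runs on a new thread, the other stays on the calling thread, and the caller always joins the worker and re-raises any error it hit:

```python
import threading
from typing import Callable, List


def run_in_parallel(index_blobs: Callable[[], None],
                    index_commits: Callable[[], None]) -> None:
    """Run index_blobs on one extra thread while index_commits runs on the caller's thread."""
    errors: List[BaseException] = []

    def blob_worker() -> None:
        try:
            index_blobs()
        except BaseException as exc:  # keep the error so the caller can re-raise it
            errors.append(exc)

    blob_thread = threading.Thread(target=blob_worker, name="blob-indexer")
    blob_thread.start()
    try:
        index_commits()  # the second job proceeds on the existing thread
    finally:
        blob_thread.join()  # always wait for blob indexing, even if commit indexing failed
    if errors:
        raise errors[0]  # surface a blob-indexing failure to the caller


if __name__ == "__main__":
    # Output order of the two lines is not deterministic.
    run_in_parallel(lambda: print("indexing blobs"), lambda: print("indexing commits"))
```

Keeping the second job on the calling thread means only one new thread is created. Joining inside `finally` ensures the process never exits while blob indexing is still in flight.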

bin/elastic_repo_indexer uses a large amount of RAM and takes a long time to index a repository. Much of that time is spent communicating with the Elasticsearch server rather than reading and processing the Git repository.

Introducing more parallelism than this is quite difficult, but this much at least lets one job make progress while the other is talking to Elasticsearch. That reduces both total runtime and the RAM MiB-seconds consumed per indexing job.
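To illustrate why overlapping network waits helps, here is an idealized timing sketch. `fake_job` is a hypothetical stand-in that only sleeps, modelling time spent waiting on Elasticsearch. With one extra thread, wall time tends toward the longer of the two waits instead of their sum. Real jobs also do CPU-bound work, which a global interpreter lock may prevent from overlapping, so real-world savings will be smaller than in this case:

```python
import threading
import time


def fake_job(wait_s: float) -> None:
    """Hypothetical stand-in for a job that mostly waits on a remote server."""
    time.sleep(wait_s)  # like blocking socket I/O, sleeping lets other threads run


def sequential(first_s: float, second_s: float) -> float:
    """Run both jobs one after the other and return elapsed wall time in seconds."""
    start = time.perf_counter()
    fake_job(first_s)
    fake_job(second_s)
    return time.perf_counter() - start


def with_one_extra_thread(first_s: float, second_s: float) -> float:
    """Run the first job on an extra thread and the second on the caller's thread."""
    start = time.perf_counter()
    worker = threading.Thread(target=fake_job, args=(first_s,))
    worker.start()       # first job waits on the extra thread...
    fake_job(second_s)   # ...while the second waits on the calling thread
    worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential:       {sequential(0.2, 0.2):.2f} s")
    print(f"one extra thread: {with_one_extra_thread(0.2, 0.2):.2f} s")
```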

In my local testing, this saves 15 of 90 seconds (about 16.67%).
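Taking 90 s as the original runtime, the arithmetic works out as follows. The MiB-seconds line additionally assumes that the average memory use $\bar{M}$ is roughly unchanged by the extra thread. This MR does not measure that, so treat the second line as an estimate:

```latex
\frac{15\,\text{s}}{90\,\text{s}} = 0.1\overline{6} \approx 16.67\%,
\qquad 90\,\text{s} - 15\,\text{s} = 75\,\text{s}

\text{RAM MiB-seconds} \approx \bar{M}\cdot t
\;\Rightarrow\;
\frac{\bar{M}\cdot 75\,\text{s}}{\bar{M}\cdot 90\,\text{s}} = \frac{5}{6}
```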

Related to #1606
