Use elasticsearch bulk indexing API for database "index" operations

GitLab has Elasticsearch integration, and we often need to index a whole project rather than a single document within it. This comes up mainly during initial indexing, but also when we try to repair gaps in the index, or when a new project or namespace is added to be indexed (with the "elasticsearch limited namespace/project" feature).

All of these operations can be localised to the "initial_index_project" operation: https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/app/services/elastic/index_record_service.rb#L36

If we could use the Elasticsearch bulk API here, rather than a large number of individual import calls, we could significantly reduce the cost of initial indexing: each bulk request carries many documents, so we pay the per-request overhead once per batch instead of once per record. Since we have some reports of sidekiq jobs on a single project taking many hours to complete, I think this is well worth considering.

We already use batch submission for repository indexing, in both Ruby and in Go.
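
To make the idea concrete, here is a minimal sketch of how a batch of records could be turned into bulk request bodies. It is not the existing service code: `bulk_index_bodies`, the `gitlab-test` index name, the record shape and the batch size are all hypothetical, and the sketch only builds the newline-delimited JSON payload rather than sending it.

```ruby
require 'json'

# Hypothetical helper (not part of GitLab): groups records into batches and
# builds one bulk request body per batch. The bulk format is newline-delimited
# JSON: an action line followed by the document source, with a trailing newline.
def bulk_index_bodies(records, index_name:, batch_size: 1000)
  records.each_slice(batch_size).map do |batch|
    lines = batch.flat_map do |record|
      [
        # Action line: index this document under the given id.
        JSON.generate({ index: { _index: index_name, _id: record.fetch(:id) } }),
        # Source line: the document itself.
        JSON.generate(record)
      ]
    end
    lines.join("\n") + "\n"
  end
end

# Illustrative data only; real documents would come from the database.
records = [
  { id: 1, title: 'First issue' },
  { id: 2, title: 'Second issue' },
  { id: 3, title: 'Third issue' }
]

bodies = bulk_index_bodies(records, index_name: 'gitlab-test', batch_size: 2)
puts bodies.length
puts bodies.first
```

Each body would then be sent as a single POST to the `_bulk` endpoint, so three records with a batch size of 2 cost two requests instead of three. Depending on the Elasticsearch version, the action line may also need a `_type`, and the right batch size would have to be found by measurement.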

cc @smcgivern @vsizov @mdelaossa @DouweM @jramsay