Make maximum Elasticsearch batch API request sizes configurable
Problem to solve
Elasticsearch clusters have a "batch API" (the bulk `_bulk` endpoint), which is used extensively when submitting repository data. Clusters are configured with a maximum request size, which varies by cluster. In particular, on AWS, smaller Elasticsearch instances are configured with a maximum batch size of 10MiB, while larger instances are configured with a maximum batch size of 100MiB.
Intended users
Instance administrators
Further details
Larger batch sizes lead to more efficient repository indexing. Currently, we hardcode batches to 10MiB, but only because of the AWS limitation.
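To illustrate why the limit matters: a bulk uploader flushes whenever the next document would overflow the configured maximum, so a larger limit means fewer HTTP requests for the same set of documents. This is a simplified sketch of that batching logic, not the indexer's actual implementation:

```go
package main

import "fmt"

// bulkBuffer accumulates document bytes and flushes when adding the next
// document would exceed the configured maximum request size.
type bulkBuffer struct {
	maxSize int // maximum request size in bytes
	size    int // bytes currently buffered
	flushes int // number of bulk requests sent so far
}

// add buffers one document of n bytes, flushing first if it wouldn't fit.
func (b *bulkBuffer) add(n int) {
	if b.size+n > b.maxSize && b.size > 0 {
		b.flush()
	}
	b.size += n
}

func (b *bulkBuffer) flush() {
	b.flushes++
	b.size = 0
}

func main() {
	// With a 10-byte cap, thirty 4-byte documents need many requests;
	// with a 100-byte cap, the same documents need far fewer.
	small := &bulkBuffer{maxSize: 10}
	large := &bulkBuffer{maxSize: 100}
	for i := 0; i < 30; i++ {
		small.add(4)
		large.add(4)
	}
	small.flush()
	large.flush()
	fmt.Println(small.flushes, large.flushes) // prints: 15 2
}
```

The same scaling applies to real documents: raising the cap from 10MiB to 100MiB cuts the request count for a large repository by roughly an order of magnitude.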
Proposal
Add a setting to configure the maximum batch request size in the Elasticsearch section of the admin panel. This can be passed down to the gitlab-elasticsearch-indexer process in the ELASTIC_CONNECTION_INFO environment variable, overriding the values currently hardcoded here: https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer/blob/master/elastic/client.go#L28
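A minimal sketch of how the indexer side could pick the value up from ELASTIC_CONNECTION_INFO, assuming a hypothetical max_bulk_size_bytes JSON key (the actual key name would be decided during implementation) and falling back to the current 10MiB default when it's absent:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connectionInfo mirrors a subset of the JSON passed in the
// ELASTIC_CONNECTION_INFO envvar. The max_bulk_size_bytes field name is
// hypothetical, not the indexer's current schema.
type connectionInfo struct {
	URL              []string `json:"url"`
	MaxBulkSizeBytes int64    `json:"max_bulk_size_bytes"`
}

// defaultMaxBulkSize matches the currently hardcoded 10MiB batch size.
const defaultMaxBulkSize = 10 * 1024 * 1024

// parseConnectionInfo decodes the envvar payload, applying the default
// when no (or a non-positive) batch size is configured.
func parseConnectionInfo(raw string) (connectionInfo, error) {
	var info connectionInfo
	if err := json.Unmarshal([]byte(raw), &info); err != nil {
		return info, err
	}
	if info.MaxBulkSizeBytes <= 0 {
		info.MaxBulkSizeBytes = defaultMaxBulkSize
	}
	return info, nil
}

func main() {
	raw := `{"url": ["http://localhost:9200"], "max_bulk_size_bytes": 104857600}`
	info, err := parseConnectionInfo(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.MaxBulkSizeBytes) // prints: 104857600 (i.e. 100MiB)
}
```

On the Rails side, the admin-panel setting would simply be serialized into this JSON payload alongside the existing connection details.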
I don't think the limit is introspectable from the Elasticsearch cluster. If it were, we could autodetect it instead of making it configurable.
People can set up their own Elasticsearch clusters with arbitrary limits, so I don't think restricting the setting to the AWS tiers makes much sense. We should probably default to 10MiB, though, so we work in the broadest range of cases.
Permissions and Security
Only instance administrators should be able to change this setting.
Documentation
We'll need to update https://docs.gitlab.com/ee/integration/elasticsearch.html
Testing
Unit tests in both the gitlab-ee and gitlab-elasticsearch-indexer projects
What does success look like, and how can we measure that?
We can index projects into large AWS Elasticsearch instances more efficiently