Improve logic related to calculating `bulkSize`

In gitlab#211597 (closed), a customer issue was brought up:

> One customer reported that they started seeing errors on their Sidekiq workers after upgrading to 12.8: `bulk request 2236: error: elastic: Error 413 (Request Entity Too Large)`

After some investigation, the root cause appears to be the way `bulkSize` is calculated:

> the indexer doesn't treat the `MaxBulkSize` as a hard limit, and will send requests that are slightly over the `MaxBulkSize`.

As seen here, a batch is flushed only once its estimated size is already greater than or equal to `bulkSize`:

```go
func (w *bulkWorker) commitRequired() bool {
	if w.bulkActions >= 0 && w.service.NumberOfActions() >= w.bulkActions {
		return true
	}
	if w.bulkSize >= 0 && w.service.EstimatedSizeInBytes() >= int64(w.bulkSize) {
		return true
	}
	return false
}
```

An MR was made to the Ruby `BulkIndexer` to prevent bulk requests from exceeding `MaxBulkSize`:

> gitlab!31653 (merged) has been merged with a change to `BulkIndexer` that should help prevent 413 errors when indexing database records.

> The current behaviour of `BulkIndexer` is to send a bulk request as soon as the request body is larger than `bulk_limit_bytes`. This MR proposes that a bulk request should be sent before adding a document, if adding it would result in the request size being greater than `bulk_limit_bytes`.

A similar change needs to be made to the golang indexer. The `bulkSize` calculation lives inside the olivere/elastic library, so fixing this will involve contributing the change upstream, then updating this project to a new release of the library.

/cc @changzhengliu

Edited by Alishan Ladhani