Improve logic related to calculating bulkSize
In gitlab#211597 (closed), a customer issue was brought up:
> One customer reported that they started seeing errors on their Sidekiq workers after upgrading to 12.8:
>
> ```
> bulk request 2236: error: elastic: Error 413 (Request Entity Too Large)
> ```
After some investigation, the root cause appears to be the way `bulkSize` is calculated: the indexer doesn't treat `MaxBulkSize` as a hard limit, and will send requests that are slightly over `MaxBulkSize`. As seen here, a batch is flushed if its estimated size is greater than or equal to the `bulkSize`:

```go
func (w *bulkWorker) commitRequired() bool {
	if w.bulkActions >= 0 && w.service.NumberOfActions() >= w.bulkActions {
		return true
	}
	if w.bulkSize >= 0 && w.service.EstimatedSizeInBytes() >= int64(w.bulkSize) {
		return true
	}
	return false
}
```
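To illustrate why this overshoots, here is a minimal sketch (hypothetical function name and arbitrary size units, not the indexer's real code) of the add-then-check pattern above:

```go
package main

import "fmt"

// flushSizes simulates the current add-then-check behaviour:
// each document is appended to the batch *before* the size check
// runs, so a flushed request can end up larger than the limit.
func flushSizes(docs []int, limit int) []int {
	var flushed []int
	batch := 0
	for _, d := range docs {
		batch += d // document added first...
		if batch >= limit {
			// ...flush check happens afterwards, so the request
			// sent here may already exceed the limit.
			flushed = append(flushed, batch)
			batch = 0
		}
	}
	return flushed
}

func main() {
	// Three 4-unit documents against a 10-unit limit: the batch
	// reaches 12 before the check fires, overshooting the limit.
	fmt.Println(flushSizes([]int{4, 4, 4}, 10)) // [12]
}
```

With a large enough final document, this overshoot can push the request past Elasticsearch's `http.max_content_length`, producing the 413 above.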
An MR was made to the Ruby `BulkIndexer` to prevent bulk requests from exceeding `MaxBulkSize`:
gitlab!31653 (merged) has been merged with a change to `BulkIndexer` that should help prevent 413 errors when indexing database records.

The current behaviour of `BulkIndexer` is to send a bulk request as soon as the request body is larger than `bulk_limit_bytes`. That MR proposes that a bulk request should be sent before adding a document, if adding it would result in the request size being greater than `bulk_limit_bytes`.
A similar change needs to be made to the golang indexer. The calculation of `bulkSize` is inside the `olivere/elastic` library, so fixing this will involve contributing the change upstream, then updating this project to a new release of the library.
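The check-before-add behaviour the Ruby MR introduced can be sketched as follows (again with a hypothetical function name and arbitrary units; the real fix would live in `olivere/elastic`'s bulk worker):

```go
package main

import "fmt"

// flushSizesChecked simulates the proposed behaviour: flush the
// pending batch *before* adding a document whenever adding it
// would push the batch past the limit, so no single request
// exceeds the limit (unless one document alone is over it).
func flushSizesChecked(docs []int, limit int) []int {
	var flushed []int
	batch := 0
	for _, d := range docs {
		if batch > 0 && batch+d > limit {
			flushed = append(flushed, batch)
			batch = 0
		}
		batch += d
	}
	if batch > 0 {
		flushed = append(flushed, batch)
	}
	return flushed
}

func main() {
	// Same three 4-unit documents and 10-unit limit as before:
	// the batch is now flushed at 8, and every request stays
	// at or under the limit.
	fmt.Println(flushSizesChecked([]int{4, 4, 4}, 10)) // [8 4]
}
```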
/cc @changzhengliu