Set upper limit on bulk API request size for Elasticsearch indexing
We learnt from #199887 (comment 281731320) that sending many large bulk API requests concurrently is a bad idea, and initial indexing is likely to keep triggering this. In #195774 (closed) we decided to handle timeouts better by reducing the bundle size when a request times out.
But this doesn't necessarily stop us from exceeding heap memory if bulk requests queue up while the cluster is busy. What this error indicates is that we should avoid sending massive bulk requests altogether, as the consequences can be severe.
Similar to the proposal in #195774 (closed), we should see whether it's feasible to set a limit on bulk API request size, so that we decrease the bundle size before sending the request to the cluster whenever the payload exceeds 1 MB.
To do this, we construct the bulk payload and, if it exceeds 1 MB, stop and build a smaller payload with fewer records (100, then 10, then 1). Finally, if we are down to a single record and the payload is still larger than 1 MB, we should truncate the fields that are too large. A rough sketch of this step-down is shown below.
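The following is a minimal sketch in Go, not the actual indexer code: `record`, `buildBulkBody`, and `nextBatch` are hypothetical names, the 1 MB limit is a hard-coded constant, and the field truncation on a single oversized record is deliberately crude, just to illustrate the idea.

```go
// Sketch: cap an Elasticsearch bulk request body at 1 MB by stepping the
// batch size down (100 -> 10 -> 1) and, as a last resort, truncating fields.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

const maxPayloadBytes = 1 << 20 // 1 MB upper limit on a bulk request body

// record is a stand-in for whatever document we index.
type record struct {
	ID      string `json:"id"`
	Content string `json:"content"`
}

// buildBulkBody serialises records into an Elasticsearch bulk (NDJSON) body:
// an action line followed by a source line per record.
func buildBulkBody(records []record) ([]byte, error) {
	var buf bytes.Buffer
	for _, r := range records {
		action := map[string]map[string]string{"index": {"_id": r.ID}}
		if err := json.NewEncoder(&buf).Encode(action); err != nil {
			return nil, err
		}
		if err := json.NewEncoder(&buf).Encode(r); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// nextBatch returns a payload no larger than maxPayloadBytes and the number
// of records it consumed, stepping the batch size down 100 -> 10 -> 1 and
// finally truncating fields on a single record that is still too large.
func nextBatch(records []record) ([]byte, int, error) {
	if len(records) == 0 {
		return nil, 0, fmt.Errorf("no records to index")
	}
	for _, size := range []int{100, 10, 1} {
		if size > len(records) {
			size = len(records)
		}
		body, err := buildBulkBody(records[:size])
		if err != nil {
			return nil, 0, err
		}
		if len(body) <= maxPayloadBytes {
			return body, size, nil
		}
		if size == 1 {
			// Even one record is too big: crudely trim its largest field by
			// the approximate overshoot and rebuild the payload.
			r := records[0]
			if over := len(body) - maxPayloadBytes; over < len(r.Content) {
				r.Content = r.Content[:len(r.Content)-over]
			} else {
				r.Content = ""
			}
			body, err = buildBulkBody([]record{r})
			if err != nil {
				return nil, 0, err
			}
			return body, 1, nil
		}
	}
	return nil, 0, fmt.Errorf("unreachable")
}

func main() {
	records := []record{{ID: "1", Content: "hello"}, {ID: "2", Content: "world"}}
	body, n, err := nextBatch(records)
	if err != nil {
		panic(err)
	}
	fmt.Printf("would send %d record(s), %d bytes\n", n, len(body))
	// The caller would POST body to the _bulk endpoint, then call nextBatch
	// again with records[n:] until everything is indexed.
}
```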