Follow-up from "Fix recording audit event for status check response changes"
The following discussions from !201308 (merged) should be addressed:
- @DylanGriffith started a discussion:

  > Suggestion (non-blocking): It looks like this batching strategy is not optimal. Per https://docs.gitlab.com/development/database/batched_background_migrations/#using-a-composite-or-partial-index-to-iterate-a-subset-of-the-table we need indexes that exactly match all the `where` clauses before `each_batch`. In this case it seemed like it previously almost matched:
  >
  > `"idx_status_check_responses_on_id_and_status" btree (id, status)`
  >
  > But even that wasn't quite right, because the index would have needed to be `(status, id)` for the previous query to actually make use of efficient pagination. Given the query plan in https://console.postgres.ai/gitlab/gitlab-production-main/sessions/42438/commands/129933 we can see the index scan is not very efficient and basically needs to scan the whole table.
  >
  > In this case it worked out OK because there's not that much data in this table. It's still not cheap, though: it loads ~86.00 MiB of data from disk without finding a single record that matches. With this implementation the performance of the pagination will get worse over time, and `each_batch` won't give us stable batch-size performance because no index matches the `where` clause.
  >
  > For this reason it's usually more efficient to run `each_batch` only over exactly indexed columns. In this case the only such column is `id`, so we might get better performance with:
  >
  > ```ruby
  > ::MergeRequests::StatusCheckResponse.each_batch do |batch|
  >   record_ids = batch.pending.timeout_eligible.pluck_primary_key
  >   batch.update_all(status: 'failed')
  > end
  > ```

  A fuller sketch of this pattern follows the discussions below.
- @DylanGriffith started a discussion:

  > Nitpick (non-blocking): If we're already getting a batch of ids here, we probably don't need to use `find_each`. And if we were sure we wanted to deal with batches, then we're probably better off using `each_batch` instead of `find_each`. But in this case the full list of ids is already in memory, so I think it's safe to assume it's never going to be too large and we can just use `each`. Using `find_each` adds some overhead if it ends up splitting up the groups, and its queries aren't optimized the way `each_batch`'s are.

  A short illustration of this point also follows below.
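The quoted snippet in the first discussion is truncated, and as written `batch.update_all` would mark every row in the batch as failed rather than only the eligible ones. Below is a minimal sketch of the suggested pattern, assuming `pending` and `timeout_eligible` are existing scopes on the model; scoping the update to the plucked ids is my assumption, not part of the quoted suggestion.

```ruby
# Sketch only: iterate on the primary key, which `each_batch` can
# paginate efficiently, and apply the unindexed filters per batch.
::MergeRequests::StatusCheckResponse.each_batch do |batch|
  # `pending` / `timeout_eligible` are the scopes from the quoted
  # snippet; within an id-bounded batch the missing (status, id)
  # index no longer dominates the query cost.
  record_ids = batch.pending.timeout_eligible.pluck_primary_key
  next if record_ids.empty?

  # Assumption: only the matching records should be failed, so the
  # update is scoped to the plucked ids rather than the whole batch.
  ::MergeRequests::StatusCheckResponse.where(id: record_ids).update_all(status: 'failed')
end
```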
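For the second discussion, a short sketch of the `each` vs `find_each` point: once the ids are already in memory, loading the records in one query and iterating with plain `each` avoids the extra batched queries `find_each` would issue. The `record_audit_event` helper below is hypothetical, standing in for whatever per-record work the MR performs.

```ruby
# `where(id: record_ids)` loads all matching rows in a single query;
# plain `each` then iterates the in-memory collection. `find_each`
# would instead issue repeated queries ordered by primary key in
# batches of 1000, which is unnecessary overhead for an already-bounded list.
::MergeRequests::StatusCheckResponse.where(id: record_ids).each do |response|
  record_audit_event(response) # hypothetical helper, for illustration
end
```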