Skip to content

Improving LFK DeletedRecords batch fetching query

Omar Qunsul requested to merge 409836-speed-up-query-using-union into master

What does this MR do and why?

This MR improves the query that loads the LooseForeignKeys::DeletedRecord query. At the moment the query is not efficient when loading the records, which causes the LooseForeignKeys::CleanupWorker to hit the 30 seconds time limit without processing many records. At the moment the deleted records in the ci database are piling up fast. We have > 30M records waiting to be processed.

Addressing: #419119 (closed)

Changelog: fixed

Background

You can read about the discussion that lead to this solution in this thread. We decided to do a UNION query that reads from each partition alone, which makes the query much faster.

Queries comparison

I have chosen to query based the records for the table public.ci_runners in these examples because it still has a lot of records in the current ci snapshot, but it's not the majority. Most of the records belong to public.p_ci_builds.

Old Query

SELECT "loose_foreign_keys_deleted_records".*, "loose_foreign_keys_deleted_records"."partition" AS partition_number FROM "loose_foreign_keys_deleted_records" WHERE "loose_foreign_keys_deleted_records"."fully_qualified_table_name" = 'public.ci_runners' AND "loose_foreign_keys_deleted_records"."status" = 1 ORDER BY "partition" ASC, "loose_foreign_keys_deleted_records"."consume_after" ASC, "loose_foreign_keys_deleted_records"."id" ASC LIMIT 1000

https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/21176/commands/69144

New Query using UNION

  1. Loading the list of the partitions

This is reading from a VIEW

SELECT "postgres_partitions".* FROM "postgres_partitions" WHERE "postgres_partitions"."parent_identifier" = 'public.loose_foreign_keys_deleted_records' ORDER BY "postgres_partitions"."name" ASC

https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/21176/commands/69137

  1. Loading the records

See the query on postgres.ai (it's long) This query was created using Rails Console connected to a postgres.ai instance. Using the the code snippet while connected to the ci database.

LooseForeignKeys::DeletedRecord.using_connection(Ci::ApplicationRecord.connection) do
  LooseForeignKeys::DeletedRecord.load_batch_for_table('public.ci_runners', 1000).to_sql
end

https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/21176/commands/69168

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #409836 (closed)

Edited by Omar Qunsul

Merge request reports