Use BG Migration to copy historic data to a partitioned table

What does this MR do?

Related issue: #218428 (closed)

Implements logic within the partitioning migration helper to copy data into a partitioned table using a BackgroundMigration. Future writes to the original table are handled by a trigger installed on the source table as part of the migration helper, so once the BackgroundMigration finishes, the tables should remain in sync.

Approach

The BackgroundMigration does batched inserts directly from the source table into the partitioned table. Since the trigger also copies writes into the partitioned table, it could conflict with the BackgroundMigration. For that reason, a FOR UPDATE clause is used to lock the source rows against writes while they are copied.
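A minimal sketch of how such a locked upsert could be built, in plain Ruby with hypothetical names (`copy_batch_sql` and its parameters are illustrative, not the actual helper):

```ruby
# Hypothetical sketch (not the real migration helper): build one copy step
# that selects the source rows FOR UPDATE, so concurrent trigger writes
# can't race the copy, and upserts them into the partitioned table.
def copy_batch_sql(source:, target:, columns:, start_id:, stop_id:)
  column_list = columns.join(', ')
  # On conflict, overwrite everything except the conflict-key columns.
  update_list = (columns - %w[id created_at])
    .map { |c| "#{c} = EXCLUDED.#{c}" }.join(', ')

  <<~SQL
    INSERT INTO #{target} (#{column_list})
    SELECT #{column_list}
    FROM #{source}
    WHERE id BETWEEN #{start_id} AND #{stop_id}
    FOR UPDATE
    ON CONFLICT (id, created_at) DO UPDATE SET #{update_list}
  SQL
end
```

The generated statement has the same shape as the sample sub-batch insert shown in the Execution section.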

Because of the write lock on the source table, the batch size should be kept reasonably small, but that must be balanced against the record count of the audit_events table. With roughly 315M rows, a batch size of 10,000, and the minimum 2-minute delay between jobs, the migration would take roughly 44 days to complete, and would require enqueueing 31.5K migration jobs to handle the batches.
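The time and job-count figures above can be sanity-checked with quick arithmetic (plain Ruby; the constants mirror the numbers in the text):

```ruby
# Back-of-envelope check of the figures above.
rows       = 315_000_000
batch_size = 10_000
delay_min  = 2                    # minimum delay between jobs, in minutes

jobs = rows / batch_size          # => 31500 (the 31.5K jobs)
days = jobs * delay_min / 1440.0  # => 43.75 (roughly 44 days)
```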

To alleviate this, each job is enqueued with a larger range of ids but internally batches over that range. This keeps individual queries small without creating as much overhead in Sidekiq. The downside is that a Sidekiq job could partially complete, but since a cleanup migration will already be required, that shouldn't pose a problem.
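The per-job sub-batching could look something like this plain-Ruby sketch (`each_sub_batch` and the sub-batch size of 2,500 are assumptions drawn from the sample queries in the Execution section, not the actual migration code):

```ruby
# Illustrative sketch, not the actual migration class: split one job's
# assigned id range into fixed-size sub-batches.
SUB_BATCH_SIZE = 2_500 # assumption, inferred from the sample queries

# Yields inclusive [sub_start, sub_stop] pairs covering start_id..stop_id.
def each_sub_batch(start_id, stop_id, size: SUB_BATCH_SIZE)
  current = start_id
  while current <= stop_id
    sub_stop = [current + size - 1, stop_id].min
    yield current, sub_stop
    current = sub_stop + 1
  end
end

# One job's range of 50_000 ids yields 20 sub-batches; the last one is
# 250_047_501..250_050_000, matching the sample sub-batch insert.
each_sub_batch(250_000_001, 250_050_000) { |a, b| p [a, b] }
```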

Execution

Total execution estimate
  • The audit_events table has 315_000_000 records
  • Current batch size per BG Migration is 50_000
  • Current delay between background jobs is 2 minutes

Number of batches:

315_000_000 / 50_000 = 6_300 batches

Total time:

6_300 * 2 / 1_440 = 8.75 days

Batch queries to enqueue jobs

Sample batch query to find next offset in audit_events:

```sql
explain SELECT "audit_events"."id" FROM "audit_events" WHERE "audit_events"."id" >= 250000001 ORDER BY "audit_events"."id" ASC LIMIT 1 OFFSET 50000
```

https://explain.depesz.com/s/dh64

Sample batch query to find batch start/stop of a batch:

```sql
explain SELECT MIN(id), MAX(id) FROM "audit_events" WHERE "audit_events"."id" >= 250000001 AND "audit_events"."id" < 250050001
```

https://explain.depesz.com/s/Ou5

Queries executed within each job

Sample batch query to find offset of the next sub-batch:

```sql
explain SELECT "audit_events"."id" FROM "audit_events" WHERE (id BETWEEN 250000001 AND 250050000) AND "audit_events"."id" >= 250045001 ORDER BY "audit_events"."id" ASC LIMIT 1 OFFSET 2500
```

https://explain.depesz.com/s/HoiT

Sample batch query to find batch start/stop of a sub-batch:

```sql
explain SELECT MIN(id), MAX(id) FROM "audit_events" WHERE (id BETWEEN 250000001 AND 250050000) AND "audit_events"."id" >= 250045001 AND "audit_events"."id" < 250047501
```

https://explain.depesz.com/s/9NMS

Sample sub-batch insert:

```sql
explain INSERT INTO audit_events_part (id, author_id, type, entity_id, entity_type, details, created_at, updated_at)
SELECT id, author_id, type, entity_id, entity_type, details, created_at, updated_at
FROM audit_events
WHERE id BETWEEN 250047501 AND 250050000
FOR UPDATE
ON CONFLICT (id, created_at) DO UPDATE SET author_id = EXCLUDED.author_id, type = EXCLUDED.type, entity_id = EXCLUDED.entity_id, entity_type = EXCLUDED.entity_type, details = EXCLUDED.details, updated_at = EXCLUDED.updated_at
```

https://explain.depesz.com/s/lZcs

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team