Skip to content

Implement loose foreign key cleanup worker

What does this MR do?

This MR implements the record cleanup for loose foreign keys.

Configuration Example

To use the feature, the parent table needs to have the loose FK trigger. This can be added via a standard DB migration.

  include Gitlab::Database::MigrationHelpers::LooseForeignKeyHelpers

  def up
    track_record_deletions(:projects)

    remove_foreign_key_if_exists(:ci_variables, :projects, name: "fk_ada5eb64b3")
  end

  def down
    untrack_record_deletions(:projects)

    add_concurrent_foreign_key(:ci_variables, :projects, name: "fk_ada5eb64b3", column: :project_id, target_column: :id, on_delete: "cascade")
  end

For the parent model (base class) we must define the loose FK relations:

class Project < ...
  loose_foreign_key :issues, :project_id, on_delete: :delete
end

How to test it?

Create a migration to remove a foreign key:

  include Gitlab::Database::MigrationHelpers::LooseForeignKeyHelpers

  def up
    track_record_deletions(:projects)

    remove_foreign_key_if_exists(:ci_variables, :projects, name: "fk_ada5eb64b3")
  end

  def down
    untrack_record_deletions(:projects)

    add_concurrent_foreign_key(:ci_variables, :projects, name: "fk_ada5eb64b3", column: :project_id, target_column: :id, on_delete: "cascade")
  end

Set up the loose foreign key in the Project model:

include LooseForeignKey

loose_foreign_key :ci_variables, :project_id, on_delete: :async_delete
  1. Create some Ci::Variable records: 10.times { FactoryBot.create(:ci_variable) }
  2. Delete the associated projects: Ci::Variable.all.map(&:project).each(&:delete)
  3. Assert that the Ci::Variable records are still there: Ci::Variable.count
  4. Assert that the DeletedRecord model has the project primary key values: LooseForeignKeys::DeletedRecord.all
  5. Invoke the cleanup worker: LooseForeignKeys::CleanupWorker.new.perform
  6. Assert that there are no more Ci::Variable records: Ci::Variable.count
  7. Assert that there are no more DeletedRecord records: LooseForeignKeys::DeletedRecord.count

How does it work?

The track_record_deletions installs an ON DELETE trigger to the projects table. When a record is removed, its primary key and table id are going to be recorded in the loose_foreign_keys_deleted_records partitioned table.

The loose_foreign_keys_deleted_records table has the following columns:

  • fully_qualified_table_name: name of the deleted table (including the schema).
  • primary_key_value: the primary key of the deleted project record.
  • id: primary key
  • status: pending, processed
  • partition: for the list partitioning
  • created_at: insert timestamp, for debugging purposes

Record cleanup execution steps: LooseForeignKey::CleanupWorker

Note: This job is triggered periodically.

  1. Load BATCH_SIZE deleted records for each configured table, ordered by id ASC.
  2. For each chunk, invoke the LooseForeignKeys::BatchCleanerService service.

Limits and security measures

  • The feature is behind a feature flag: loose_foreign_key_cleanup. If the flag is disabled, the worker will not delete more records.
  • The number of record modifications are limited, after reaching the limit the cleanup stops.
  • The time period where we run the queries is limited to 3 minutes. If we schedule this job for every 5 minutes, then we give time to the autovacuum process to do its work.
  • The number of records we delete or update are batched. To avoid waiting for record locks, the queries are executed with SKIP LOCKED. After the remaining record count reaches 0, we attempt running the same query without SKIP LOCKED.
  • For the generated DELETE or UPDATE queries we check if the foreign key condition is present before execution.

Testing on DB lab

I did a short test on DB lab where I removed the following foreign keys:

    remove_foreign_key_if_exists(:ci_variables, :projects, name: "fk_ada5eb64b3")
    remove_foreign_key_if_exists(:ci_group_variables, :namespaces, name: "fk_33ae4d58d8")
    remove_foreign_key_if_exists(:ci_unit_tests, :projects, name: "fk_7a8fabf0a8")
    remove_foreign_key_if_exists(:merge_request_metrics, :ci_pipelines, name: "fk_rails_33ae169d48")
    remove_foreign_key_if_exists(:merge_requests, :ci_pipelines, name: "fk_fd82eae0b9")
    remove_foreign_key_if_exists(:vulnerability_feedback, :ci_pipelines, name: "fk_rails_20976e6fd9")

I deleted a few hundreds of random rows to populate the loose_foreign_keys_deleted_records table and then I invoked the CleanupWorker.

The delete and update queries are pretty quick <100ms, there are spikes from time to time >5s which I often see on DB-lab when doing data manipulation queries.

On production

After deployment, the CleanupWorker will be scheduled to run every 5 minutes. This will be basically a no-op since it's gated by the feature flag.

If the feature flag is enabled, deletion will still not happen because we're not tracking any tables yet.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #338535 (closed)

Edited by Adam Hegyi

Merge request reports