
Add Sidekiq cron job to clean up test data on GitLab stg

Problem

In a recent triage issue, @niskhakova discovered that roughly 1 million test user accounts had piled up on staging.

We currently have the QA::Tools module for cleaning up test data via API calls, which I believe is normally scheduled to run in a delete-resource pipeline (to be confirmed). This covers most scenarios, but some resources cannot be removed via the API (e.g. a top-level group with a paid subscription).

Proposal

Create a Sidekiq cron schedule that runs a test data cleanup job on GitLab staging.

1st iteration

CustomersDot already has a Sidekiq cron job that cleans up test data on staging every 6 hours, plus a rake task that can be run on demand from the staging console.

  • Add a Sidekiq worker to remove top-level test groups created by the fulfillment test suite.
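
A minimal plain-Ruby sketch of the selection logic such a worker might run on each invocation, assuming test groups can be recognized by a path prefix and that a group younger than one cron interval (6 hours) may still be in use by a running test. `TestGroup`, `StaleTestGroupCleaner`, the `qa-fulfillment-` prefix, and the batch size are all hypothetical. A `Struct` stands in for the real group model so the sketch runs without GitLab, and the block passed to `run` stands in for the actual database deletion. In GitLab itself this logic would presumably be wrapped in a Sidekiq worker, registered as a cron job, and placed on a low-priority queue.

```ruby
# Hypothetical stand-in for a group record; a real worker would query the
# groups table instead. Field names are assumptions.
TestGroup = Struct.new(:id, :path, :parent_id, :created_at, keyword_init: true)

# Hypothetical selection and batching logic for a cron-scheduled cleanup job.
class StaleTestGroupCleaner
  TEST_PATH_PREFIX = 'qa-fulfillment-' # assumed naming convention for test groups
  MIN_AGE_SECONDS = 6 * 60 * 60         # skip groups newer than one 6-hour cron interval
  BATCH_SIZE = 100                      # assumed number of ids per delete

  def initialize(groups, now: Time.now)
    @groups = groups
    @now = now
  end

  # Top-level groups (no parent) that match the test prefix and are old enough.
  # Subgroups are skipped; they are assumed to be removed with their parent.
  def stale_groups
    @groups.select do |group|
      group.parent_id.nil? &&
        group.path.start_with?(TEST_PATH_PREFIX) &&
        (@now - group.created_at) >= MIN_AGE_SECONDS
    end
  end

  # Yields stale group ids in batches; the block performs the actual deletion.
  # Returns the total number of ids yielded.
  def run
    raise ArgumentError, 'a deletion block is required' unless block_given?

    stale_groups.map(&:id).each_slice(BATCH_SIZE).sum do |ids|
      yield ids
      ids.size
    end
  end
end

# Usage with in-memory data.
now = Time.utc(2023, 1, 1, 12)
groups = [
  TestGroup.new(id: 1, path: 'qa-fulfillment-old', parent_id: nil, created_at: now - 7 * 3600),
  TestGroup.new(id: 2, path: 'qa-fulfillment-fresh', parent_id: nil, created_at: now - 60),
  TestGroup.new(id: 3, path: 'qa-fulfillment-child', parent_id: 1, created_at: now - 7 * 3600),
  TestGroup.new(id: 4, path: 'real-customer-group', parent_id: nil, created_at: now - 7 * 3600)
]
deleted_ids = []
count = StaleTestGroupCleaner.new(groups, now: now).run { |ids| deleted_ids.concat(ids) }
puts "deleted #{count} group(s): #{deleted_ids.inspect}" # prints: deleted 1 group(s): [1]
```

The minimum age keeps the job from deleting a group that a test created moments ago, and batching keeps each delete small so a large backlog is worked off gradually across runs.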

2nd iteration

  • In the long run, we could convert some of the QA::Tools functions into the Sidekiq job 🤔 Compared with the current API-based cleanup pipeline: 👇

Pros

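  1. Removing records directly from the database is less expensive than making API calls.
  2. It bypasses deletion validations, giving more flexibility (e.g. removing a top-level group with a paid subscription, which the API cannot delete).
  3. It improves pipeline efficiency and requires minimal manual action.
  4. Sidekiq handles logging and retries.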

Cons

  1. Requires a new Sidekiq worker, which should be queued at low priority so it doesn't affect other application workers.
  2. Makes it harder to handle reusable test resources and resources intentionally left in place for debugging.
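
One possible way to handle the second con, sketched in plain Ruby under stated assumptions: keep an explicit allowlist of reusable test groups and let engineers mark a group to be kept for debugging. The `CandidateGroup` struct, the `CleanupProtection` module, the example paths, and the `[keep-for-debugging]` description marker are all hypothetical conventions, not existing GitLab features.

```ruby
# Hypothetical stand-in for a group record; field names are assumptions.
CandidateGroup = Struct.new(:path, :description, keyword_init: true)

# Hypothetical rules for test resources that must survive automated cleanup.
module CleanupProtection
  # Reusable groups shared across pipelines (placeholder paths).
  REUSABLE_PATHS = %w[qa-fulfillment-reusable-premium qa-fulfillment-reusable-ultimate].freeze
  # Hypothetical marker an engineer adds to a group's description to keep it for debugging.
  KEEP_MARKER = '[keep-for-debugging]'

  # True if the group is on the allowlist or carries the keep marker.
  def self.protected?(group)
    REUSABLE_PATHS.include?(group.path) ||
      group.description.to_s.include?(KEEP_MARKER)
  end

  # Filters out protected groups, returning only those safe to delete.
  def self.deletable(groups)
    groups.reject { |group| protected?(group) }
  end
end

# Usage with in-memory data.
candidates = [
  CandidateGroup.new(path: 'qa-fulfillment-reusable-premium', description: nil),
  CandidateGroup.new(path: 'qa-fulfillment-1234', description: 'failed run [keep-for-debugging]'),
  CandidateGroup.new(path: 'qa-fulfillment-5678', description: nil)
]
p CleanupProtection.deletable(candidates).map(&:path) # prints: ["qa-fulfillment-5678"]
```

A cleanup worker would filter its candidates through `CleanupProtection.deletable` before deleting anything; the allowlist would need to be kept in sync with the test suite, which is part of why this is listed as a con.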
Edited by Chloe Liu