Add Sidekiq cron job to clean up test data on GitLab stg
Problem
In a recent triage issue, @niskhakova discovered that over 1 million test user accounts had piled up on staging.
We currently have the QA::Tools module for cleaning up test data via API calls. I believe it is normally scheduled to run in a delete-resources pipeline (to be confirmed). This covers most scenarios, but some resources cannot be removed via the API (e.g. a top-level group with a paid subscription).
Proposal
Create a Sidekiq schedule that runs a test data cleanup job on GitLab stg.
1st iteration
CustomersDot already has a Sidekiq cron job that cleans up test data on staging every 6 hours. It also has a rake task that can be run on demand from the staging console.
- Add a Sidekiq worker to remove top-level test groups created by the fulfillment test suite.
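A minimal sketch of how such a worker might decide which groups to remove. It is written in Python for illustration only and is not the actual GitLab (Ruby) implementation. The `Group` record, the `fulfillment-test-` path prefix and the 6-hour minimum age are hypothetical assumptions. The issue does not say how fulfillment test groups are identified.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical naming convention for groups created by the fulfillment test suite.
TEST_GROUP_PREFIX = "fulfillment-test-"
# Hypothetical grace period so groups from test runs still in progress are kept.
MIN_AGE = timedelta(hours=6)


@dataclass(frozen=True)
class Group:
    """Hypothetical, simplified view of a group record."""
    id: int
    path: str
    parent_id: Optional[int]
    created_at: datetime


def select_test_groups(groups: List[Group], now: datetime) -> List[Group]:
    """Return top-level test groups that are old enough to be removed."""
    cutoff = now - MIN_AGE
    return [
        g for g in groups
        if g.parent_id is None                      # top-level groups only
        and g.path.startswith(TEST_GROUP_PREFIX)    # created by the test suite
        and g.created_at < cutoff                   # not from an in-flight run
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
    groups = [
        Group(1, "fulfillment-test-abc", None, now - timedelta(hours=7)),
        Group(2, "fulfillment-test-new", None, now - timedelta(hours=1)),
        Group(3, "fulfillment-test-sub", 1, now - timedelta(hours=7)),
        Group(4, "customer-group", None, now - timedelta(hours=7)),
    ]
    print([g.id for g in select_test_groups(groups, now)])
```

The age cutoff matters because a worker on a 6-hour schedule could otherwise delete a group that a currently running test still depends on.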
2nd iteration
- In the long run, maybe we can convert some of the QA::Tools functions into the Sidekiq job 🤔 Compared to the current API-based cleanup pipeline: 👇
Pros
- removes records directly from the database, which is less expensive than API calls
- bypasses deletion validations, giving more flexibility
- improves pipeline efficiency and requires minimal manual action
- Sidekiq handles logging and retries
Cons
- requires a Sidekiq worker, which should be queued at low priority so it doesn't impact other application workers
- difficult to handle certain reusable test resources and resources left behind for debugging purposes
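One way the worker could address both cons is sketched below in Python for illustration. It deletes in small batches so each job stays short, which suits a low-priority queue. It also skips IDs on a keep-list of reusable or debugging resources. The keep-list and the batch size are hypothetical assumptions, since the issue leaves open how such resources would be flagged.

```python
from typing import Iterable, Iterator, List, Set


def deletable_batches(ids: Iterable[int], keep: Set[int], batch_size: int) -> Iterator[List[int]]:
    """Yield IDs to delete in fixed-size batches, skipping any ID in `keep`.

    `keep` is a hypothetical keep-list of reusable test resources or
    resources intentionally left behind for debugging.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[int] = []
    for resource_id in ids:
        if resource_id in keep:
            continue  # never delete protected resources
        batch.append(resource_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


if __name__ == "__main__":
    for batch in deletable_batches(range(1, 8), keep={3}, batch_size=2):
        print(batch)
```

Each yielded batch could be handled by a separate short job instead of one long-running deletion.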
Edited by Chloe Liu