Production data cleanup process
Problem
This Slack thread mentions that some Production data looks suspicious and/or fake.
We already have some processes for data deletion. This issue aims to complete them so that the recently discovered data can be dealt with.
Proposal
There are two aspects to consider for this issue:
Customer object deletion process
We already have some resources that can be leveraged to delete Customer objects:
- Privacy Compliance - Deletion Request (Customers Portal Only): the checklist to follow to delete a Customer object.
- Customer object deletion has been triggered by the GDPR requests we received (see the "Account Deletion and Other Requests" issues related to GDPR requests).
We need to be able to file another type of request, similar to a GDPR request except that it would be a "cleanup data" request.
Use of a worker
The proposal above implies a manual process, which is necessary to have in place but would be tedious to apply to all existing data eligible for deletion.
A worker could be set up to automatically delete "fake" Customer objects. The difficulty is determining what data this worker should run against, i.e. what specific data flags a Customer object as fake.
There are a few rules that would need to be implemented to make this worker effective. For example, if the substring <script is present in the first_name or last_name columns, the worker would flag the record for deletion.
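To make the rule idea concrete, here is a minimal sketch of how such rules could be expressed and applied. It is written in Python purely for illustration and is not the CustomersDot implementation: the Customer record, the Rule type, the rule name script_tag_in_name, and the flag_for_deletion helper are all hypothetical, and matching case-insensitively is an assumption that goes beyond the rule as stated above. The sketch only flags records; the actual deletion would still go through the existing deletion process.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Customer:
    """Hypothetical stand-in for a Customer record (only the columns used here)."""
    id: int
    first_name: Optional[str]
    last_name: Optional[str]


@dataclass(frozen=True)
class Rule:
    """A named predicate; a record matching any rule is flagged."""
    name: str
    matches: Callable[[Customer], bool]


def contains_script_tag(customer: Customer) -> bool:
    # Rule from the issue: "<script" present in first_name or last_name.
    # Assumption: the comparison is case-insensitive, so "<SCRIPT" also matches.
    needle = "<script"
    return any(
        needle in (value or "").lower()
        for value in (customer.first_name, customer.last_name)
    )


# Hypothetical minimum rule set; further rules (for example from the Data or
# Support team) would be appended here.
RULES: Tuple[Rule, ...] = (Rule("script_tag_in_name", contains_script_tag),)


def flag_for_deletion(
    customers: Iterable[Customer], rules: Sequence[Rule] = RULES
) -> List[Tuple[int, List[str]]]:
    """Return (customer id, matched rule names) for every flagged record."""
    flagged = []
    for customer in customers:
        matched = [rule.name for rule in rules if rule.matches(customer)]
        if matched:
            flagged.append((customer.id, matched))
    return flagged
```

Recording which rule matched each record, rather than deleting immediately, leaves room for a review step before anything is removed.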
The Data team or the Support team might have some insight into the list of rules to apply.
Once this worker is ready (i.e. has a minimum set of rules), it can be added to the CustomersDot cron schedule.