Update authentication_events retention system to handle new organization_id column
Problem
In #545007 we set up the infrastructure to enforce a 1 year retention policy on the authentication_events table. This involved a new authentication_event_archived_records table and background jobs that copy rows to the archive table.
At the time, the authentication_events table did not have an organization_id column.
#561359 is now in progress, and will add an organization_id column to the authentication_events table.
Once that is done, the archiving system for authentication_events needs to be updated to account for the new organization_id column.
Why solve it?
If the authentication_events.organization_id column is added, the following will still be true:
- The authentication_event_archived_records table does not have an organization_id column
- Gitlab::BackgroundMigration::ArchiveAuthenticationEvents archives rows by explicitly referencing column names, so won't write organization_id to archive
- The daily cronjob Authn::DataRetention::AuthenticationEventArchiveWorker similarly references column names, so won't write organization_id to archive
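To illustrate why an explicit column list silently drops a newly added column, here is a minimal plain-Ruby sketch. It does not reproduce the real jobs: ARCHIVED_COLUMNS, archive_row, and the sample row are hypothetical stand-ins, and hashes stand in for table rows.

```ruby
# Hypothetical column list; stands in for the columns the archive jobs name explicitly.
ARCHIVED_COLUMNS = %i[id created_at user_id result].freeze

# Copies only the explicitly listed columns, the way an explicit-column copy would.
def archive_row(row, columns: ARCHIVED_COLUMNS)
  row.slice(*columns)
end

# Sample source row that already carries the new column (illustrative values).
event = { id: 1, created_at: "2024-01-01", user_id: 42, result: "success", organization_id: 7 }

archived = archive_row(event)
archived.key?(:organization_id) # false: the new column never reaches the archive

fixed = archive_row(event, columns: ARCHIVED_COLUMNS + [:organization_id])
fixed[:organization_id] # 7 once the column is listed explicitly
```

Nothing fails in the first call, which is the risk: the copy succeeds and the value is simply absent from the archive.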
The purpose of the archive table is to enable restoring records back to authentication_events if necessary.
If organization_id isn't being written to authentication_event_archived_records, then we can't restore organization_id back to authentication_events from the archive. We lose the organization_id information.
Proposal
To update the archive system to account for organization_id, we need to:
- Add authentication_event_archived_records.organization_id column
- Change Gitlab::BackgroundMigration::ArchiveAuthenticationEvents to write organization_id to archive
- Change Authn::DataRetention::AuthenticationEventArchiveWorker to write organization_id to archive
- Backfill the rows in authentication_event_archived_records with organization_id values from authentication_events (can only be done after authentication_events.organization_id is itself backfilled)
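The backfill step could look roughly like the following plain-Ruby sketch. It assumes archived rows can be matched to source rows by id, which this issue does not state. backfill_organization_id and the sample data are hypothetical, and hashes stand in for table rows rather than real database access.

```ruby
# Fills in organization_id on archived rows that lack it, using the source rows.
# Assumption: archived and source rows share the same id.
def backfill_organization_id(archived_rows, events_by_id)
  archived_rows.map do |row|
    # Leave rows alone if they already have a value.
    next row unless row[:organization_id].nil?

    source = events_by_id[row[:id]]
    # Rows with no matching source row stay unchanged.
    source ? row.merge(organization_id: source[:organization_id]) : row
  end
end

archived_rows = [
  { id: 1, organization_id: nil },
  { id: 2, organization_id: 5 },
  { id: 3, organization_id: nil }
]
events_by_id = {
  1 => { id: 1, organization_id: 10 },
  2 => { id: 2, organization_id: 99 }
}

backfilled = backfill_organization_id(archived_rows, events_by_id)
```

In this sketch, a row whose source record is missing keeps a nil organization_id, so a real backfill would need a decision on how to handle such rows.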
Alternative
Alternatively, maybe we can stop archiving records, and transition to simply hard deleting records.
We created the archive tables and jobs to address risks surrounding deleting a large number of rows from oauth_access_tokens and related tables. We wanted to soft delete to be able to quickly restore deleted rows, in case our risk assessment was flawed and there was a problem.
Since then, we've run batched migrations to delete a majority of records from these tables, and verified there have been no problems.
Since there have been no problems, maybe we can transition the cleanup workers to only delete records instead of archiving them.
If we can do that, we can:
- Change the archive jobs to hard delete
- Delete the authentication_event_archived_records table
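As a rough sketch of what "hard delete" means for the cleanup jobs, the plain-Ruby example below removes expired rows in batches without copying them anywhere. It is not the real worker: hard_delete_expired!, the batch size, the 365-day approximation of the 1 year retention period, and the in-memory rows are all assumptions made for illustration.

```ruby
require "date"

# Approximates the 1 year retention period as 365 days (an assumption).
RETENTION_DAYS = 365

# Deletes rows older than the retention cutoff in batches, with no archive copy.
# Returns the number of rows deleted.
def hard_delete_expired!(rows, today:, batch_size: 2)
  cutoff = today - RETENTION_DAYS
  deleted = 0

  loop do
    batch = rows.select { |row| row[:created_at] < cutoff }.first(batch_size)
    break if batch.empty?

    batch.each { |row| rows.delete(row) }
    deleted += batch.size
  end

  deleted
end

rows = [
  { id: 1, created_at: Date.new(2023, 6, 1) },
  { id: 2, created_at: Date.new(2023, 12, 31) },
  { id: 3, created_at: Date.new(2023, 1, 15) },
  { id: 4, created_at: Date.new(2024, 6, 1) }
]

deleted_count = hard_delete_expired!(rows, today: Date.new(2025, 1, 1))
```

Unlike the archive path, nothing is written elsewhere before deletion, so there is no column list to keep in sync with authentication_events.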
What about the table swap?
If pg_repack is not available to run on authentication_events, we will have to create a clean table and swap it with authentication_events manually in order to return the reclaimed space to the DB cluster.
The archive tables won't help us or mitigate any risk inherent to the table swap. The clean table will only contain records that are within the 1 year retention period, i.e. only rows in the authentication_events table. So having older data in authentication_event_archived_records doesn't actually help us mitigate any risk of the table swap.
Therefore removing the archive table and jobs won't change the risk profile of the table swap.
Are there other reasons to keep archiving records?
On the other hand, if we want to keep archiving rows for some other reason, we will need to update the archive table and jobs to account for organization_id.