
Update authentication_events retention system to handle new organization_id column

Problem

In #545007 we set up the infrastructure to enforce a 1 year retention policy on the authentication_events table. This involved a new authentication_event_archived_records table and background jobs that copy rows to the archive table.

At the time, the authentication_events table did not have an organization_id column.

#561359 is now in progress, and will add an organization_id column to the authentication_events table.

Once that is done, the archiving system for authentication_events needs to be updated to account for the new organization_id column.

Why solve it?

Even after the authentication_events.organization_id column is added, the following will still be true:

The purpose of the archive table is to enable restoring records back to authentication_events if necessary.

If organization_id isn't written to authentication_event_archived_records, then we can't restore it to authentication_events from the archive, and the organization_id information is lost.

Proposal

To update the archive system to account for organization_id, we need to:

  • Add authentication_event_archived_records.organization_id column
  • Change Gitlab::BackgroundMigration::ArchiveAuthenticationEvents to write organization_id to archive
  • Change Authn::DataRetention::AuthenticationEventArchiveWorker to write organization_id to archive
  • Backfill the rows in authentication_event_archived_records with organization_id values from authentication_events (can only be done after authentication_events.organization_id is itself backfilled)
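To illustrate why the archive writers need the new column, here is a minimal plain-Ruby sketch of the archive and restore round trip. It is not the code in Gitlab::BackgroundMigration::ArchiveAuthenticationEvents or the worker: rows are modelled as plain Hashes, and the column list, `build_archive_attributes`, and `restore_attributes` are hypothetical names used only for this example.

```ruby
# Hypothetical subset of columns copied to the archive; the real schema may differ.
LEGACY_COLUMNS = %i[id user_id created_at].freeze
UPDATED_COLUMNS = (LEGACY_COLUMNS + [:organization_id]).freeze

# Builds the attributes for an archive row from a source row (a plain Hash here).
def build_archive_attributes(event, columns:, archived_at: Time.now.utc)
  event.slice(*columns).merge(archived_at: archived_at)
end

# Rebuilds the attributes for an authentication_events row from an archived row.
def restore_attributes(archived, columns:)
  archived.slice(*columns)
end

event = { id: 1, user_id: 42, created_at: Time.utc(2023, 1, 1), organization_id: 7 }

# Archive writer that does not know about organization_id: the value is dropped.
legacy_restored = restore_attributes(
  build_archive_attributes(event, columns: LEGACY_COLUMNS),
  columns: LEGACY_COLUMNS
)

# Archive writer that copies organization_id: the row round-trips intact.
updated_restored = restore_attributes(
  build_archive_attributes(event, columns: UPDATED_COLUMNS),
  columns: UPDATED_COLUMNS
)
```

In this sketch `legacy_restored` has no organization_id key at all, while `updated_restored` equals the original row. That is the gap the first three steps close, and the backfill step covers rows that were archived before the writers were updated.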

Alternative

Alternatively, we may be able to stop archiving records and simply hard delete them.

We created the archive tables and jobs to address risks surrounding deleting a large number of rows from oauth_access_tokens and related tables. We wanted to soft delete to be able to quickly restore deleted rows, in case our risk assessment was flawed and there was a problem.

Since then, we've run batched migrations to delete a majority of records from these tables, and verified there have been no problems.

Since there have been no problems, maybe we can transition the cleanup workers to only delete records instead of archiving them.

If we can do that, we can:

  • Change the archive jobs to hard delete
  • Delete the authentication_event_archived_records table

What about the table swap?

If pg_repack is not available to run on authentication_events, we will have to create a clean table and manually swap it with authentication_events in order to reclaim space for the DB cluster.

The archive tables won't mitigate any risk inherent to the table swap. The clean table will only contain records that are within the 1 year retention period, i.e. only rows that are still in the authentication_events table. The older data in authentication_event_archived_records is never part of the swap, so it can't be used to recover from a problem with it.

Therefore removing the archive table and jobs won't change the risk profile of the table swap.
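The reasoning above can be sketched in plain Ruby: splitting rows at the retention cutoff yields two disjoint sets, and only the in-retention set would ever be copied to the clean table. This is an illustration only. `split_by_retention` is a hypothetical helper, rows are plain Hashes, and 1 year is approximated as 365 days.

```ruby
# 1 year retention, approximated as 365 days for this sketch.
RETENTION_SECONDS = 365 * 24 * 60 * 60

# Splits rows into [rows within retention, rows older than retention].
def split_by_retention(rows, now:)
  cutoff = now - RETENTION_SECONDS
  rows.partition { |row| row[:created_at] >= cutoff }
end

now = Time.utc(2025, 1, 1)
rows = [
  { id: 1, created_at: Time.utc(2023, 6, 1) }, # older than the cutoff
  { id: 2, created_at: Time.utc(2024, 6, 1) }  # within the retention period
]

# Only clean_table_rows would be copied during a table swap;
# archive_rows correspond to what the archive table holds today.
clean_table_rows, archive_rows = split_by_retention(rows, now: now)
```

Because no row lands in both sets, anything lost or corrupted during the swap would be a row from the in-retention set, which the archive never contains.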

Are there other reasons to keep archiving records?

On the other hand, if we want to keep archiving rows for some other reason, we will need to update the archive table and jobs to account for organization_id.

Edited by Jason Knabl