Change AuthenticationEventArchiveWorker deletion strategy from soft deletion to hard deletion
What does this MR do and why?
Previous context
- In !207500 (merged) we introduced a background worker that enforces a data retention policy on `authentication_events`. The worker runs once a day.
- The worker stores old rows in an archive table (`authentication_event_archived_records`) and then deletes the old rows from `authentication_events`. The archival was temporary, to enable soft deleting records.
- We've verified there is no negative impact from deleting `authentication_events` rows, and we no longer need the archive table.
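The archive-then-delete flow being removed can be pictured with a minimal, self-contained simulation. All names here are illustrative, not the worker's actual code; the real worker operates on database rows via batched SQL:

```ruby
require 'date'

# Minimal simulation of the previous soft-delete flow: rows past the retention
# cutoff were first copied into an archive, then removed from the live table.
# Illustrative names only; the real worker uses ActiveRecord batching.
def archive_and_delete(events, archive, cutoff)
  old, fresh = events.partition { |_id, created_at| created_at <= cutoff }
  archive.concat(old)   # stood in for INSERT INTO authentication_event_archived_records
  events.replace(fresh) # stood in for DELETE FROM authentication_events
  old.size
end

events  = [[1, Date.new(2023, 1, 1)], [2, Date.new(2025, 6, 1)]]
archive = []
archive_and_delete(events, archive, Date.new(2024, 11, 19)) # => 1
```

The point of this MR is that the `archive.concat(old)` step goes away entirely.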
This MR
- Change the `AuthenticationEventArchiveWorker` worker to hard delete records instead of soft deleting them.
- With this change the job no longer references `authentication_event_archived_records`, which gets us closer to being able to drop the archive table.
- The worker is scheduled and runs via cron once a day.
- The worker gets enqueued in all environments, but the logic is gated behind a feature flag check, so in practice the job only deletes records in the GitLab.com environment.
- Therefore this change is only expected to impact GitLab.com and not other users.
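The new hard-delete behavior skips the archive step entirely. A minimal, self-contained sketch of the retention rule (hypothetical names; the real worker deletes via the batched SQL shown in the queries below):

```ruby
require 'date'

# Hard delete: records past the retention cutoff are removed outright, with no
# copy written anywhere first. Illustrative names and retention period only.
RETENTION_DAYS = 365

def enforce_retention!(events, now: Date.today)
  cutoff = now - RETENTION_DAYS
  events.reject! { |_id, created_at| created_at <= cutoff }
  events
end

events = [[1, Date.today - 400], [2, Date.today - 10]]
enforce_retention!(events)
events.map(&:first) # => [2]
```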
References
- #562154
- Original archiving issue: #545007 (closed)
SQL Queries
Set up outer batch

```sql
SELECT "authentication_events"."id" FROM "authentication_events" WHERE (id > 0) ORDER BY "authentication_events"."id" ASC LIMIT 1

SELECT "authentication_events"."id" FROM "authentication_events" WHERE (id > 0) AND "authentication_events"."id" >= 56100052 ORDER BY "authentication_events"."id" ASC LIMIT 1 OFFSET 10000
```

Get last ID in the batch (needed for job re-enqueue)

```sql
SELECT "authentication_events".* FROM "authentication_events" WHERE (id > 0) AND "authentication_events"."id" >= 56100052 AND "authentication_events"."id" < 56110052 ORDER BY "authentication_events"."id" DESC LIMIT 1
```

Set up inner batch

```sql
SELECT "authentication_events"."id" FROM "authentication_events" WHERE (id > 0) AND "authentication_events"."id" >= 56100052 AND "authentication_events"."id" < 56110052 ORDER BY "authentication_events"."id" ASC LIMIT 1

SELECT "authentication_events"."id" FROM "authentication_events" WHERE (id > 0) AND "authentication_events"."id" >= 56100052 AND "authentication_events"."id" < 56110052 AND "authentication_events"."id" >= 56100052 ORDER BY "authentication_events"."id" ASC LIMIT 1 OFFSET 1000
```

Check existence before deleting

```sql
SELECT 1 AS one FROM "authentication_events" WHERE ... LIMIT 1
```

Delete rows

```sql
WITH batch AS MATERIALIZED (
  SELECT "authentication_events"."id", "authentication_events"."created_at" FROM "authentication_events" WHERE (id > 0) AND "authentication_events"."id" >= 56100052 AND "authentication_events"."id" < 56110052 AND "authentication_events"."id" >= 56100052 AND "authentication_events"."id" < 56101052 LIMIT 1000
),
filtered_batch AS MATERIALIZED (
  SELECT id, created_at FROM batch
  WHERE created_at <= '2024-11-19 00:00:00' LIMIT 1000
)
DELETE FROM authentication_events
WHERE id IN (SELECT id FROM filtered_batch)
```

Get next inner batch (repeats...)

```sql
SELECT "authentication_events"."id" FROM "authentication_events" WHERE (id > 0) AND "authentication_events"."id" >= 56100052 AND "authentication_events"."id" < 56110052 AND "authentication_events"."id" >= 56101052 ORDER BY "authentication_events"."id" ASC LIMIT 1 OFFSET 1000
```
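The id-range windows in these queries come from nested batching: an outer window (10,000 ids here) subdivided into inner windows (1,000 ids), with the `created_at` filter applied only at delete time. A self-contained sketch of that shape (illustrative code and sizes, not the worker's implementation):

```ruby
require 'date'

# Nested batching sketch: walk rows in id order, slice into outer and inner
# windows, and within each inner window select only rows past the cutoff,
# mirroring the filtered_batch CTE above. Sizes and names are illustrative.
def ids_to_delete(rows, cutoff, outer: 10_000, inner: 1_000)
  rows.sort_by(&:first).each_slice(outer).flat_map do |outer_batch|
    outer_batch.each_slice(inner).flat_map do |inner_batch|
      inner_batch.select { |_id, created_at| created_at <= cutoff }.map(&:first)
    end
  end
end

rows = [[3, Date.new(2023, 1, 1)], [1, Date.new(2025, 6, 1)], [2, Date.new(2022, 5, 5)]]
ids_to_delete(rows, Date.new(2024, 11, 19)) # => [2, 3]
```

In the real worker each inner window is deleted before the next one is selected, which keeps individual `DELETE` statements small and bounded.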
CTE Query Plan
- Run of the new CTE: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/45672/commands/139965
- Comparable plan from old CTE (which both inserted and deleted records): https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44381/commands/136065
Screenshots or screen recordings
How to set up and validate locally
Instructions from the original MR: !207500 (merged)
- Seed some `authentication_events` records. Try a mix of records over a year old and under a year old. This handy seed script will do this for you.
- Run the worker via Rails console:

  ```ruby
  Authn::DataRetention::AuthenticationEventArchiveWorker.new.perform
  ```
- Observe the worker didn't delete anything, because the feature flag is disabled.
- Enable the feature flag via Rails console:

  ```ruby
  Feature.enable(:archive_authentication_events)
  ```
- Re-enqueue the job via Rails console:

  ```ruby
  Authn::DataRetention::AuthenticationEventArchiveWorker.new.perform
  ```
- Observe that records older than 1 year are deleted, and records newer than 1 year are not. You can do this with a psql query:

  ```shell
  gdk psql
  ```

  ```sql
  SELECT * FROM authentication_events WHERE created_at < (now() - interval '1 year'); -- records out of retention, will decrease as the job runs
  SELECT * FROM authentication_events WHERE created_at >= (now() - interval '1 year'); -- records in retention, should not change
  ```
Generate seed data
To generate seed data, you can use this script:
Create file `lib/tasks/seed_authn_events.rake`
```ruby
# frozen_string_literal: true

namespace :gitlab do
  namespace :seed do
    desc 'Seed a whole lot of authentication_events'
    task :authentication_events, [:num_batches, :batch_size] => :gitlab_environment do |_t, args|
      username = 'root'
      user = User.find_by(username: username)
      last_user_id = User.maximum(:id)

      num_batches = (args[:num_batches] || 10).to_i
      batch_size = (args[:batch_size] || 100).to_i
      num_processes = 12

      puts "Creating #{num_batches} batches of #{batch_size} authentication events each using #{num_processes} processes."

      # Distribute batches evenly, handling remainder
      base_batches_per_process = num_batches / num_processes
      remainder = num_batches % num_processes

      pids = (0...num_processes).map do |i|
        # Give extra batch to first 'remainder' processes
        batches_for_this_process = base_batches_per_process + (i < remainder ? 1 : 0)
        start_batch = i * base_batches_per_process + [i, remainder].min + 1
        end_batch = start_batch + batches_for_this_process - 1

        fork do
          # Reconnect to database in child process
          ActiveRecord::Base.establish_connection

          puts "Process #{Process.pid}: handling batches #{start_batch}..#{end_batch}"

          (start_batch..end_batch).each do |batch_number|
            puts "Process #{Process.pid}: Starting batch #{batch_number}"

            use_deleted_user = rand(100) < 5

            attrs_to_insert = (1..batch_size).map do |_i|
              {
                user_id: use_deleted_user ? last_user_id + 1 : user.id,
                user_name: user.username,
                provider: 'standard',
                ip_address: FFaker::Internet.ip_v4_address,
                result: [:failed, :success].sample,
                created_at: rand(48).months.ago
              }
            end

            begin
              ActiveRecord::Base.connection.execute("ALTER TABLE authentication_events DISABLE TRIGGER ALL")
              AuthenticationEvent.insert_all(attrs_to_insert)
            ensure
              ActiveRecord::Base.connection.execute("ALTER TABLE authentication_events ENABLE TRIGGER ALL")
            end

            puts "Process #{Process.pid}: Completed batch #{batch_number}"
          end

          # Clean exit from child process
          exit 0
        end
      end

      # Wait for all child processes
      pids.each { |pid| Process.wait(pid) }

      total_records = num_batches * batch_size
      puts "Successfully created #{total_records} authentication events."
    end
  end
end
```
Run it with:

```shell
bundle exec rake "gitlab:seed:authentication_events[20, 100000]"
```

...which would create 2,000,000 records (20 batches of 100k).
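The batch-distribution arithmetic in the script (a base share per process, with the remainder spread over the first processes) can be checked in isolation:

```ruby
# Stand-alone check of the seed script's batch distribution: num_batches split
# across num_processes, with the remainder going to the first processes.
# Extracted for illustration; mirrors the arithmetic in the rake task above.
def batch_ranges(num_batches, num_processes)
  base = num_batches / num_processes
  rem  = num_batches % num_processes
  (0...num_processes).map do |i|
    count = base + (i < rem ? 1 : 0)
    start = i * base + [i, rem].min + 1
    (start..(start + count - 1))
  end
end

batch_ranges(20, 12).first # => 1..2
```

For 20 batches over 12 processes, the first 8 processes each take 2 batches and the last 4 take 1, covering batches 1..20 with no gaps or overlaps.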
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #562154