Archive old authentication_events
What does this MR do and why?
Enforces a 1 year retention policy on records in the `authentication_events` table. This table is currently over the large table limit, and a 1 year retention policy will bring it well under the limit.
What
Enforce the 1 year retention policy by:

- Adding the `authentication_event_archived_records` table and a basic model.
  - The table has no foreign keys and minimal constraints because it is meant to be a simple dumping ground for data.
  - The Rails model has no real functionality, since it won't be used throughout the application; it will only be used in one data migration.
- Adding the `ArchiveAuthenticationEvents` batched background migration.
  - Records created more than 1 year ago in `authentication_events` are copied to `authentication_event_archived_records` and then deleted from `authentication_events`. In this way, we soft delete records into the archive.
- Enqueuing the BBM only for GitLab.com. We will enqueue it for self-managed in a follow-up, after we've confirmed the BBM ran successfully for .com.
- We'll drop the `authentication_event_archived_records` table in a few milestones, once we've confirmed the retention policy is being enforced with no issues.
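The copy-then-delete step above is a single statement per sub-batch. As a minimal plain-Ruby sketch of the SQL the BBM issues (table names are from this MR; the helper name, parameters, and string interpolation are illustrative only — the real job builds this through GitLab's batched-migration helpers):

```ruby
# Illustrative only: builds the archival CTE statement for one sub-batch.
# The cutoff below is the example date used in the query plans in this MR.
CUTOFF = '2024-08-27 00:00:00'

def archive_sub_batch_sql(start_id:, stop_id:, sub_start_id:, sub_stop_id:, limit:, cutoff: CUTOFF)
  <<~SQL
    WITH deleted_records AS (
      DELETE FROM authentication_events
      WHERE id IN (
        SELECT id FROM authentication_events
        WHERE id BETWEEN #{start_id} AND #{stop_id}
          AND id >= #{sub_start_id} AND id < #{sub_stop_id}
          AND created_at <= '#{cutoff}'
        LIMIT #{limit}
      )
      RETURNING *
    )
    INSERT INTO authentication_event_archived_records (SELECT * FROM deleted_records)
  SQL
end

puts archive_sub_batch_sql(start_id: 1, stop_id: 3, sub_start_id: 1, sub_stop_id: 3, limit: 2)
```

The `DELETE ... RETURNING` feeding the `INSERT` inside one statement is what makes the move atomic: a row is never gone from the operational table without its copy landing in the archive.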
Data
The migration will delete 77.1% of rows from this table. See this comment for more details.
Why?
`authentication_events` is now over the large table limit. This work supports pruning the operational table and thus getting it back under the limit.
References
See this issue for detailed planning discussion, where multiple approaches are considered: #555382
Issue: #545007
Is this reversible?
- The migration to queue the batched background migration is reversible.
- The operations performed by the batched background migration are not automatically reversible.
  - The BBM deletes records from `authentication_events` and inserts them into `authentication_event_archived_records`.
    - The `DELETE` + `INSERT` is atomic, so every row deleted from `authentication_events` has a copy stored in `authentication_event_archived_records`.
  - To restore deleted records, we'd run queries to transfer records in the opposite direction, from `authentication_event_archived_records` to `authentication_events` — likely as its own BBM, or as a spot transfer via console.
For example, this query will restore 3 records from the archive:
```sql
INSERT INTO authentication_events
  (id, created_at, user_id, result, ip_address, provider, user_name)
SELECT
  id, created_at, user_id, result, ip_address, provider, user_name
FROM authentication_event_archived_records archive
WHERE archive.id IN (1, 2, 3)
```
Possible impact of error?
No critical flows would be blocked in case of error. Any errors would largely result in under-reporting data:

- Fewer auth logs would be shown to users than there should be.
- Fewer auth log entries would be considered in the "most recent IP address" calculation, causing email verification to trigger more frequently than normal.
- "Most used IP address" on the abuse report detail page would consider fewer IPs than normal.
All these impacts would be fixed as soon as deleted records are restored from archive.
Screenshots or screen recordings
Batched background migration queries
Queries for 1 batch run, from rspec output (see this comment for details):
```sql
SELECT "authentication_events"."id" FROM "authentication_events" WHERE "authentication_events"."id" BETWEEN 1 AND 3 ORDER BY "authentication_events"."id" ASC LIMIT 1

-- sub-batch 1
SELECT "authentication_events"."id" FROM "authentication_events" WHERE "authentication_events"."id" BETWEEN 1 AND 3 AND "authentication_events"."id" >= 1 ORDER BY "authentication_events"."id" ASC LIMIT 1 OFFSET 2

WITH deleted_records AS (
  DELETE FROM authentication_events
  WHERE id IN (SELECT "authentication_events"."id" FROM "authentication_events" WHERE "authentication_events"."id" BETWEEN 1 AND 3 AND "authentication_events"."id" >= 1 AND "authentication_events"."id" < 3 AND "authentication_events"."created_at" <= '2023-01-01 00:00:00' LIMIT 2)
  RETURNING *
)
INSERT INTO authentication_event_archived_records (SELECT * FROM deleted_records)

-- sub-batch 2
SELECT "authentication_events"."id" FROM "authentication_events" WHERE "authentication_events"."id" BETWEEN 1 AND 3 AND "authentication_events"."id" >= 3 ORDER BY "authentication_events"."id" ASC LIMIT 1 OFFSET 2 /*application:test,correlation_id:fd560589b624884ddf7351e28075a1f1,db_config_database:gitlabhq_test,db_config_name:main,line:/app/models/concerns/each_batch.rb:84:in `block in each_batch'*/

WITH deleted_records AS (
  DELETE FROM authentication_events
  WHERE id IN (SELECT "authentication_events"."id" FROM "authentication_events" WHERE "authentication_events"."id" BETWEEN 1 AND 3 AND "authentication_events"."id" >= 3 AND "authentication_events"."created_at" <= '2023-01-01 00:00:00' LIMIT 2)
  RETURNING *
)
INSERT INTO authentication_event_archived_records (SELECT * FROM deleted_records)
```
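The two sub-batches above come from `each_batch`-style slicing: the `LIMIT 1 OFFSET n` queries probe the next boundary id, and each slice feeds one `DELETE` + `INSERT` statement. As a toy model only (not GitLab's `each_batch` implementation), the slicing can be sketched as:

```ruby
# Toy model of sub-batching: split a sorted list of ids (standing in
# for the primary-key index scan) into slices of sub_batch_size.
def sub_batches(ids, sub_batch_size)
  sorted = ids.sort
  batches = []
  batches << sorted.shift(sub_batch_size) until sorted.empty?
  batches
end

# ids 1..3 with sub-batch size 2 yield two slices, matching the two
# DELETE+INSERT statements in the rspec output above.
p sub_batches([1, 2, 3], 2)  # => [[1, 2], [3]]
```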
Query plans
Batched selects
```sql
explain SELECT authentication_events.id FROM authentication_events WHERE authentication_events.id BETWEEN 20069040 AND 20757411 AND authentication_events.id >= 20069040 ORDER BY authentication_events.id ASC LIMIT 1 OFFSET 1000
```

```
Limit (cost=27.92..27.94 rows=1 width=8) (actual time=3.047..3.048 rows=1 loops=1)
  Buffers: shared hit=14 read=13
  I/O Timings: read=2.791 write=0.000
  -> Index Only Scan using authentication_events_pkey on public.authentication_events (cost=0.57..18620.51 rows=680929 width=8) (actual time=1.004..3.016 rows=1001 loops=1)
       Index Cond: ((authentication_events.id >= 2

Time: 4.032 ms
  - planning: 0.951 ms
  - execution: 3.081 ms
  - I/O read: 2.791 ms
  - I/O write: 0.000 ms

Shared buffers:
  - hits: 14 (~112.00 KiB) from the buffer pool
  - reads: 13 (~104.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
```
```sql
explain SELECT authentication_events.* FROM authentication_events WHERE authentication_events.id BETWEEN 20069040 AND 20757411 AND authentication_events.id >= 20069040 AND authentication_events.id < 20070040 AND authentication_events.created_at <= '2024-08-27 00:00:00'
```

```
Index Scan using authentication_events_pkey on public.authentication_events (cost=0.57..86.67 rows=663 width=60) (actual time=0.876..72.699 rows=1000 loops=1)
  Index Cond: ((authentication_events.id >= 20069040) AND (authentication_events.id <= 20757411) AND (authentication_events.id >= 20069040) AND (authentication_events.id < 20070040))
  Filter: (authentication_events.created_at <= '2024-08-27 00:00:00+00'::timestamp with time zone)
  Rows Removed by Filter: 0
  Buffers: shared hit=879 read=123
  I/O Timings: read=70.453 write=0.000
Settings: seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5'

Time: 73.986 ms
  - planning: 1.140 ms
  - execution: 72.846 ms
  - I/O read: 70.453 ms
  - I/O write: 0.000 ms

Shared buffers:
  - hits: 879 (~6.90 MiB) from the buffer pool
  - reads: 123 (~984.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
```
`DELETE` + `INSERT` CTE query plan
```sql
explain WITH deleted_records AS (
  DELETE FROM authentication_events
  WHERE id IN (SELECT "authentication_events"."id" FROM "authentication_events" WHERE "authentication_events"."id" BETWEEN 20069040 AND 20757411 AND "authentication_events"."id" >= 20069040 AND "authentication_events"."id" < 20070040 AND "authentication_events"."created_at" <= '2024-08-27 00:00:00' LIMIT 1000)
  RETURNING *
)
INSERT INTO authentication_event_archived_records
  (id, created_at, user_id, result, ip_address, provider, user_name, archived_at)
SELECT
  id, created_at, user_id, result, ip_address, provider, user_name, CURRENT_TIMESTAMP as archived_at
FROM deleted_records
```

```
ModifyTable on public.authentication_event_archived_records (cost=3025.81..3043.97 rows=0 width=0) (actual time=111.700..111.703 rows=0 loops=1)
  Buffers: shared hit=9019 read=134 dirtied=143 written=16
  WAL: records=2001 fpi=124 bytes=1191788
  I/O Timings: read=99.223 write=0.364
  CTE deleted_records
    -> ModifyTable on public.authentication_events (cost=121.18..3025.81 rows=807 width=38) (actual time=100.768..108.690 rows=1000 loops=1)
         Buffers: shared hit=7991 read=134 dirtied=129
         WAL: records=1000 fpi=123 bytes=1056683
         I/O Timings: read=99.223 write=0.000
         -> Nested Loop (cost=121.18..3025.81 rows=807 width=38) (actual time=100.088..103.170 rows=1000 loops=1)
              Buffers: shared hit=5874 read=128
              I/O Timings: read=95.678 write=0.000
              -> HashAggregate (cost=120.61..128.68 rows=807 width=40) (actual time=100.059..100.335 rows=1000 loops=1)
                   Group Key: "ANY_subquery".id
                   Buffers: shared hit=874 read=128
                   I/O Timings: read=95.678 write=0.000
                   -> Subquery Scan on ANY_subquery (cost=0.57..118.59 rows=807 width=40) (actual time=5.359..99.063 rows=1000 loops=1)
                        Buffers: shared hit=874 read=128
                        I/O Timings: read=95.678 write=0.000
                        -> Limit (cost=0.57..110.52 rows=807 width=8) (actual time=5.340..98.506 rows=1000 loops=1)
                             Buffers: shared hit=874 read=128
                             I/O Timings: read=95.678 write=0.000
                             -> Index Scan using authentication_events_pkey on public.authentication_events authentication_events_1 (cost=0.57..110.52 rows=807 width=8) (actual time=5.337..98.231 rows=1000 loops=1)
                                  Index Cond: ((authentication_events_1.id >= 20069040) AND (authentication_events_1.id <= 20757411) AND (authentication_events_1.id >= 20069040) AND (authentication_events_1.id < 20070040))
                                  Filter: (authentication_events_1.created_at <= '2024-08-27 00:00:00+00'::timestamp with time zone)
                                  Rows Removed by Filter: 0
                                  Buffers: shared hit=874 read=128
                                  I/O Timings: read=95.678 write=0.000
              -> Index Scan using authentication_events_pkey on public.authentication_events (cost=0.57..3.59 rows=1 width=14) (actual time=0.002..0.002 rows=1 loops=1000)
                   Index Cond: (authentication_events.id = "ANY_subquery".id)
                   Buffers: shared hit=5000
                   I/O Timings: read=0.000 write=0.000
  -> CTE Scan on deleted_records (cost=0.00..18.16 rows=807 width=130) (actual time=100.777..109.522 rows=1000 loops=1)
       Buffers: shared hit=7991 read=134 dirtied=129
       WAL: records=1000 fpi=123 bytes=1056683
       I/O Timings: read=99.223 write=0.000
Settings: effective_cache_size = '472585MB', jit = 'off', work_mem = '100MB', random_page_cost = '1.5', seq_page_cost = '4'

Time: 113.705 ms
  - planning: 1.782 ms
  - execution: 111.923 ms
  - I/O read: 99.223 ms
  - I/O write: 0.364 ms

Shared buffers:
  - hits: 9019 (~70.50 MiB) from the buffer pool
  - reads: 134 (~1.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 143 (~1.10 MiB)
  - writes: 16 (~128.00 KiB)
```
Compare operational and archive table schemas:
`authentication_events` schema
```sql
-- structure.sql
CREATE TABLE authentication_events (
    id bigint NOT NULL,
    created_at timestamp with time zone NOT NULL,
    user_id bigint,
    result smallint NOT NULL,
    ip_address inet,
    provider text NOT NULL,
    user_name text NOT NULL,
    CONSTRAINT check_45a6cc4e80 CHECK ((char_length(user_name) <= 255)),
    CONSTRAINT check_c64f424630 CHECK ((char_length(provider) <= 64))
);

-- psql \d+ authentication_events
   Column   |           Type           | Collation | Nullable |                      Default                      | Storage  | Compression | Stats target | Description
------------+--------------------------+-----------+----------+---------------------------------------------------+----------+-------------+--------------+-------------
 id         | bigint                   |           | not null | nextval('authentication_events_id_seq'::regclass) | plain    |             |              |
 created_at | timestamp with time zone |           | not null |                                                   | plain    |             |              |
 user_id    | bigint                   |           |          |                                                   | plain    |             |              |
 result     | smallint                 |           | not null |                                                   | plain    |             |              |
 ip_address | inet                     |           |          |                                                   | main     |             |              |
 provider   | text                     |           | not null |                                                   | extended |             |              |
 user_name  | text                     |           | not null |                                                   | extended |             |              |
Indexes:
    "authentication_events_pkey" PRIMARY KEY, btree (id)
    "index_authentication_events_on_created_at" btree (created_at)
    "index_authentication_events_on_provider" btree (provider)
    "index_authentication_events_on_user_and_ip_address_and_result" btree (user_id, ip_address, result)
    "index_successful_authentication_events_for_metrics" btree (user_id, provider, created_at) WHERE result = 1
Check constraints:
    "check_45a6cc4e80" CHECK (char_length(user_name) <= 255)
    "check_c64f424630" CHECK (char_length(provider) <= 64)
Foreign-key constraints:
    "fk_rails_b204656a54" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
Access method: heap
```
`authentication_event_archived_records` schema
```sql
-- structure.sql
CREATE TABLE authentication_event_archived_records (
    id bigint NOT NULL,
    created_at timestamp with time zone NOT NULL,
    user_id bigint,
    result smallint NOT NULL,
    ip_address inet,
    provider text NOT NULL,
    user_name text NOT NULL,
    archived_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT check_0a55f02112 CHECK ((char_length(provider) <= 64)),
    CONSTRAINT check_3b5f782c5d CHECK ((char_length(user_name) <= 255))
);

-- psql \d+ authentication_event_archived_records
                Table "public.authentication_event_archived_records"
   Column    |           Type           | Collation | Nullable |      Default      | Storage  | Compression | Stats target | Description
-------------+--------------------------+-----------+----------+-------------------+----------+-------------+--------------+-------------
 id          | bigint                   |           | not null |                   | plain    |             |              |
 created_at  | timestamp with time zone |           | not null |                   | plain    |             |              |
 user_id     | bigint                   |           |          |                   | plain    |             |              |
 result      | smallint                 |           | not null |                   | plain    |             |              |
 ip_address  | inet                     |           |          |                   | main     |             |              |
 provider    | text                     |           | not null |                   | extended |             |              |
 user_name   | text                     |           | not null |                   | extended |             |              |
 archived_at | timestamp with time zone |           |          | CURRENT_TIMESTAMP | plain    |             |              |
Indexes:
    "authentication_event_archived_records_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "check_0a55f02112" CHECK (char_length(provider) <= 64)
    "check_3b5f782c5d" CHECK (char_length(user_name) <= 255)
Access method: heap
```
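As a quick parity check on the two schemas above (column lists transcribed from the structure.sql dumps; this is a standalone snippet, not part of the MR): the archive table should be exactly the operational table plus `archived_at`, so that `INSERT INTO authentication_event_archived_records (SELECT * FROM deleted_records)` maps the seven operational columns in order and lets `archived_at` fall back to its `CURRENT_TIMESTAMP` default.

```ruby
# Column lists transcribed from the structure.sql dumps above.
operational = %w[id created_at user_id result ip_address provider user_name]
archive     = %w[id created_at user_id result ip_address provider user_name archived_at]

# The only difference should be the trailing archived_at column.
extra   = archive - operational
missing = operational - archive
p extra    # => ["archived_at"]
p missing  # => []
```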
How to set up and validate locally
1. Run migrations (`bin/rails db:migrate`).
1. Run the specs for the new model and the batched background migration.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #545007