Skip to content

Clean up serialized objects in audit_events

Tan Le requested to merge tancnle/fix-ruby-object-audit-events-migrations into master

What does this MR do?

Clean up serialized objects on details field in audit_events table.

Relates to #216577 (closed)

Database benchmarks

Number of rows effected

6,638,795 (via #database-lab on 2020/06/03)

Index Only Scan using index_audit_events_on_ruby_object_in_details_2 on public.audit_events  (cost=0.43..175497.02 rows=8788533 width=4) (actual time=1.215..11927.867 rows=9068244 loops=1)
  Heap Fetches: 0
  Buffers: shared hit=15746 read=24805
  I/O Timings: read=10497.578

Query plans

Select without index
EXPLAIN SELECT id FROM audit_events WHERE details ~~ '%ruby/object%'

Time: 25.316 min
  - planning: 5.909 ms
  - execution: 25.316 min
    - I/O read: 5.354 min
    - I/O write: 7.269 s

Shared buffers:
  - hits: 1019 (~8.00 MiB) from the buffer pool
  - reads: 9975795 (~76.10 GiB) from the OS file cache, including disk I/O
  - dirtied: 111574 (~871.70 MiB)
  - writes: 111542 (~871.40 MiB)

https://explain.depesz.com/s/r69vd

Create index
CREATE INDEX CONCURRENTLY index_audit_events_on_ruby_object_in_details
ON audit_events
USING btree (id)
WHERE (details ~~ '%ruby/object%')

The query has been executed. Duration: 52.729 min
List of relations
 Schema |                     Name                     | Type  | Owner  |    Table     |  Size  | Description 
--------+----------------------------------------------+-------+--------+--------------+--------+-------------
 public | index_audit_events_on_ruby_object_in_details | index | gitlab | audit_events | 194 MB | 
(1 row)

Schedule with index

Retrieve IDs in batch of 1000
EXPLAIN SELECT "audit_events"."id"
FROM "audit_events"
WHERE "audit_events"."id" >= 1
AND (details ~~ '%ruby/object%')
ORDER BY "audit_events"."id" ASC
LIMIT 1 OFFSET 1000

Time: 9.992 ms
  - planning: 4.078 ms
  - execution: 5.914 ms
    - I/O read: 5.484 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 5 (~40.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://explain.dalibo.com/plan/bkY

Get Min/Max IDs
EXPLAIN SELECT MIN(id), MAX(id)
FROM "audit_events"
WHERE "audit_events"."id" >= 274591324 AND "audit_events"."id" < 274592324
AND (details ~~ '%ruby/object%')

Time: 15.392 ms
  - planning: 15.025 ms
  - execution: 0.367 ms
    - I/O read: 0.221 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://explain.dalibo.com/plan/gFy

Update (in batches of 1,000)
EXPLAIN UPDATE audit_events
SET details = regexp_replace(details, '!ruby/object.*name: ', '')
WHERE id BETWEEN 274591324 AND 274592324
AND (details ~~ '%ruby/object%')

Time: 17.545 ms
  - planning: 5.525 ms
  - execution: 12.020 ms
    - I/O read: 11.882 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 0 from the buffer pool
  - reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://explain.dalibo.com/plan/UiL

Migrations runtime

  • Index: 60 mins
  • Schedule: 2 secs
  • Update: <10 days with 1K batches

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Tan Le

Merge request reports