Add audit_events partitioning migration

Andreas Brandl requested to merge ab/partition-audit-events into master

What does this MR do?

This builds on top of !35981 (merged) and adds the migration for audit_events partitioning.

In detail, this performs the following steps:

  1. Create a new table audit_events_part_5fc467ac26 with a schema nearly identical to that of audit_events. This is the partitioned table.
  2. Create initial partitions for it: a catch-all partition for rows created before July 2020, plus monthly partitions for July and August 2020
  3. Install a trigger that keeps both tables in sync: inserts, updates and deletes on audit_events are mirrored to the newly created table
  4. Schedule background migration jobs to copy existing data from audit_events to the partitioned table (see the sketch after this list)
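For illustration, a minimal sketch of what the migration might look like, assuming the partition_table_by_date / drop_partitioned_table_for helpers from !35981 (the helper names and signature here are an assumption for illustration, not a copy of this MR's code):

```ruby
# frozen_string_literal: true

# Sketch only: assumes the partitioning helpers introduced in !35981.
class PartitionAuditEvents < ActiveRecord::Migration[6.0]
  include Gitlab::Database::PartitioningMigrationHelpers

  DOWNTIME = false

  def up
    # Creates audit_events_part_5fc467ac26 partitioned by range on
    # created_at, adds the initial partitions and installs the sync trigger.
    partition_table_by_date :audit_events, :created_at
  end

  def down
    drop_partitioned_table_for :audit_events
  end
end
```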

Migration log:

```plaintext
== 20200629125335 PartitionAuditEvents: migrating =============================
-- transaction_open?()
   -> 0.0000s
-- table_exists?("audit_events_part_5fc467ac26")
   -> 0.0004s
-- transaction()
-- execute("CREATE TABLE audit_events_part_5fc467ac26 (\n  LIKE audit_events INCLUDING ALL EXCLUDING INDEXES,\n  partition_key_bfab5f7bb7 timestamp without time zone NOT NULL,\n  PRIMARY KEY (id, partition_key_bfab5f7bb7)\n) PARTITION BY RANGE (partition_key_bfab5f7bb7)\n")
   -> 0.0020s
-- remove_column("audit_events_part_5fc467ac26", "created_at")
   -> 0.0011s
-- rename_column("audit_events_part_5fc467ac26", "partition_key_bfab5f7bb7", "created_at")
   -> 0.0015s
-- change_column_default("audit_events_part_5fc467ac26", "id", nil)
   -> 0.0030s
-- change_column("audit_events_part_5fc467ac26", "id", :bigint)
   -> 0.0018s
   -> 0.0249s
-- table_exists?("gitlab_partitions_dynamic.audit_events_part_5fc467ac26_000000")
   -> 0.0008s
-- execute("CREATE TABLE gitlab_partitions_dynamic.audit_events_part_5fc467ac26_000000 PARTITION OF audit_events_part_5fc467ac26\nFOR VALUES FROM (MINVALUE) TO ('2020-07-01')\n")
   -> 0.0143s
-- table_exists?("gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202007")
   -> 0.0004s
-- execute("CREATE TABLE gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202007 PARTITION OF audit_events_part_5fc467ac26\nFOR VALUES FROM ('2020-07-01') TO ('2020-08-01')\n")
   -> 0.0216s
-- table_exists?("gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202008")
   -> 0.0005s
-- execute("CREATE TABLE gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202008 PARTITION OF audit_events_part_5fc467ac26\nFOR VALUES FROM ('2020-08-01') TO ('2020-09-01')\n")
   -> 0.0130s
-- execute("CREATE  FUNCTION table_sync_function_2be879775d()\nRETURNS TRIGGER AS\n$$\nBEGIN\nIF (TG_OP = 'DELETE') THEN\n  DELETE FROM audit_events_part_5fc467ac26 where id = OLD.id;\nELSIF (TG_OP = 'UPDATE') THEN\n  UPDATE audit_events_part_5fc467ac26\n  SET author_id = NEW.author_id,\n    type = NEW.type,\n    entity_id = NEW.entity_id,\n    entity_type = NEW.entity_type,\n    details = NEW.details,\n    updated_at = NEW.updated_at,\n    ip_address = NEW.ip_address,\n    author_name = NEW.author_name,\n    created_at = NEW.created_at\n  WHERE audit_events_part_5fc467ac26.id = NEW.id;\nELSIF (TG_OP = 'INSERT') THEN\n  INSERT INTO audit_events_part_5fc467ac26 (id,\n    author_id,\n    type,\n    entity_id,\n    entity_type,\n    details,\n    updated_at,\n    ip_address,\n    author_name,\n    created_at)\n  VALUES (NEW.id,\n    NEW.author_id,\n    NEW.type,\n    NEW.entity_id,\n    NEW.entity_type,\n    NEW.details,\n    NEW.updated_at,\n    NEW.ip_address,\n    NEW.author_name,\n    NEW.created_at);\nEND IF;\nRETURN NULL;\n\nEND\n$$ LANGUAGE PLPGSQL\n")
   -> 0.0053s
-- execute("COMMENT ON FUNCTION table_sync_function_2be879775d IS 'Partitioning migration: table sync for audit_events table'")
   -> 0.0004s
-- execute("CREATE TRIGGER table_sync_trigger_ee39a25f9d\nAFTER INSERT OR UPDATE OR DELETE ON audit_events\nFOR EACH ROW\nEXECUTE PROCEDURE table_sync_function_2be879775d()\n")
   -> 0.0003s
== 20200629125335 PartitionAuditEvents: migrated (0.1913s) ====================
```
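As a quick sanity check of the sync trigger, something like the following could be run in a Rails console. This is a hypothetical snippet (the attribute values are made up for illustration), not part of the MR:

```ruby
# Hypothetical check: an insert into audit_events should be mirrored
# into the partitioned table by table_sync_trigger_ee39a25f9d.
event = AuditEvent.create!(
  author_id: User.first.id,
  entity_id: Group.first.id,
  entity_type: 'Group',
  details: { custom_message: 'sync smoke test' }
)

mirrored = ActiveRecord::Base.connection.select_value(
  "SELECT count(*) FROM audit_events_part_5fc467ac26 WHERE id = #{event.id}"
)

raise 'sync trigger did not mirror the insert' unless mirrored == 1
```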

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

There is some overhead from the double writes to both tables: in a benchmark, we measured it at around 6% in query timings. We don't expect this to be a problem given the relatively low insert rate of about 10-20 inserts per second at peak times.

Reference: #223908 (comment 379438751)
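For context, one rough way to reproduce such a measurement (a hypothetical sketch, not the benchmark referenced above) is to time inserts with the sync trigger enabled and disabled:

```ruby
require 'benchmark'

# Hypothetical micro-benchmark: estimate the double-write overhead by
# timing plain inserts with and without the sync trigger.
def time_inserts(count = 1_000)
  conn = ActiveRecord::Base.connection
  Benchmark.realtime do
    count.times do
      conn.execute(<<~SQL)
        INSERT INTO audit_events (author_id, entity_id, entity_type, created_at, updated_at)
        VALUES (1, 1, 'Group', now(), now())
      SQL
    end
  end
end

conn = ActiveRecord::Base.connection
with_trigger = time_inserts

conn.execute('ALTER TABLE audit_events DISABLE TRIGGER table_sync_trigger_ee39a25f9d')
without_trigger = time_inserts
conn.execute('ALTER TABLE audit_events ENABLE TRIGGER table_sync_trigger_ee39a25f9d')

puts format('double-write overhead: %.1f%%', (with_trigger / without_trigger - 1) * 100)
```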
