Skip to content

Updating partitioning value deletes data from dependent tables

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Problem statement

Consider two tables pipelines and jobs with composed primary keys and a foreign key set to cascade the updates and deletes:

Tables structure
marius=# \d+ pipelines
                                                         Table "public.pipelines"
    Column    |  Type  | Collation | Nullable |                Default                | Storage | Compression | Stats target | Description
--------------+--------+-----------+----------+---------------------------------------+---------+-------------+--------------+-------------
 id           | bigint |           | not null | nextval('pipelines_id_seq'::regclass) | plain   |             |              |
 partition_id | bigint |           | not null |                                       | plain   |             |              |
Indexes:
    "pipelines_pkey" PRIMARY KEY, btree (id, partition_id)
Referenced by:
    TABLE "jobs" CONSTRAINT "jobs_pipeline_id_partition_id_fkey" FOREIGN KEY (pipeline_id, partition_id) REFERENCES pipelines(id, partition_id) ON UPDATE CASCADE ON DELETE CASCADE

marius=# \d+ jobs
                                                         Table "public.jobs"
    Column    |  Type  | Collation | Nullable |             Default              | Storage | Compression | Stats target | Description
--------------+--------+-----------+----------+----------------------------------+---------+-------------+--------------+-------------
 id           | bigint |           | not null | nextval('jobs_id_seq'::regclass) | plain   |             |              |
 partition_id | bigint |           | not null |                                  | plain   |             |              |
 pipeline_id  | bigint |           |          |                                  | plain   |             |              |
Indexes:
    "jobs_pkey" PRIMARY KEY, btree (id, partition_id)
Foreign-key constraints:
    "jobs_pipeline_id_partition_id_fkey" FOREIGN KEY (pipeline_id, partition_id) REFERENCES pipelines(id, partition_id) ON UPDATE CASCADE ON DELETE CASCADE

In this case, updating the partition_id for pipelines cascades the update to the jobs rows:

marius=# select * from pipelines;
 id | partition_id
----+--------------
  1 |            1
  2 |            1
  3 |            1
(3 rows)

marius=# select * from jobs;
 id | partition_id | pipeline_id
----+--------------+-------------
  1 |            1 |           1
  2 |            1 |           2
  3 |            1 |           3
(3 rows)

marius=# update pipelines set partition_id = 2 where id = 1;
INFO:  Executing UPDATE trigger on pipelines
INFO:  Executing UPDATE trigger on jobs
UPDATE 1
marius=# select * from jobs;
 id | partition_id | pipeline_id
----+--------------+-------------
  2 |            1 |           2
  3 |            1 |           3
  1 |            2 |           1
(3 rows)

But if the tables are partitioned, updating the partitioning value will delete the data from the dependent tables:

Tables structure
marius=# \d+ pipelines
                                                   Partitioned table "public.pipelines"
    Column    |  Type  | Collation | Nullable |                Default                | Storage | Compression | Stats target | Description
--------------+--------+-----------+----------+---------------------------------------+---------+-------------+--------------+-------------
 id           | bigint |           | not null | nextval('pipelines_id_seq'::regclass) | plain   |             |              |
 partition_id | bigint |           | not null |                                       | plain   |             |              |
Partition key: LIST (partition_id)
Indexes:
    "pipelines_pkey" PRIMARY KEY, btree (id, partition_id)
Referenced by:
    TABLE "jobs" CONSTRAINT "jobs_pipeline_id_partition_id_fkey" FOREIGN KEY (pipeline_id, partition_id) REFERENCES pipelines(id, partition_id) ON UPDATE CASCADE ON DELETE CASCADE
Partitions: pipelines_1 FOR VALUES IN ('1'),
            pipelines_2 FOR VALUES IN ('2')

marius=# \d+ jobs
                                                   Partitioned table "public.jobs"
    Column    |  Type  | Collation | Nullable |             Default              | Storage | Compression | Stats target | Description
--------------+--------+-----------+----------+----------------------------------+---------+-------------+--------------+-------------
 id           | bigint |           | not null | nextval('jobs_id_seq'::regclass) | plain   |             |              |
 partition_id | bigint |           | not null |                                  | plain   |             |              |
 pipeline_id  | bigint |           |          |                                  | plain   |             |              |
Partition key: LIST (partition_id)
Indexes:
    "jobs_pkey" PRIMARY KEY, btree (id, partition_id)
Foreign-key constraints:
    "jobs_pipeline_id_partition_id_fkey" FOREIGN KEY (pipeline_id, partition_id) REFERENCES pipelines(id, partition_id) ON UPDATE CASCADE ON DELETE CASCADE
Partitions: jobs_1 FOR VALUES IN ('1'),
            jobs_2 FOR VALUES IN ('2')
marius=# UPDATE pipelines SET partition_id = 2 WHERE id = 1;
INFO:  Executing DELETE trigger on pipelines_1
INFO:  Executing INSERT trigger on pipelines_2
INFO:  Executing DELETE trigger on jobs_1
UPDATE 1

marius=# select * from jobs;
 id | partition_id | pipeline_id
----+--------------+-------------
  2 |            1 |           2
  3 |            1 |           3
(2 rows)

marius=# select * from pipelines;
 id | partition_id
----+--------------
  2 |            1
  3 |            1
  1 |            2
(3 rows)

Looks like Postgres deletes data from the one partition and inserts it into the new one, but the deletes from the original partition are propagated onto the dependent tables, deleting that data instead of updating it to match the new partition.

This will be a problem for partition rebalancing(move data out of ci_builds) and our solution for ensuring id uniqueness because the insert trigger will override the existing ids.

Debugging scripts

Postgres version:

docker run -it --rm -e POSTGRES_HOST_AUTH_METHOD=trust -e POSTGRES_USER=marius -p 5432:5432 postgres:14-alpine
CREATE TABLE pipelines (
    id bigserial,
    partition_id bigint,
    PRIMARY KEY(id, partition_id)
)
PARTITION BY LIST (partition_id);

CREATE TABLE jobs (
    id bigserial,
    partition_id bigint,
    pipeline_id bigint,
    PRIMARY KEY(id, partition_id),
    FOREIGN KEY (pipeline_id, partition_id) REFERENCES pipelines(id, partition_id) ON UPDATE CASCADE ON DELETE CASCADE
)
PARTITION BY LIST (partition_id);


CREATE  TABLE pipelines_1 PARTITION OF pipelines FOR VALUES IN (1);
CREATE  TABLE pipelines_2 PARTITION OF pipelines FOR VALUES IN (2);

CREATE  TABLE jobs_1 PARTITION OF jobs FOR VALUES IN (1);
CREATE  TABLE jobs_2 PARTITION OF jobs FOR VALUES IN (2);

CREATE OR REPLACE FUNCTION log_trigger() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
RAISE INFO 'Executing % trigger on %', TG_OP, TG_RELNAME;
RETURN NULL;
END
$$;

CREATE TRIGGER log_insert
    AFTER INSERT ON jobs
    FOR EACH ROW
    EXECUTE FUNCTION log_trigger();

CREATE TRIGGER log_update
    AFTER UPDATE ON jobs
    FOR EACH ROW
    EXECUTE FUNCTION log_trigger();

CREATE TRIGGER log_delete
    AFTER DELETE ON jobs
    FOR EACH ROW
    EXECUTE FUNCTION log_trigger();

CREATE TRIGGER log_insert
    AFTER INSERT ON pipelines
    FOR EACH ROW
    EXECUTE FUNCTION log_trigger();

CREATE TRIGGER log_update
    AFTER UPDATE ON pipelines
    FOR EACH ROW
    EXECUTE FUNCTION log_trigger();

CREATE TRIGGER log_delete
    AFTER DELETE ON pipelines
    FOR EACH ROW
    EXECUTE FUNCTION log_trigger();


INSERT INTO pipelines(id, partition_id) VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO jobs(pipeline_id, partition_id) VALUES (1, 1), (2, 1), (3, 1);

UPDATE pipelines SET partition_id = 2 WHERE id = 1;
Edited by 🤖 GitLab Bot 🤖