New ids for partitioned CI resources


Description

We are currently working on CI/CD data partitioning: https://docs.gitlab.com/ee/architecture/blueprints/ci_data_decay/pipeline_partitioning.html.

In https://docs.gitlab.com/ee/architecture/blueprints/ci_data_decay/pipeline_partitioning.html#primary-key we described why the primary key of a partitioned table needs to change from id to (id, partition_id). As a result, the same id value can technically exist in multiple partitions, for example (123, 100) and (123, 120).

PostgreSQL currently doesn't have a well-suited mechanism to ensure consistency and uniqueness of id across partitions.

Proposal

Introduce a new identifier scheme in which the value of id on its own doesn't matter. Use:

A (pipeline_id)-(partition_id)-(resource_id) scheme, with each part written as a hexadecimal number (to save space).

For example: 1e240-64-5ba0 identifies build 23456 (0x5ba0) in pipeline 123456 (0x1e240), stored in partition 100 (0x64).
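
To make the scheme concrete, here is a minimal Python sketch of how such an identifier could be built and parsed. The names PartitionedId, encode and decode are hypothetical and used only for illustration, and the strict lowercase-hex validation is an assumption rather than part of the proposal.

```python
import re
from typing import NamedTuple


class PartitionedId(NamedTuple):
    """Hypothetical container for the three normalized values."""
    pipeline_id: int
    partition_id: int
    resource_id: int


# Assumption: identifiers are lowercase hex, without prefix or padding.
_PATTERN = re.compile(r"([0-9a-f]+)-([0-9a-f]+)-([0-9a-f]+)")


def encode(ref: PartitionedId) -> str:
    """Render (pipeline_id)-(partition_id)-(resource_id) in hexadecimal."""
    if any(part < 0 for part in ref):
        raise ValueError("ids must be non-negative")
    return "-".join(format(part, "x") for part in ref)


def decode(text: str) -> PartitionedId:
    """Parse an identifier back into its three integer components."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid partitioned identifier: {text!r}")
    return PartitionedId(*(int(group, 16) for group in match.groups()))


if __name__ == "__main__":
    ref = PartitionedId(pipeline_id=123456, partition_id=100, resource_id=23456)
    print(encode(ref))              # 1e240-64-5ba0
    print(decode("1e240-64-5ba0"))
```

Because partition_id is part of the identifier, decoding yields the partition directly, so a lookup would not need to rely on id being unique across partitions.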

We would use such identifiers in the UI, in links, and anywhere else we need to identify a partitioned resource (except in the database, where this data is normalized into three columns).

/cc @mbobin @morefice
