Decide where to store `scoped_user_id`


Problem

We need to choose where to persist scoped_user_id. Today it is immutable data, but it is not a good candidate for p_ci_job_definitions because it would negatively impact deduplication: the same job definition can be triggered by many different users.

We need to store scoped_user_id outside of p_ci_job_definitions.
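To illustrate the deduplication concern, here is a minimal sketch assuming job definitions are deduplicated by a checksum of their configuration. The `definition_checksum` helper, the config keys and the user IDs are hypothetical; this is not GitLab's actual checksum logic.

```python
import hashlib
import json


def definition_checksum(config: dict) -> str:
    """Hypothetical checksum: SHA-256 of the job config serialized as canonical JSON."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same job configuration, triggered by two different human users.
base_config = {"script": ["make test"], "image": "ruby:3.3", "stage": "test"}
user_ids = (101, 202)

# Without scoped_user_id, both jobs hash to one definition that can be shared.
shared = {definition_checksum(base_config) for _user_id in user_ids}

# With scoped_user_id embedded, every user produces a distinct definition.
per_user = {
    definition_checksum({**base_config, "scoped_user_id": user_id})
    for user_id in user_ids
}

print(len(shared), len(per_user))  # 1 2
```

Every additional distinct user multiplies the number of stored definitions, which is why the value is kept outside the deduplicated table.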

Proposal

Create a dedicated table, p_ci_job_identities, that would contain the scoped_user_id <-> job_id association together with the partitioning and sharding keys.
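A rough sketch of what such a table could look like, using SQLite only so the example runs standalone. It assumes `partition_id` is the partitioning key and `project_id` the sharding key; the column set, the primary key, and the helper functions are illustrative assumptions. PostgreSQL partitioning and the standard `created_at`/`updated_at` columns are not modeled.

```python
import sqlite3

# Hypothetical schema sketch; the real table would be a partitioned PostgreSQL table.
SCHEMA = """
CREATE TABLE p_ci_job_identities (
    job_id         INTEGER NOT NULL,  -- the job (p_ci_builds row) this applies to
    partition_id   INTEGER NOT NULL,  -- partitioning key, same as the job's (assumed)
    project_id     INTEGER NOT NULL,  -- sharding key (assumed)
    scoped_user_id INTEGER NOT NULL,  -- human user acting through a service account
    PRIMARY KEY (job_id, partition_id)
);
"""


def save_identity(conn, job_id, partition_id, project_id, scoped_user_id):
    # Persist the scoped_user_id <-> job_id association for one job.
    conn.execute(
        "INSERT INTO p_ci_job_identities VALUES (?, ?, ?, ?)",
        (job_id, partition_id, project_id, scoped_user_id),
    )


def scoped_user_for(conn, job_id, partition_id):
    # Look up the scoped user for a job; None if the job has no association.
    row = conn.execute(
        "SELECT scoped_user_id FROM p_ci_job_identities "
        "WHERE job_id = ? AND partition_id = ?",
        (job_id, partition_id),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_identity(conn, job_id=1, partition_id=100, project_id=7, scoped_user_id=42)
print(scoped_user_for(conn, 1, 100))  # 42
print(scoped_user_for(conn, 2, 100))  # None
```

Because the row is only created for jobs that actually have a scoped user, jobs triggered directly by humans would not pay any storage cost.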

Backwards compatibility

Since scoped_user_id is currently persisted in options, we need to evaluate the current behavior and decide what should happen once this data is persisted in a dedicated table:

Current: scoped_user_id is persisted during pipeline creation.
Current: when the user is deleted, scoped_user_id is kept as is. We could probably use a loose foreign key (LFK) between scoped_user_id and users.id to delete the record. Behavior should remain the same.
Current: when a job is retried, scoped_user_id is propagated to the new job during cloning.
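The three behaviors above can be sketched with an in-memory stand-in for the dedicated table. This is a hypothetical model, not GitLab code: `JobIdentityStore` and its methods are illustrative names, the user-deletion cleanup models the possible LFK approach rather than today's behavior, and the fallback to the legacy `options` value is an assumed transition strategy.

```python
class JobIdentityStore:
    """Hypothetical in-memory stand-in for a p_ci_job_identities table.

    Rows are keyed by (job_id, partition_id) and map to scoped_user_id.
    """

    def __init__(self):
        self.rows = {}

    def create(self, key, scoped_user_id):
        # Pipeline creation: persist the association once, when the job is created.
        if scoped_user_id is not None:
            self.rows[key] = scoped_user_id

    def clone_for_retry(self, old_key, new_key):
        # Retry: propagate scoped_user_id from the original job to its clone.
        if old_key in self.rows:
            self.rows[new_key] = self.rows[old_key]

    def on_user_deleted(self, user_id):
        # Simulates the possible LFK cleanup (done synchronously here):
        # drop every row that points at the deleted user.
        self.rows = {k: v for k, v in self.rows.items() if v != user_id}

    def scoped_user_id(self, key, legacy_options=None):
        # Prefer the new table; fall back to the legacy options hash for
        # jobs created before the data was moved (assumed transition strategy).
        if key in self.rows:
            return self.rows[key]
        return (legacy_options or {}).get("scoped_user_id")


store = JobIdentityStore()
store.create((1, 100), scoped_user_id=42)
store.clone_for_retry((1, 100), (2, 100))
print(store.scoped_user_id((2, 100)))  # 42
store.on_user_deleted(42)
print(store.scoped_user_id((2, 100)))  # None
print(store.scoped_user_id((3, 100), {"scoped_user_id": 7}))  # 7
```

One design point this surfaces: if a fallback to `options` is kept, deleting the row on user deletion does not hide a value that is still present in the legacy `options` of older jobs.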

Old issue - Decide where to store `scoped_user_id`

Proposals

Two options so far:

Option 1: introduce a new table, p_ci_job_processing, that would mainly persist this column but could also hold other similar data in the future.
1. ➖ Adding a new table comes with the overhead of standard columns: job_id, project_id, partition_id, created_at, updated_at. This would be highly inefficient for a single integer column, scoped_user_id.
2. ➕ If in the future we have similar types of data (processing data, or immutable data that is not a good candidate for deduplication), we could store them here.

Option 2: persist it in p_ci_builds. This option could be preferable if we want to display which human user triggered the job using a service account or agent.
1. scoped_user_id is used when service accounts (e.g. Duo Workflow or AmazonQ, mapped to a user_id) trigger actions on behalf of a human user, who is tracked via scoped_user_id. Arguably, this data could also be considered intrinsic.
2. ➕ Although scoped_user_id is used today as processing data for authorization, it tends toward being intrinsic data. For example, as with user_id, we may want to audit or display which human user (scoped_user_id) triggered the action through the service account. This information may at some point be displayed in the UI or exposed via the API.
3. ➖ It's an extra column to add to ci_builds, but as of today it doesn't need to be indexed or linked via a FK, because it's currently stored in options.
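A back-of-the-envelope estimate of the overhead argument against a new table. It assumes every column is `bigint` or `timestamptz` (8 bytes each in PostgreSQL), no NULLs (so no null bitmap), and it ignores page headers, alignment padding, and index entries such as the new table's primary key, which would only widen the gap.

```python
# Rough per-row heap cost in PostgreSQL under the assumptions stated above.
TUPLE_HEADER = 24  # 23-byte HeapTupleHeaderData, MAXALIGNed to 24 bytes
LINE_POINTER = 4   # ItemIdData entry in the page's line pointer array
COLUMN_BYTES = 8   # bigint / timestamptz

new_table_columns = [
    "job_id", "project_id", "partition_id",
    "created_at", "updated_at", "scoped_user_id",
]

# Option 1: one full row in a dedicated table per job with a scoped user.
new_table_row = TUPLE_HEADER + LINE_POINTER + COLUMN_BYTES * len(new_table_columns)

# Option 2: one more bigint value on an existing p_ci_builds row.
extra_column_in_builds = COLUMN_BYTES

print(new_table_row, extra_column_in_builds)  # 76 8
```

When scoped_user_id is NULL, PostgreSQL stores no value for it, only a bit in the row's null bitmap, so a nullable column in p_ci_builds would cost close to nothing for jobs without a scoped user.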

Edited Jul 23, 2025 by 🤖 GitLab Bot 🤖
