Decide where to store scoped_user_id
Problem
We need to choose where to persist scoped_user_id. Today it is immutable data, but it is not a good candidate for p_ci_job_definitions because it would hurt deduplication: the same job definition can be triggered by many different users.
We therefore need to store scoped_user_id outside p_ci_job_definitions.
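The deduplication concern can be illustrated with a minimal sketch. It assumes definitions are deduplicated by a checksum over their content; the `definition_checksum` helper and the sample config are hypothetical and are not the actual GitLab implementation.

```python
import hashlib
import json


def definition_checksum(config: dict) -> str:
    """Hypothetical stand-in for content-based deduplication of a job definition."""
    payload = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


# Illustrative job definition; the keys are assumptions for the example.
base = {"script": ["bundle exec rspec"], "image": "ruby:3.3"}

# The same job triggered on behalf of two different users.
with_user = {
    definition_checksum({**base, "scoped_user_id": user_id})
    for user_id in (101, 202)
}
without_user = {definition_checksum(base) for _ in (101, 202)}

print(len(with_user))     # 2: one definition row per user, deduplication is lost
print(len(without_user))  # 1: a single shared definition row
```

With scoped_user_id inside the definition, every distinct user yields a distinct checksum, so the number of definition rows grows with the number of users rather than with the number of distinct job configurations.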
Proposal
Create a dedicated table p_ci_job_identities that contains the scoped_user_id <-> job_id association together with the partitioning and sharding keys.
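A rough sketch of what such a table could look like, using an in-memory SQLite database purely for illustration. The column names `partition_id` and `project_id` for the partitioning and sharding keys, the primary key choice, and the helper functions are assumptions; SQLite does not model partitioning, and the real table would be defined through a Rails migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE p_ci_job_identities (
        job_id         INTEGER NOT NULL,
        partition_id   INTEGER NOT NULL,  -- partitioning key (assumed name)
        project_id     INTEGER NOT NULL,  -- sharding key (assumed name)
        scoped_user_id INTEGER NOT NULL,
        PRIMARY KEY (job_id, partition_id)
    );
    """
)


def record_identity(job_id, partition_id, project_id, scoped_user_id):
    """Store the scoped_user_id <-> job_id association (hypothetical helper)."""
    conn.execute(
        "INSERT INTO p_ci_job_identities VALUES (?, ?, ?, ?)",
        (job_id, partition_id, project_id, scoped_user_id),
    )


def scoped_user_for(job_id, partition_id):
    """Return the scoped_user_id for a job, or None when no row exists."""
    row = conn.execute(
        "SELECT scoped_user_id FROM p_ci_job_identities "
        "WHERE job_id = ? AND partition_id = ?",
        (job_id, partition_id),
    ).fetchone()
    return row[0] if row else None


record_identity(job_id=1, partition_id=100, project_id=42, scoped_user_id=7)
print(scoped_user_for(1, 100))  # 7
print(scoped_user_for(2, 100))  # None: jobs without a scoped user need no row
```

Because a row is only needed for jobs that actually have a scoped user, the table stays sparse and the job definition rows remain free of per-user data.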
Backwards compatibility
Since scoped_user_id is currently persisted in options, we need to evaluate the current behavior and what should happen when persisting this data into a dedicated table:
- Current: scoped_user_id is persisted during pipeline creation.
- Current: when user is deleted we maintain scoped_user_id as is. We could probably use LFK between scoped_user_id and users.id and delete the record. Behavior should remain the same
- Current: when job is retried we propagate (during cloning) the scoped_user_id to the new job.
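The three behaviors above can be sketched as a small in-memory model. The `JobIdentityStore` class, its method names, and the `cleanup` flag are hypothetical and only illustrate the lifecycle; they do not correspond to existing GitLab classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JobIdentityStore:
    """Hypothetical in-memory stand-in for the dedicated identities table."""

    identities: Dict[int, int] = field(default_factory=dict)  # job_id -> scoped_user_id

    def on_pipeline_creation(self, job_id: int, scoped_user_id: Optional[int]) -> None:
        # Persist the association when the job is created, if a scoped user exists.
        if scoped_user_id is not None:
            self.identities[job_id] = scoped_user_id

    def on_retry(self, old_job_id: int, new_job_id: int) -> None:
        # Cloning a job propagates scoped_user_id to the new job.
        if old_job_id in self.identities:
            self.identities[new_job_id] = self.identities[old_job_id]

    def on_user_deleted(self, user_id: int, cleanup: bool = False) -> None:
        # Current behavior keeps the value as is. cleanup=True models the
        # possible LFK-based deletion of the associated records.
        if cleanup:
            self.identities = {
                job_id: uid for job_id, uid in self.identities.items() if uid != user_id
            }


store = JobIdentityStore()
store.on_pipeline_creation(job_id=1, scoped_user_id=7)
store.on_retry(old_job_id=1, new_job_id=2)
store.on_user_deleted(user_id=7)                # current behavior: nothing changes
print(store.identities)                         # {1: 7, 2: 7}
store.on_user_deleted(user_id=7, cleanup=True)  # LFK-style cleanup
print(store.identities)                         # {}
```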
Old issue - Decide where to store `scoped_user_id`
Proposals
2 options so far:
- introduce a new table p_ci_job_processing that would mainly persist this column but could also be a place for other similar data in the future
  - ➖ Adding a new table will come with the overhead of standard columns: job_id, project_id, partition_id, created_at, updated_at. This would be highly inefficient for one integer column scoped_user_id.
  - ➕ If in the future we have similar type of data (processing or immutable but not a good candidate for deduplication) we could store it here.
- persist it in p_ci_builds. The latter could be preferable if we want to display which human user triggered the job using a service account/agent.
  - scoped_user_id is used when service accounts (e.g. Duo Workflow or AmazonQ mapped to a user_id) trigger actions on behalf of a human user (in this case tracked via scoped_user_id). Arguably this data could also be considered intrinsic.
  - ➕ scoped_user_id, while today it's used as processing data for authorization, has the tendency to be intrinsic data. For example: as with user_id, we may want to audit or display which human user (scoped_user_id) triggered the action through the service account. This information may at some point be displayed in the UI or exposed via API.
  - ➖ It's an extra column to add to ci_builds, but as of today it doesn't need to be indexed or linked via FK because it's currently stored in options.
Edited by 🤖 GitLab Bot 🤖