Add CI job to sync AI principles from SSOT
What does this MR do and why?
Wire the gitlab-ai-principles-distiller gem (added in !235272 (merged)) to a weekly scheduled pipeline that keeps distilled agent principles in .ai/principles/distilled/*.md in sync with their docs.gitlab.com sources. When the script detects drift, it auto-creates a branch, commits the regenerated principles, and opens an MR labelled ai-agent, documentation, and type::maintenance for human review. Auto-MR target settings (branch prefix, title template, labels, remove_source_branch) live in .ai/principles/manifest.yml under the auto_mr: block.
This delivers issue https://gitlab.com/gitlab-org/gitlab/-/issues/597600. The gem and AI Catalog flow it depends on are tracked in https://gitlab.com/gitlab-org/gitlab/-/issues/599663.
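As a rough illustration of the `auto_mr:` block mentioned above, the fragment below shows one way those four settings could be laid out. Only the `auto_mr` and `remove_source_branch` names and the three labels come from this description; the `branch_prefix` and `title_template` key names and all values marked as placeholders are assumptions, not the actual manifest contents.

```yaml
# Hypothetical sketch of the auto_mr block in .ai/principles/manifest.yml.
auto_mr:
  # Assumed key name and placeholder value (the real prefix is not stated here).
  branch_prefix: "ai-principles-sync/"
  # Assumed key name and placeholder value (the real template is not stated here).
  title_template: "Sync distilled AI principles"
  # Labels as listed in this description.
  labels:
    - ai-agent
    - documentation
    - type::maintenance
  # Key name from this description; the value is an assumption.
  remove_source_branch: true
```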
Surface
- `.gitlab/ci/sync-principles.gitlab-ci.yml`: new file. Adds `ai-principles-sync` in the `ai-gateway` stage with `needs: []`, `timeout: 60m`, `allow_failure: true`, and an `artifacts` block that uploads `.ai/principles/distilled/` for 7 days on every run (`when: always`). Runs on `ruby:${RUBY_VERSION}-alpine` via the dependency proxy, with `BUNDLE_PATH: vendor` so gem deps are vendored under the gem dir. The `before_script` does more than a bare `bundle install`:
  - installs `build-base git` via `apk`,
  - logs the calling identity (`$GITLAB_USER_LOGIN`) and `$CI_PIPELINE_SOURCE` for audit-trail visibility while the placeholder PAT is in use,
  - sets a `gitlab-bot@gitlab.com` git author,
  - does a shallow `git fetch` of `$CI_DEFAULT_BRANCH` so `Gitlab::PrinciplesDistiller::Sync`'s `git merge-base HEAD master` (the `distillation_base_sha`) can resolve on the default depth-20 clone,
  - pre-installs the exact Bundler version pinned in `Gemfile.lock` to avoid the alpine image's older Bundler self-upgrading mid-install (which prints a confusing "Cannot write a changed lockfile while frozen" stderr line).

  The `script` then runs, from `gems/gitlab-ai-principles-distiller/`: `bundle exec bin/gitlab-ai-principles-distiller-provision-flow --workspace "$CI_PROJECT_DIR"` (mirrors any prompt/tool changes from `.ai/principles/distillation_prompt.md` into the AI Catalog flow), then `bundle exec bin/gitlab-ai-principles-distiller-sync --workspace "$CI_PROJECT_DIR" --push` (detects drift, triggers Duo Workflows, writes back, and opens the auto-MR).

  Authenticates via the `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` project CI variable, surfaced as both `GITLAB_TOKEN` (Workflow API + GraphQL) and `GITLAB_API_TOKEN` (auto-MR REST).
- `.gitlab/ci/rules.gitlab-ci.yml`: `.ai-principles-sync:rules:weekly` reuses `&if-default-branch-schedule-weekly`, pinned to `gitlab-org/gitlab` and gated on the `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` variable being present. Includes a temporary `merge_request_event`-gated `when: manual` rule branch (annotated with `# TODO: REMOVE BEFORE MERGE`) so the job can be exercised from this MR's pipeline.
- `.ai/principles/manifest.yml`: adds a new `authentication` principle under the `Security` group, covering authentication, authorization, and composite identity for Duo agents. File filters target controller, service, and auth library paths in both FOSS and EE (`app/controllers/**/*.rb`, `app/services/**/*.rb`, `lib/gitlab/auth/**/*.rb`, `lib/api/helpers/**/*.rb`, and their `ee/` counterparts). Sources: `doc/development/authentication.md` and `doc/development/ai_features/composite_identity.md`. This will be picked up by the first run of the sync job to produce a new `.ai/principles/distilled/authentication.md`.
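For reviewers who want the shape of the job at a glance, the first bullet above can be sketched roughly as follows. This is reconstructed from the prose, not copied from the diff. The dependency-proxy prefix variable, the `git fetch` depth, the `user.name` value, and the `awk` one-liner that reads the pinned Bundler version are assumptions chosen for illustration.

```yaml
# Sketch only: reconstructed from the description, not the actual file contents.
ai-principles-sync:
  extends: .ai-principles-sync:rules:weekly
  stage: ai-gateway
  # Assumption: the dependency proxy is reached via GitLab's predefined prefix variable.
  image: "${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/ruby:${RUBY_VERSION}-alpine"
  needs: []
  timeout: 60m
  allow_failure: true
  variables:
    BUNDLE_PATH: vendor
  before_script:
    - apk add --no-cache build-base git
    # Audit trail: print who triggered the job and from which pipeline source.
    - 'echo "Identity:${GITLAB_USER_LOGIN:-<unset>} (job triggered from $CI_PIPELINE_SOURCE)"'
    - git config --global user.email "gitlab-bot@gitlab.com"
    # Assumption: the author name is a placeholder; only the email is stated.
    - git config --global user.name "GitLab Bot"
    # Assumption: the depth is illustrative; the fetch lets merge-base resolve.
    - git fetch --depth=50 origin "$CI_DEFAULT_BRANCH"
    - cd "$CI_PROJECT_DIR/gems/gitlab-ai-principles-distiller"
    # Install the Bundler version recorded under "BUNDLED WITH" in Gemfile.lock.
    - gem install bundler --no-document -v "$(awk '/BUNDLED WITH/ { getline; print $1 }' Gemfile.lock)"
    - bundle install
  script:
    - cd "$CI_PROJECT_DIR/gems/gitlab-ai-principles-distiller"
    - bundle exec bin/gitlab-ai-principles-distiller-provision-flow --workspace "$CI_PROJECT_DIR"
    - bundle exec bin/gitlab-ai-principles-distiller-sync --workspace "$CI_PROJECT_DIR" --push
  artifacts:
    when: always
    expire_in: 7 days
    paths:
      - .ai/principles/distilled/
```

Uploading artifacts with `when: always` means the distilled output is available for inspection even when the sync step fails partway through.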
Required CI variables
| Variable | Purpose |
|---|---|
| `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` | Classic PAT (`api` scope; fine-grained PATs are not supported because they do not cover GraphQL, AI Catalog mutations, or the Duo Workflow create endpoint) used as both `GITLAB_TOKEN` (Workflow API + GraphQL) and `GITLAB_API_TOKEN` (auto-MR REST). Currently a maintainer's personal token; see Known limitations. |
| `AGENT_PRINCIPLES_CATALOG_ITEM_CONSUMER_ID` | Numeric ID of the `ItemConsumer` that binds the catalog Flow to `gitlab-org/gitlab`. Provisioned once via `bin/gitlab-ai-principles-distiller-provision-flow` and printed at the end of that script's output. |
AGENT_PRINCIPLES_CATALOG_PROJECT (the catalog project path) is set inline in the YAML (gitlab-org/gitlab) rather than as a project CI variable, so the binding is reviewable in the YAML diff.
The catalog Flow has been provisioned in production (Flow ID gid://gitlab/Ai::Catalog::Item/1009160, Consumer ID 7368818).
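One plausible way to wire these variables into the job is a `variables:` block like the one below. The description only says the token is "surfaced as" `GITLAB_TOKEN` and `GITLAB_API_TOKEN`, so mapping them through `variables:` (rather than, say, `export` lines in the script) is an assumption.

```yaml
# Sketch only: how the job could consume the CI variables described above.
ai-principles-sync:
  variables:
    # Both names resolve to the same project-level token.
    GITLAB_TOKEN: $AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN      # Workflow API + GraphQL
    GITLAB_API_TOKEN: $AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN  # auto-MR REST
    # Set inline so the catalog binding is reviewable in the YAML diff.
    AGENT_PRINCIPLES_CATALOG_PROJECT: gitlab-org/gitlab
    # AGENT_PRINCIPLES_CATALOG_ITEM_CONSUMER_ID is not set here; it is read
    # from the project's CI/CD settings.
```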
Known limitations
The Duo Agent Platform Workflow API requires the calling identity to have a Duo Agent Platform seat. Project access tokens (such as PROJECT_TOKEN_FOR_CI_SCRIPTS_API_USAGE) are bound to bot users that do not hold seats, so they cannot drive this job — confirmed by an earlier CI run that returned 400 Bad request - ["forbidden to access duo workflow"] (per @mike.wronski's prior analysis and @surabhi.suman / @aakgun's clarification in #agentic-engineering-discussions).
The supported sustainable pattern is a service account auto-provisioned when an AI Catalog flow's ItemConsumer is created at the group level. Provisioning a group-level consumer for gitlab-org requires Owner access on that group, which the maintainer of this MR does not have.
A service account access request has been filed at https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931 (confidential). The request explicitly flags the Duo Agent Platform seat requirement so the provisioner can confirm which route (vanilla service account with a seat attached, or group-level catalog ItemConsumer with auto-provisioned identity) is appropriate.
For now, AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN holds a maintainer's personal access token as a temporary placeholder. Migration to the provisioned service account is tracked in the access request above and as a post-merge follow-up (see Post-merge follow-up).
allow_failure: true is intentional: while the placeholder PAT is in use and the temporary MR-event manual trigger exists, we want a failed run (e.g. revoked PAT, transient Duo Workflow error) to surface in the schedule UI without blocking other scheduled work. Once migrated to the service account this should be revisited.
Pre-merge testing
The production rules (schedule source + weekly type + default branch) make the job impossible to run from an MR pipeline alone. The temporary merge_request_event-gated rule lets the job be exercised from this MR's pipeline.
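The interaction between the production rule and the temporary rule can be sketched as below. The anchor definition is a stand-in for the one that already exists in `.gitlab/ci/rules.gitlab-ci.yml`, and splitting the project and token gates into separate `when: never` rules is an assumption about how the gating is expressed.

```yaml
# Assumption: stand-in for the existing anchor in rules.gitlab-ci.yml.
.if-default-branch-schedule-weekly: &if-default-branch-schedule-weekly
  if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_PIPELINE_SOURCE == "schedule" && $SCHEDULE_TYPE == "weekly"'

.ai-principles-sync:rules:weekly:
  rules:
    # Never run outside the canonical project or without the token.
    - if: '$CI_PROJECT_PATH != "gitlab-org/gitlab"'
      when: never
    - if: '$AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN == null'
      when: never
    # TODO: REMOVE BEFORE MERGE
    # Temporary: lets the job be started by hand from this MR's pipeline.
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: manual
    # Production path: weekly schedule on the default branch.
    - <<: *if-default-branch-schedule-weekly
```

Because rules are evaluated top to bottom and the first match wins, removing the `merge_request_event` entry leaves the schedule rule as the only way the job can be added to a pipeline.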
This MR is rebased on top of the current HEAD of the gem MR (!235272 (merged)) so that the merged-result pipeline (refs/merge-requests/235014/merge) contains the gem files. Before that rebase, an MR conflict forced GitLab to fall back to a head-only pipeline, in which gems/gitlab-ai-principles-distiller/ did not exist.
Test plan:
- Provision the project CI variable `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` in Settings → CI/CD → Variables with a maintainer's classic PAT (`api` scope).
- Click the play button on the `ai-principles-sync` job in this MR's pipeline.
- If an auto-MR is created, inspect labels (`ai-agent`, `documentation`, `type::maintenance`), target branch, commit message, and distilled output (including the new `distilled/authentication.md`).
Final pre-merge cleanup
- Remove the temporary `merge_request_event` rule branch in `.gitlab/ci/rules.gitlab-ci.yml`.
- Flip `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` to `protected: true` in Settings → CI/CD → Variables. With the temp rule gone, the only consumer is the scheduled pipeline on `master` (a protected branch).
- Set a short expiry (≤30 days) on the placeholder PAT so this temporary state cannot drift indefinitely while waiting for service-account provisioning.
- Add an identity-logging line to the `before_script` so each job run prints the calling identity. Landed via `echo "Identity:${GITLAB_USER_LOGIN:-<unset>} (job triggered from $CI_PIPELINE_SOURCE)"`.
- Confirm the gem MR (!235272 (merged)) has merged and this MR's target branch has been retargeted to `master`.
- Un-Draft this MR.
Post-merge follow-up
- Manually create the pipeline schedule in the GitLab UI: cron `0 6 * * 1` (Monday 06:00 UTC), variable `SCHEDULE_TYPE=weekly`.
- Trigger the schedule once to verify that the job works end to end against `master`.
- Migrate `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` from the placeholder personal PAT to the dedicated service account token once the access request at https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931 is provisioned. The auth options (vanilla service account with attached Duo seat vs group-level catalog `ItemConsumer` with auto-provisioned identity) are documented in that issue's "Other Access" section.
- Revisit `allow_failure: true` on the job once the placeholder PAT is gone: a schedule failure on the service-account path probably should block.
Related
- Issue: https://gitlab.com/gitlab-org/gitlab/-/issues/597600
- Predecessor (gem + flow): !235272 (merged) (target https://gitlab.com/gitlab-org/gitlab/-/issues/599663)
- Service account access request (confidential): https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931
- Earlier distillation pipeline: https://gitlab.com/gitlab-org/gitlab/-/issues/597599
- Future component extraction: https://gitlab.com/gitlab-org/gitlab/-/issues/599498
- Parent epic: gitlab-org&21742
Checklist
Pre-merge
Consider the effect of the changes in this merge request on the following:
- Different pipeline types: job is gated to `schedule` only.
- Non-canonical projects:
  - `gitlab-foss`: N/A, gated to `$CI_PROJECT_PATH == "gitlab-org/gitlab"`.
  - `security`: N/A.
  - `dev`: N/A.
  - personal forks: N/A.
- Pipeline performance.
- CI variable `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` provisioned in project settings (placeholder PAT for now; service-account migration tracked in access request https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931).
If new jobs are added:
- Change-related rules: pipeline-schedule-only (`SCHEDULE_TYPE=weekly`), no MR/main-branch impact.
- Frequency: weekly, against `master` only.
- N/A: not added to merge request pipelines (schedule-only).
Post-merge
- Consider communicating these changes to the broader team following the communication guideline for pipeline changes.
- Create the weekly pipeline schedule in the UI (see description).
- Migrate `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` from the placeholder PAT to the provisioned service account token (see Post-merge follow-up and access request https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931).