Add CI job to sync AI principles from SSOT
What does this MR do and why?
Wire the gitlab-ai-principles-distiller gem (added in !235272 (merged)) to a weekly scheduled pipeline that keeps distilled agent principles in .ai/principles/distilled/*.md in sync with their docs.gitlab.com sources. When the script detects drift, it auto-creates a branch, commits the regenerated principles, and opens an MR labelled ai-agent, documentation, and type::maintenance for human review. Auto-MR target settings (branch prefix, title template, labels, remove_source_branch) live in .ai/principles/manifest.yml under the auto_mr: block.
This delivers issue https://gitlab.com/gitlab-org/gitlab/-/issues/597600. The gem and AI Catalog flow it depends on are tracked in https://gitlab.com/gitlab-org/gitlab/-/issues/599663.
Surface
- `.gitlab/ci/sync-principles.gitlab-ci.yml`: new file. Adds `ai-principles-sync` in the `ai-gateway` stage with `needs: []`, `timeout: 60m`, `allow_failure: true`, and an `artifacts` block that uploads `.ai/principles/distilled/` for 7 days on every run (`when: always`). Runs on `ruby:${RUBY_VERSION}-alpine` via the dependency proxy, with `BUNDLE_PATH: vendor` so gem deps are vendored under the gem dir. The `before_script` does more than a bare `bundle install`:
  - installs `build-base git` via `apk`,
  - logs the calling identity (`$GITLAB_USER_LOGIN`) and `$CI_PIPELINE_SOURCE` for audit-trail visibility,
  - sets the service account git author (`Agent Principles Distiller`),
  - does a shallow `git fetch` of `$CI_DEFAULT_BRANCH` so `Gitlab::PrinciplesDistiller::Sync`'s `git merge-base HEAD master` (the `distillation_base_sha`) can resolve on the default depth-20 clone,
  - pre-installs the exact Bundler version pinned in `Gemfile.lock` to avoid the alpine image's older Bundler self-upgrading mid-install (which prints a confusing "Cannot write a changed lockfile while frozen" stderr line).

  The `script` then runs, from `gems/gitlab-ai-principles-distiller/`:
  1. `bundle exec bin/gitlab-ai-principles-distiller-provision-flow --workspace "$CI_PROJECT_DIR"` (mirrors any prompt/tool changes from `.ai/principles/distillation_prompt.md` into the AI Catalog flow), then
  2. `bundle exec bin/gitlab-ai-principles-distiller-sync --workspace "$CI_PROJECT_DIR" --push` (detects drift, triggers Duo Workflows, writes back, and opens the auto-MR).

  Authenticates via the `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` project CI variable, surfaced as both `GITLAB_TOKEN` (Workflow API + GraphQL) and `GITLAB_API_TOKEN` (auto-MR REST).
- `.gitlab/ci/rules.gitlab-ci.yml`: `.ai-principles-sync:rules:weekly` reuses `&if-default-branch-schedule-weekly`, pinned to `gitlab-org/gitlab` and gated on the `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` variable being present. The job triggers only on the weekly default-branch schedule.
- `.ai/principles/manifest.yml`: adds a new `authentication` principle under the `Security` group, covering authentication, authorization, and composite identity for Duo agents. File filters target controller, service, and auth library paths in both FOSS and EE (`app/controllers/**/*.rb`, `app/services/**/*.rb`, `lib/gitlab/auth/**/*.rb`, `lib/api/helpers/**/*.rb`, and their `ee/` counterparts). Sources: `doc/development/authentication.md` and `doc/development/ai_features/composite_identity.md`. The first run of the sync job will pick this up and produce a new `.ai/principles/distilled/authentication.md`.
Git push authentication
The auto-MR git push authenticates with the service account PAT via an `http.<host>.extraHeader` config entry injected through `GIT_CONFIG_*` env vars, so the token never lands in argv, the remote URL, or git's reflog. Two subtleties (each previously caused a failed push, caught during end-to-end validation):
- Header scope must be the host (`http.https://gitlab.com.extraHeader`), not the repo URL. git matches `http.<url>.*` by URL prefix on whole path segments, so a key scoped to `.../gitlab` does not match the request to `.../gitlab.git` and would be dropped (push falls back to anonymous → `403`).
- Auth scheme must be HTTP Basic (`Authorization: Basic base64("oauth2:<token>")`), not `Bearer`: GitLab's smart-HTTP git endpoint authenticates PATs via Basic auth with a non-empty username. The REST calls (find/create/update MR) correctly use `PRIVATE-TOKEN`.
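
The two subtleties above can be sketched in Ruby as follows. This is a minimal illustration, not the gem's actual code: the helper names `host_scoped_header_key` and `git_push_auth_env` are hypothetical, and the sketch assumes a git version that supports the `GIT_CONFIG_COUNT` / `GIT_CONFIG_KEY_<n>` / `GIT_CONFIG_VALUE_<n>` environment variables.

```ruby
require "base64"
require "uri"

# Hypothetical helper (not from the gem): derive the host-scoped config key
# from a remote URL, dropping the repository path so the key matches every
# request to that host, including ".../gitlab.git".
def host_scoped_header_key(remote_url)
  uri = URI.parse(remote_url)
  # Keep the port only when it is not the scheme's default.
  host = uri.port == uri.default_port ? uri.host : "#{uri.host}:#{uri.port}"
  "http.#{uri.scheme}://#{host}.extraHeader"
end

# Hypothetical helper (not from the gem): build the environment variables
# that inject the Authorization header into a single git invocation.
def git_push_auth_env(token, remote_url)
  # HTTP Basic with a non-empty username ("oauth2"), not Bearer.
  credentials = Base64.strict_encode64("oauth2:#{token}")
  {
    "GIT_CONFIG_COUNT" => "1",
    "GIT_CONFIG_KEY_0" => host_scoped_header_key(remote_url),
    "GIT_CONFIG_VALUE_0" => "Authorization: Basic #{credentials}"
  }
end

# Usage (assumed call site): pass the hash as the child process environment
# so the token never appears in argv or in the remote URL.
#   env = git_push_auth_env(ENV.fetch("GITLAB_API_TOKEN"), remote_url)
#   system(env, "git", "push", "origin", "HEAD")
```

Passing the header through the child process environment keeps the token out of `.git/config` and out of the command line, which is the property the description relies on.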
Required CI variables
| Variable | Purpose |
|---|---|
| `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` | Classic PAT (`api` scope; fine-grained PATs are not supported because they do not cover GraphQL, AI Catalog mutations, or the Duo Workflow create endpoint) used as both `GITLAB_TOKEN` (Workflow API + GraphQL) and `GITLAB_API_TOKEN` (auto-MR REST). Belongs to the dedicated service account (see Authentication). Set protected + masked. |
| `AGENT_PRINCIPLES_CATALOG_ITEM_CONSUMER_ID` | Numeric ID of the `ItemConsumer` that binds the catalog Flow to `gitlab-org/gitlab`. Provisioned once via `bin/gitlab-ai-principles-distiller-provision-flow` and printed at the end of that script's output. |
AGENT_PRINCIPLES_CATALOG_PROJECT (the catalog project path) is set inline in the YAML (gitlab-org/gitlab) rather than as a project CI variable, so the binding is reviewable in the YAML diff.
The catalog Flow has been provisioned in production (Flow ID gid://gitlab/Ai::Catalog::Item/1009160, Consumer ID 7368818).
Authentication
The Duo Agent Platform Workflow API requires the calling identity to have a Duo Agent Platform seat. Project access tokens (such as PROJECT_TOKEN_FOR_CI_SCRIPTS_API_USAGE) are bound to bot users that do not hold seats, so they cannot drive this job — confirmed by an earlier CI run that returned 400 Bad request - ["forbidden to access duo workflow"].
This is now served by a dedicated service account, service-modelops-agent-principles-distiller, provisioned via access request https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931 (confidential). The account holds Developer on gitlab-org/gitlab and a Duo Agent Platform seat. Its api-scope PAT is stored in AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN (protected + masked, expires 2027-06-05). End-to-end validation against this account is green: the job distilled all 7 principles via the Workflow API and pushed a branch / opened an MR.
allow_failure: true is intentional: a failed scheduled run (e.g. revoked PAT, transient Duo Workflow error) should surface in the schedule UI without blocking other scheduled work.
End-to-end testing
The production rules (schedule source + weekly type + default branch) make the job impossible to run directly from an MR pipeline. End-to-end validation was performed by temporarily exposing the job to this MR's pipeline (a merge_request_event-gated manual rule, since removed) and playing it. The validating run distilled all 7 principles via the Workflow API on the service-account seat, pushed the branch, and created an auto-MR — confirming push auth, REST MR creation, and the full flow.
This MR is based on current master so the merged-result pipeline (refs/merge-requests/235014/merge) contains the gem files (the gem landed in !235272 (merged)).
Pre-merge cleanup
- Remove the temporary `merge_request_event` rule branch in `.gitlab/ci/rules.gitlab-ci.yml`.
- Flip `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` to `protected: true` in Settings → CI/CD → Variables. With the temp rule gone, the only consumer is the scheduled pipeline on `master` (a protected branch).
- Set an expiry on the service-account PAT (2027-06-05).
- Add an identity-logging line to the `before_script` so each job run prints the calling identity. Landed via `echo "Identity:${GITLAB_USER_LOGIN:-<unset>} (job triggered from $CI_PIPELINE_SOURCE)"`.
- Confirm the gem MR (!235272 (merged)) has merged and this MR targets `master`.
- Un-Draft this MR.
Post-merge follow-up
The job rides the existing weekly pipeline schedule — no new schedule is required. Creating a second master + SCHEDULE_TYPE=weekly schedule would double-run every weekly job.
- The job is triggered by the existing schedule `2835726` (`[Weekly] Elasticsearch 9, OpenSearch latest, Valkey, PG17 testing`; `master`; `0 10 * * 2`, Tuesdays 10:00 UTC; `SCHEDULE_TYPE=weekly`; owner `gitlab-bot`). After merge, trigger that schedule once (or wait for the next run) and confirm `ai-principles-sync` appears, authenticates as the service account (check the `Identity:` log line), and is not skipped by the `$AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` guard.
- The schedule owner (`gitlab-bot`) has Maintainer (access_level 40) on `gitlab-org/gitlab`, so it can read the protected `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN`. If the job is ever skipped, schedule-owner ↔ protected-variable access is the first thing to check.
- Revisit `allow_failure: true` on the job: now that the service-account path is validated, a schedule failure could be made blocking.
Related
- Issue: https://gitlab.com/gitlab-org/gitlab/-/issues/597600
- Predecessor (gem + flow): !235272 (merged) (target https://gitlab.com/gitlab-org/gitlab/-/issues/599663)
- Service account access request (confidential): https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931
- Earlier distillation pipeline: https://gitlab.com/gitlab-org/gitlab/-/issues/597599
- Future component extraction: https://gitlab.com/gitlab-org/gitlab/-/issues/599498
- Parent epic: gitlab-org&21742
Checklist
Pre-merge
Consider the effect of the changes in this merge request on the following:
- Different pipeline types: job is gated to `schedule` only.
- Non-canonical projects:
  - `gitlab-foss`: N/A, gated to `$CI_PROJECT_PATH == "gitlab-org/gitlab"`.
  - `security`: N/A.
  - `dev`: N/A.
  - personal forks: N/A.
- Pipeline performance: schedule-only, does not affect MR or default-branch pipelines.
- CI variable `AGENT_PRINCIPLES_SERVICE_ACCOUNT_TOKEN` provisioned in project settings (service account, `protected` + `masked`; see access request https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/43931).
If new jobs are added:
- Change-related rules: pipeline-schedule-only (`SCHEDULE_TYPE=weekly`), no MR/main-branch impact.
- Frequency: weekly, against `master` only.
- N/A: not added to merge request pipelines (schedule-only).
Post-merge
- Consider communicating these changes to the broader team following the communication guideline for pipeline changes.
- Verify `ai-principles-sync` runs on the existing weekly schedule (`2835726`); no new schedule required.