feat(orbit): add glab orbit subcommands (experimental)

What

Adds the glab orbit remote command family for the GitLab Knowledge Graph (Orbit) REST endpoints, with glab orbit r ... as a shorter alias. The bare glab orbit command prints a brief overview that points at remote (and the future local subtree), and glab orbit remote walks through the discovery dance (status → schema → tools → query) so AI agents know where to start. All commands are tagged Experimental + mcp:safe.

| Subcommand | Endpoint | Purpose |
| --- | --- | --- |
| glab orbit remote status | GET orbit/status | Cluster health |
| glab orbit remote schema [node...] | GET orbit/schema | Graph ontology; positional args become expand= |
| glab orbit remote tools | GET orbit/tools | MCP tool manifest with the DSL JSON Schema |
| glab orbit remote query [file\|-] | POST orbit/query | Run a query from file or stdin; --format raw\|llm |
| glab orbit remote graph-status | GET orbit/graph_status | Indexing progress; one of --namespace-id / --project-id / --full-path |
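
The exactly-one-of rule for the graph-status scope flags can be sketched as follows. This is a minimal illustration, not the code in this MR: the scopeFlags struct, the validateScope helper, and the error wording are hypothetical, and the sketch assumes a zero value means "flag not passed".

```go
package main

import (
	"errors"
	"fmt"
)

// scopeFlags is a hypothetical holder for the three graph-status scope flags.
// In this sketch a zero value means the flag was not passed.
type scopeFlags struct {
	NamespaceID int64  // --namespace-id
	ProjectID   int64  // --project-id
	FullPath    string // --full-path
}

// errScope uses illustrative wording; the real validation message may differ.
var errScope = errors.New("exactly one scope flag is required")

// validateScope returns nil only when exactly one scope flag is set.
func validateScope(f scopeFlags) error {
	set := 0
	if f.NamespaceID != 0 {
		set++
	}
	if f.ProjectID != 0 {
		set++
	}
	if f.FullPath != "" {
		set++
	}
	if set != 1 {
		return errScope
	}
	return nil
}

func main() {
	fmt.Println(validateScope(scopeFlags{NamespaceID: 24}))                       // one flag: accepted
	fmt.Println(validateScope(scopeFlags{}))                                      // no flag: rejected
	fmt.Println(validateScope(scopeFlags{ProjectID: 2, FullPath: "gitlab-org"})) // two flags: rejected
}
```

Counting the set flags keeps the "none" and "more than one" cases on a single error path, which matches the single validation failure (exit 1) reported in the E2E table.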

Why

Resolves the CLI portion of gitlab#591015. The Orbit skill currently routes through glab api orbit/* calls; this MR makes Orbit a first-class glab surface with typed responses, structured error mapping, and properly-scoped exit codes for scripting agents.

Bug fixes

  • Details silently dropped on errors. orbiterr.Translate stored troubleshooting text in cmdutils.ExitError.Details, but Fang's DefaultErrorHandler only renders err.Error(), so users saw only the headline. The troubleshooting text is now baked into the error message itself, so Fang prints the full guidance (e.g. Run `glab auth status`) to stderr.
  • --format default overrode body's response_format. Cobra's flag defaulted to "llm", making "explicitly passed --format llm" indistinguishable from "no flag". A body that set "response_format": "raw" was silently overridden. The flag now defaults to ""; the resolver in buildRequest lets the body win when the flag is unset (with llm as the final fallback).
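
The first fix can be illustrated with a small sketch. It assumes a handler that prints only err.Error(), as described above; the exitError type and its fields are hypothetical stand-ins, not the real cmdutils.ExitError.

```go
package main

import "fmt"

// exitError is a hypothetical stand-in for an error type that carries
// a headline plus troubleshooting details.
type exitError struct {
	Code    int
	Message string
	Details string
}

// Error folds the details into the message, so a handler that renders
// only err.Error() still shows the troubleshooting text.
func (e *exitError) Error() string {
	if e.Details == "" {
		return e.Message
	}
	return e.Message + "\n" + e.Details
}

func main() {
	var err error = &exitError{
		Code:    3,
		Message: "Not authenticated",
		Details: "Run `glab auth status`",
	}
	// Stands in for an error handler that prints err.Error() and nothing else.
	fmt.Println(err.Error())
}
```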

Design notes (exit codes, format semantics, restructure)

Restructure under glab orbit remote

Reviewer ask. The four original subcommands moved from internal/commands/orbit/{status,schema,tools,query}/ to internal/commands/orbit/remote/...; a new remote.go parent owns the discovery-dance help text, registers the API subcommands, and carries the r alias.

Typed client usage

Per @phikai's ask, the CLI consumes the typed OrbitService from gitlab-org/api/client-go, not raw glab api calls. The companion MR is gitlab-org/api/client-go!2870 (now merged; go.mod pinned to a pseudo-version).

Exit-code taxonomy

A shared internal/commands/orbit/internal/orbiterr helper translates HTTP errors to typed *cmdutils.ExitErrors with stable exit codes:

| Status | Exit | Meaning |
| --- | --- | --- |
| 404 | 2 (ExitOrbitUnavailable) | knowledge_graph flag off, or path typo |
| 401 | 3 (ExitUnauthenticated) | Missing/expired token |
| 403 | 4 (ExitForbidden) | No Knowledge Graph enabled namespaces |
| 429 | 5 (ExitRateLimited) | Inspect Retry-After and back off |
| 503 (graph-status only) | 1 | GKG service unavailable; descriptive message |
| other non-2xx | 1 | Unstructured error (response body, if any, included) |

Codes 1 and 2 are already used by other glab commands. Codes 3–5 are new and Orbit-specific. They are advertised in the glab orbit remote long help so agents can branch on them without parsing stderr.
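
As a rough sketch of the table above, the status-to-exit-code mapping could look like this. The constant values come from the table; the exitCodeFor function and the ExitGeneric name are hypothetical, and the real orbiterr helper also builds the error message, which is omitted here.

```go
package main

import (
	"fmt"
	"net/http"
)

// Exit codes from the taxonomy table. ExitGeneric is a hypothetical name
// for the pre-existing generic failure code 1.
const (
	ExitGeneric          = 1
	ExitOrbitUnavailable = 2
	ExitUnauthenticated  = 3
	ExitForbidden        = 4
	ExitRateLimited      = 5
)

// exitCodeFor maps a non-2xx HTTP status to an exit code.
// It assumes the caller has already ruled out success responses.
func exitCodeFor(status int) int {
	switch status {
	case http.StatusNotFound:
		return ExitOrbitUnavailable
	case http.StatusUnauthorized:
		return ExitUnauthenticated
	case http.StatusForbidden:
		return ExitForbidden
	case http.StatusTooManyRequests:
		return ExitRateLimited
	default:
		// Covers 503 and every other non-2xx status.
		return ExitGeneric
	}
}

func main() {
	for _, status := range []int{404, 401, 403, 429, 503, 500} {
		fmt.Printf("HTTP %d -> exit %d\n", status, exitCodeFor(status))
	}
}
```

A scripting agent can then branch on the process exit code alone, for example treating 5 as "back off and retry" and 3 as "re-authenticate", without parsing stderr.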

The error phrasings mirror the Orbit skill's references/troubleshooting.md.

--format raw|llm semantics

The CLI's --format flag maps to the body's response_format field. An empty flag value means "no flag passed": the body's response_format wins, falling back to llm when the body sets none. When the flag is set, it overrides the body. graph-status defaults server-side to raw.
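
The precedence described above (flag, then body, then llm) can be sketched as a small resolver. The resolveFormat name and signature are hypothetical; the actual logic lives in buildRequest and may be structured differently.

```go
package main

import "fmt"

// resolveFormat picks the response_format sent to the API.
// An empty flag means "--format was not passed"; an empty body value
// means the query body did not set response_format.
func resolveFormat(flag, body string) string {
	if flag != "" {
		return flag // an explicit flag overrides the body
	}
	if body != "" {
		return body // the body wins when the flag is unset
	}
	return "llm" // final fallback
}

func main() {
	cases := []struct{ body, flag string }{
		{"raw", ""},
		{"raw", "llm"},
		{"", ""},
		{"", "raw"},
	}
	for _, c := range cases {
		fmt.Printf("body=%q flag=%q -> %s\n", c.body, c.flag, resolveFormat(c.flag, c.body))
	}
}
```

Using the empty string as the flag default is what makes "not passed" distinguishable from "explicitly passed llm".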

Test coverage

make test (3072 tests, 0 failures) and make lint (0 issues) green locally.

  • internal/commands/orbit/orbit_test.go — asserts only remote is registered directly on orbit.
  • internal/commands/orbit/remote/remote_test.go (new) — asserts all five API subcommands and the r alias.
  • internal/commands/orbit/internal/orbiterr/orbiterr_test.go — regression: every mapped status code asserts the troubleshooting text is part of err.Error().
  • internal/commands/orbit/remote/query/query_test.go: TestQuery_BodyFormatHonoredWhenNoFlag covers "body says raw, no flag → request gets raw".
  • internal/commands/orbit/remote/graphstatus/graphstatus_test.go (new) — happy paths for all three scope flags, exactly-one-of validation, invalid --format, 404 → exit 2, 503 → exit 1 with descriptive message.

The existing per-command tests for status, schema, tools, query were carried over with their package paths updated.

E2E validation

Validated end-to-end against a fresh local GDK + a self-built gkg-server (feat/orbit-commands @ d192eec, client-go pseudo-version v2.21.1-0.20260429092219-2f1e1303c48e).

Local GDK + GKG (localhost:3000 + GKG :8090/:50054)

| Command | Result |
| --- | --- |
| orbit remote status | exit 0, healthy components from local GKG webserver/indexer/clickhouse |
| orbit remote schema | exit 0, full ontology (6 domains, 26 nodes, 61 edges) |
| orbit remote schema MergeRequest Project | exit 0, both nodes expanded with properties + edges |
| orbit remote tools | exit 0, MCP tool manifest |
| orbit remote query <file> | exit 0, structured result |
| orbit remote query - (stdin) | exit 0, structured result |
| orbit remote graph-status --namespace-id 24 | exit 0, indexing payload (state: not_indexed) |
| orbit remote graph-status --full-path gitlab-org | exit 0, indexing payload |
| orbit remote graph-status --project-id 2 | exit 0, indexing payload |
| orbit remote graph-status … --format llm | exit 0, compact {} |
| orbit remote graph-status --namespace-id 999999 | 404 → exit 2 (nonexistent namespace) |
| GITLAB_TOKEN=invalid … status | exit 3 |
| … --hostname nonexistent.example.com | exit 1 (DNS error before HTTP; correct, not 2) |
| … graph-status (no scope flag) | exit 1 (validation: "exactly one of …") |
| … query --format yaml - | exit 1 (invalid format value) |

gitlab.com

| Command | Result |
| --- | --- |
| orbit remote status | exit 0 |
| orbit remote schema | exit 0 |
| orbit remote schema MergeRequest Project | exit 0 |
| orbit remote tools | exit 0 |
| orbit remote graph-status --full-path gitlab-org/gitlab | ⚠️ exit 2: endpoint merged in !231381 but not yet deployed to gitlab.com (CLI handles 404 cleanly) |
| orbit remote query | exit 0 with an api-scoped token; ⚠️ exit 1 (HTTP 502) with a read_api-scoped token. This is a known upstream Rails/Grape quirk where POST from read_api tokens is rejected even though the endpoint only requires read_knowledge_graph. Fix: gitlab!233790 (merged) (internal redaction endpoint missing allow_access_with_scope :read_api). CLI surfaces the body cleanly in both cases. |

Bug fix verification (live)

  1. Details rendered. GITLAB_TOKEN=invalid glab orbit remote status prints both Not authenticated (headline) and Run `glab auth status`… (troubleshooting body) on stderr; exit 3.

  2. --format respects body. Verified end-to-end via Rails api_json.log. All four scenarios behave correctly:

    | Body response_format | --format flag | Sent to API |
    | --- | --- | --- |
    | raw | (none) | raw (body wins) |
    | raw | llm | llm (flag overrides) |
    | (none) | (none) | llm (default) |
    | (none) | raw | raw (flag wins) |

Full report (with raw outputs and local-env build notes): https://gitlab.com/dgruzd/droid-workspace/-/tree/main/task/2511-e2e/

Agentic E2E (no Orbit skill loaded)

An AI agent (Claude Opus 4.7 via opencode run) was given natural-language prompts and asked to use glab orbit to answer real Knowledge Graph questions — with no Orbit-specific skill loaded, only the CLI's built-in --help text. All 5 tests pass:

| Test | Prompt | Result |
| --- | --- | --- |
| Discovery + status | "check if KG service is healthy" | Found remote status via --help, parsed JSON into a clean component table |
| Schema exploration | "what properties on MergeRequest and Project?" | Multi-stage discovery, extracted all 37 + 11 properties with enum values |
| Tool manifest | "list available KG tools" | Parsed full DSL JSON Schema from remote tools output |
| Query construction | "query projects under gitlab-org" | Built valid traversal query with starts_with filter; surfaced HTTP 502 cleanly (root cause: read_api-scoped token, see above; the agent's query itself is well-formed) |
| Error handling | "check graph status for gitlab-org/gitlab" | Found graph-status, ran it, mapped exit code 2 to the knowledge_graph flag |

The error-handling test originally stalled because graph-status was only documented at glab orbit remote --help, not at the top-level glab orbit --help. Adding it to the top-level examples (this MR) fixed the discovery gap; the same agent then found the subcommand on first read of the help text and produced a textbook error analysis.

Full report: https://gitlab.com/dgruzd/droid-workspace/-/tree/main/task/2511-agentic/

Agentic query building (no Orbit skill, local GDK)

A second agentic validation round tested whether agents can build correct, non-trivial Orbit queries using only the CLI's discovery surface — no Orbit skill, no docs URL, no examples beyond --help. Five query types were tested against a local GDK with a running gkg-server:

| # | Query type | First-shot? | Result |
| --- | --- | --- | --- |
| 1 | traversal (multi-node, AUTHORED edge) | yes | 2 MRs by root |
| 2 | aggregation (count by state) | ⚠️ self-recovered | opened=37, merged=39, closed=0 |
| 3 | neighbors (all directions) | yes | 48 neighbors of project id=2 |
| 4 | traversal + order_by + limit | ⚠️ self-recovered | 10 newest MRs DESC |
| 5 | path_finding (shortest) | ⚠️ self-recovered | 1-hop path via CREATOR edge |

Every agent independently followed the same discovery workflow: --help → tools (DSL schema) → schema [Node] (ontology) → build JSON → query <file>.

The ⚠️ cases hit the runtime error "traversal/aggregation queries require node_ids or filters on at least one node" — not documented in the DSL schema, but all agents self-recovered by adding a filter. This is the only recurring papercut; recommend surfacing it in help text or DSL schema in a follow-up.

Full report: https://gitlab.com/dgruzd/droid-workspace/-/tree/main/task/2511-query-validation/

References

Out of scope (per parent issue)

  • glab orbit setup (#8286), glab orbit mcp, glab orbit local subtree, high-level query verbs / local binary management (#8240), skill rewrite to use glab orbit remote — all separate follow-ups.

Things to do before marking ready

  • Wait for client-go!2870 to merge.
  • Replace the local replace directive with a pseudo-version pin.
  • Run make test and make lint green.
  • Re-run live E2E for graph-status against local GDK + GKG.
  • Run agentic E2E (no Orbit skill loaded) against gitlab.com — all 5 tests pass.
  • Run agentic query building validation (5 query types) against local GDK — all 5 pass.
  • Re-run live E2E for graph-status against gitlab.com once the endpoint is deployed there.

Technical Writing checklist

Per guidance from TW:

  • Generated docs follow the CLI style guide
  • TW review requested from @brendan777 or #docs Slack
  • New Markdown pages added to docs site navigation (separate MR in gitlab-org/technical-writing/docs-gitlab-com)
Edited by Dmitry Gruzd
