docs(adr): add ADR 012 for GOON format design (4cff5f15) · Commits · GitLab.org / orbit / GitLab Knowledge Graph

docs/design-documents/decisions/004_unified_response_schema.md

+79 −79

Original line number	Diff line number	Diff line
		@@ -338,7 +338,7 @@ Every response includes a `format_version` field (semver string, e.g. `"1.0.0"`)

		The `ResultFormatter` trait exposes `format_name() -> FormatName` and `format_version() -> Option<&Version>` so the gRPC service stamps version metadata without hardcoding. A stub formatter (like `GoonFormatter` today, which delegates to `GraphFormatter`) returns `None` from `format_version()` — the proto field then carries an empty string, making "stub" observable in telemetry. CI enforces that changes to formatter code or the response schema require a strictly greater semver bump (`scripts/check-response-schema-version.sh`).

		GOON format versioning (`config/GOON_OUTPUT_FORMAT_VERSION`) will be added in a follow-up MR alongside the actual GOON encoding (ADR 009).
		GOON format versioning (`config/GOON_OUTPUT_FORMAT_VERSION`) is added alongside the actual GOON encoding (ADR 012).

		## Consequences

docs/design-documents/decisions/012_goon_format.md

0 → 100644

+312 −0

Original line number	Diff line number	Diff line
		---
		title: "GKG ADR 012: GOON Format (Graph Object Output Notation)"
		creation-date: "2026-04-15"
		last-updated: "2026-05-10"
		authors: [ "@michaelangeloio", "@jgdoyon1" ]
		toc_hide: true
		---

		## Status

		Accepted

		## Date

		2026-05-10

		## Context

		The GKG server returns graph query results through `ResponseFormat`: `RAW` produces structured JSON from `GraphFormatter`; `LLM` produces text from `GoonFormatter`. The `LLM` path existed in proto, gRPC routing, and CLI wiring before any encoding shipped — `GoonFormatter` delegated to `GraphFormatter` and returned the same JSON. No LLM-optimized encoding existed.

		When an agent calls `query_graph` and receives the full `GraphResponse` JSON, a traversal of 50 users and 200 merge requests with 200 edges runs roughly 12,000–15,000 tokens. The same information in a columnar text format fits in 5,000–7,000 tokens while preserving the graph topology the agent needs. Over a multi-turn session of 5–10 graph queries, that is 50,000–80,000 tokens of context spent on structural JSON syntax.

		Encoding choice also affects correctness. Google's "Talk like a Graph" study (Fatemi et al., ICLR 2024) measured graph reasoning accuracy as a function of encoding alone and observed swings between 4.8% and 61.8% per task. Incident encoding (grouping edges by source) outperformed flat edge lists and adjacency matrices across most tasks. Dense encodings with many edges acted as distractors. Application-context framing ("who authored which merge requests") outperformed abstract graph framing by up to 18 percentage points.

		The encoding choice is a correctness concern as much as a token-budget one.

		## Decision

		Adopt GOON, a line-oriented text format for `format=llm` responses. Section-marker grammar with deterministic ordering, type-grouped node bodies, relationship-grouped edges, table-shaped aggregation rows, and chain notation for paths.

		The format spec was validated by:

		- A 5-variant Pareto benchmark on Haiku 4.5 against the production GitLab.com graph (`gitlab-org/orbit/gkg-evals-harness`). 430 task-runs over two cohorts measured cost, duration, and tool-call correctness for `kv`, `col`, `hier`, `min`, `incident`, and the raw JSON baseline.
		- A corpus audit pass running the full `fixtures/queries/corpus-input.json` against production via four parallel sub-agents, surfacing six production-confirmed encoder gaps (each fixed and regression-tested).
		- A post-merge data-loss audit using two parallel sub-agents to walk every field of `GraphResponse` and verify the encoder reads it. Four silent drops found and fixed.

		The `kv` variant was Pareto-dominant over raw JSON: −11% cost, −15% duration, +4.8pp correctness, p=0.043 on `tool_sequence_length`. The `min` variant matched `kv` on accuracy at lower token cost, which led to the `@hints` block being descoped from the format — its presence in `kv` did not improve agent behavior over `min`'s absence of it.

		## Format specification

		GOON is line-oriented text. Sections are delimited by `@`-prefixed markers, emitted in a fixed order: `@header`, `@nodes`, then exactly one of `@edges` (for `traversal`, `search`, `neighbors`) or `@paths` (for `path_finding`), and `@rows` for `aggregation`. Empty sections still emit their marker so a parser does not have to special-case absence.

		### `@header`

		Query metadata. One field per line. Always present.

		```plaintext
		@header
		query_type:traversal
		goon_version:1.0.0
		nodes:10
		edges:5
		```

		Fields:

		\| Field \| When \| Meaning \|
		\|---\|---\|---\|
		\| `query_type` \| always \| One of `traversal`, `aggregation`, `path_finding`, `search`, `neighbors`. \|
		\| `goon_version` \| always \| This encoder's wire version (semver). Distinct from the upstream `GraphResponse.format_version`, which is the source schema version owned by ADR 004. \|
		\| `nodes` / `edges` \| always \| Counts of entries in `@nodes` / `@edges`. For aggregation with node-kind group columns, `nodes` reflects the deduplicated entities lifted from rows. \|
		\| `rows` \| aggregation \| Number of rows in `@rows`. \|
		\| `group_by` \| aggregation with `group_by` \| Comma-separated descriptors: `name(kind)` or `name(kind:detail)`. Kind is `node` (detail is the entity type, e.g. `severity(node:Vulnerability)`) or `property` (detail is the underlying ontology property when the alias differs, e.g. `severity_bucket(property:severity)`). \|
		\| `aggregations` \| aggregation with metrics \| Comma-separated descriptors: `name(function)`, `name(function:target)`, or `name(function:target.property)`. The richest form names both the node alias and the property being aggregated, so `latest_update(max:v.updated_at)` is unambiguous. \|
		\| `has_more` \| when cursor paginated and there are more rows \| `true`. \|
		\| `total_rows` \| when cursor paginated \| Authorized row count before cursor slicing. \|

		### `@nodes`

		Nodes grouped by entity type. Each type starts with `TypeName(count):` then one entity per line. Each line begins with the integer ID, followed by `key=value` pairs.

		```plaintext
		@nodes
		User(3):
		1 username=alice name="Alice Smith" state=active
		2 username=bob name="Bob Chen" state=active
		3 username=carol name="Carol Davis" state=blocked

		MergeRequest(2):
		42 iid=101 state=merged title="Fix auth bug"
		43 iid=102 state=merged title="Add caching layer"
		```

		Ordering is `(entity_type, id)` for `traversal`, `search`, `neighbors`, and `path_finding`. For `aggregation`, server row order is preserved so the order specified by `aggregation_sort` survives the encode pass.

		For aggregation queries with node-kind group columns, the encoder lifts each unique `{type, id, properties}` cell from `@rows` into `@nodes` (deduplicated by `(entity_type, id)`). This keeps row lines one line each and avoids repeating node bodies on every bucket.

		#### Value formatting

		\| Source \| Encoded as \| Example \|
		\|---\|---\|---\|
		\| `null`, empty string \| omitted from the row \| (key does not appear) \|
		\| `null` in a `@rows` cell \| bare `null` (literal) \| `severity=null count=5` — a real bucket value, distinct from the string `"null"` which is quoted \|
		\| `true` / `false` (JSON bool) \| bare token \| `draft=true` \|
		\| `"true"` / `"false"` / `"null"` (JSON string) \| quoted \| `state="true"` — distinguishes a string from a native boolean \|
		\| Integer \| bare digits \| `iid=18`, `id=12971673076` (precision preserved up to `i64`) \|
		\| Finite float \| bare \| `avg_duration=941.131772070606` \|
		\| `NaN`, `±Inf` \| dropped \| (key does not appear) \|
		\| String matching `[A-Za-z0-9_\-:./@+]+` or an ISO datetime \| bare \| `username=stanhu`, `created_at=2026-05-08T22:55:58Z` \|
		\| ClickHouse datetime `YYYY-MM-DD HH:MM:SS[.fraction]` \| T-form (space at position 10 swapped to `T`) \| `created_at=2026-05-08T22:55:58.467450` \|
		\| Any other string \| double-quoted with `\\`, `\"`, `\n`, `\r`, `\t` escapes; other control chars dropped \| `title="line one\nline two"` \|
		\| Long text (`body`, `description`, `name`, `note`, `title`) over 200 chars \| truncated with `...` plus a sibling `<key>_len=N` breadcrumb \| `description="..." description_len=2308` \|
		\| Any other string over 1000 chars \| same truncation + breadcrumb \| \|

		Datetime validation goes through `chrono::NaiveDateTime::parse_from_str` and `DateTime::parse_from_rfc3339`. The output is built byte-for-byte from the input with at most one byte (the space at position 10) swapped to `T`; the source's fractional precision is preserved exactly rather than being round-tripped through chrono's nanosecond default.

		Property order within a node row is column-priority then alphabetical: identity (`iid`, `username`, `name`, `full_path`, `path`, `uuid`) first, then status enums (`state`, `status`, `visibility_level`), then everything else, then timestamps (`created_at`, `updated_at`, `merged_at`, `closed_at`), then long text (`title`, `description`, `body`, `note`) last. This means a truncated description never hides a shorter identity field.

		### `@edges`

		Edges grouped by relationship type. Arrow notation matches what models naturally produce when formatting graph data.

		```plaintext
		@edges
		AUTHORED(3):
		User:1 --> MergeRequest:42
		User:1 --> MergeRequest:43
		User:2 --> MergeRequest:44

		IN_PROJECT(2):
		MergeRequest:42 --> Project:100
		MergeRequest:43 --> Project:100
		```

		Ordering is total: `(path_id, step, edge_type, from, from_id, to, to_id, depth)`. Duplicates are removed using the same key. The total ordering means shuffle-invariance under property tests.

		For variable-length traversals, an edge with a depth tag carries it on the row:

		```plaintext
		@edges
		MEMBER_OF(2):
		User:1 --> Group:100 depth=1
		User:1 --> Group:200 depth=2
		```

		### `@paths` (path_finding only)

		For `path_finding`, `@edges` is replaced by `@paths`. Each path is one line: a chain of `Type:id --REL--> Type:id` segments. The agent gets the path identity without reassembling it from scattered edges.

		```plaintext
		@paths
		path=0: User:64248 --AUTHORED--> MergeRequest:482927048 --IN_PROJECT--> Project:278964
		```

		The `@nodes` section still carries full properties for each entity referenced in the chain.

		### `@rows` (aggregation only)

		Table-shaped aggregation rows. One row per line. Group columns come first, then metric columns, in the order declared by `group_by:` and `aggregations:`.

		Property grouping with a single metric:

		```plaintext
		@header
		query_type:aggregation
		goon_version:1.0.0
		nodes:0
		edges:0
		rows:5
		group_by:severity(property)
		aggregations:vulnerability_count(count:v)
		@nodes
		@edges
		@rows
		severity=medium vulnerability_count=8421
		severity=high vulnerability_count=2350
		severity=low vulnerability_count=1542
		severity=critical vulnerability_count=120
		severity=info vulnerability_count=42
		```

		Node grouping, with the lifted entity in `@nodes`:

		```plaintext
		@header
		query_type:aggregation
		goon_version:1.0.0
		nodes:3
		edges:0
		rows:3
		group_by:u(node:User)
		aggregations:merged_count(count:u)
		@nodes
		User(3):
		1243277 username=ghost1
		35702613 username=bot_a
		26832240 username=bot_b
		@edges
		@rows
		u=User:1243277 merged_count=65555
		u=User:35702613 merged_count=21277
		u=User:26832240 merged_count=20289
		```

		Ungrouped (scalar) aggregation flows through the same `@rows` path with a single row:

		```plaintext
		@header
		query_type:aggregation
		goon_version:1.0.0
		nodes:0
		edges:0
		rows:1
		aggregations:total(count:u)
		@nodes
		@edges
		@rows
		total=2347
		```

		## Implementation

		`GoonFormatter` lives in `crates/query-engine/formatters/src/goon/`. It implements the `ResultFormatter` trait the same way `GraphFormatter` does: `format(&self, output: &PipelineOutput) -> Value`. For LLM responses it composes `GraphFormatter::build_response(output)` with `goon::encode(&response, &GOON_OUTPUT_FORMAT_VERSION)` and wraps the result in `Value::String`.

		Wiring at `crates/gkg-server/src/grpc/service.rs` dispatches statically per request: `req.format == ResponseFormat::Llm` calls `GoonFormatter.format_stamped(&output)`; otherwise `GraphFormatter.format_stamped(&output)`. The result rides the gRPC `ExecuteQueryResult.formatted_text` field with format-name and format-version metadata.

		The encoder reads every field of `GraphResponse` (audited via parallel sub-agents post-implementation). Fields that travel:

		- All node and edge fields, including `GraphEdge.depth` for variable-length traversals.
		- `ColumnDescriptor.target` and `.property` (rendered in the `aggregations:` line as `function:target.property`).
		- `GroupColumnDescriptor.entity` (rendered as `name(node:Entity)`) and `.property` when the alias differs from the underlying property name (rendered as `name(property:underlying)`).
		- `Value::Null` in `@rows` cells renders as bare `null` so a "no severity assigned" bucket stays distinguishable from an absent column.

		Fields intentionally not surfaced:

		- `GraphResponse.format_version` — the upstream RAW schema version. The encoder emits `goon_version` instead; mixing both in one header creates the same field-name conflict that motivated the rename.
		- `GroupColumnDescriptor.node` — the source node alias is internal compiler state.
		- `GraphEdge.path_id` / `step` — used as sort keys and to drive `@paths` chain order; not surfaced as visible fields.

		### Determinism

		Locked by property tests with 64 cases each:

		- `shuffle_invariant`: a random payload shuffled by a seeded RNG must encode byte-identically.
		- `encoding_is_pure`: same input encodes to same bytes across calls.
		- `output_starts_with_header`: `@header\n` always first.
		- `no_unescaped_control_chars`: no raw `\r` or `\t` reaches the output.

		### Versioning

		`config/GOON_OUTPUT_FORMAT_VERSION` (semver, `1.0.0` at first release) follows the same discipline as `RAW_OUTPUT_FORMAT_VERSION` from ADR 004:

		- `scripts/check-goon-format-version.sh` mirrors `check-response-schema-version.sh`. It watches `crates/query-engine/formatters/src/goon/**.rs`, `graph.rs`, and `lib.rs`, and requires a version bump on any change.
		- Lefthook runs the check pre-commit; GitLab CI runs it on MRs in the `lint` stage.
		- Bypass for wire-neutral edits: `[skip goon-format-version-check]` in the MR description or `SKIP_GOON_FORMAT_VERSION_CHECK=1` locally.

		## Test coverage

		\| Layer \| Where \| Count \| Covers \|
		\|---\|---\|---\|---\|
		\| Unit \| `crates/query-engine/formatters/src/goon/tests.rs` \| 51 \| Header structure, sections, quoting, escape rules, datetime normalization, truncation, numerics, edges, dedup, path-finding, aggregation shapes (property + node + ungrouped), `Value::Null` row cells, depth on variable-length edges \|
		\| Property (`proptest`) \| `tests/goon_properties.rs` \| 4 × 64 \| Shuffle invariance, idempotence, header prefix, no unescaped control chars \|
		\| Snapshot (`insta`) \| `tests/goon_snapshots.rs` \| 7 \| One golden file per query shape + pagination \|
		\| Integration \| `crates/integration-tests/tests/server/goon_formatter.rs` \| 8 subtests \| Full compile → execute → redact → hydrate → format path against ClickHouse testcontainers; asserts `format_stamped` returns `(Value::String, version, FormatName::Goon)`, headers carry `goon_version`, escape behavior, aggregation shapes, raw/goon count agreement \|

		## Why not the alternatives

		Return JSON with a token-efficient serialization (MessagePack, CBOR). Binary formats save bytes, not tokens. LLMs tokenize text; a base64-wrapped MessagePack payload tokenizes about the same as the JSON it encodes.

		Return natural language summaries. Natural language is the least token-efficient encoding. "User alice authored merge request 42 titled Fix auth bug which is merged" is 15 tokens; `42 iid=101 title="Fix auth bug" state=merged` is 9. It is also unparseable for follow-up queries.

		Pipe-delimited columnar tables (TOON-style headers + values). Declaring column names once and emitting `1\|alice\|active` is ~10–15% more token-efficient than `key=value` for large result sets. Analysis of 316 AI coding sessions (83 Claude Code, 233 Codex) showed models never produce columnar output when formatting graph data. They consistently use inline `key=value`. The Pareto benchmark confirmed: `col` did not beat `kv` on cost-adjusted correctness.

		Return the same JSON as `format=raw`. Wastes 40–60% of tokens on `{`, `}`, `"key":`, and commas. The starting point that motivated this ADR.

		JSON with abbreviated keys. Shortening `"username"` to `"u"` saves tokens but forces the model to maintain a key mapping across the response. The `key=value` shape keeps full names inline at comparable token cost.

		Keep the `@hints` navigation block. The `kv` variant carried a `@hints` block listing available outgoing relationships per entity type. The Pareto benchmark showed `min` (no hints) matched `kv` on accuracy at lower token cost. The hints did not change agent behavior enough to pay for themselves, so the block was descoped.

		Hierarchical / incident-only encoding (one section per source node). The `hier` and `incident` variants scored higher on isolated graph-reasoning tasks but worse on the cost/duration axes in the benchmark. The Pareto front was won by `kv`'s flat type-grouped shape.

		## Consequences

		What improves:

		- 40–60% token reduction on the LLM path, validated across all five query shapes against production data.
		- −11% cost, −15% duration, +4.8pp correctness against raw JSON on Haiku 4.5 (p=0.043 on `tool_sequence_length`).
		- Format matches what models naturally produce and consume for graph data (validated against 316 session transcripts).
		- The shape is a pure function of `GraphResponse` — every visible field is a function of one input field. Adding a new field to the wire response is the only way to extend the format.
		- `orbit query --format=llm` outputs human-scannable text instead of dense JSON.

		What gets harder:

		- GOON is GKG-specific. Changes to `GraphResponse` need corresponding changes to the encoder (caught by the versioned CI check).
		- String quoting has more rules than pipe-delimited (escapes for `\\`, `\"`, `\n`, `\r`, `\t`; control char drops). The encoder treats this as one pass; the rules are tested.
		- Snapshot tests catch any unintended format drift, which is intentional but means any deliberate format change is a multi-line snapshot diff.

		Cross-language parity (Rust encoder vs the Python prototype in `gkg-evals-harness:vendor/skills/orbit-goon-kv/scripts/goon_encode.py`) is locked on the Rust side only. Byte-identical output on identical inputs is a deferred follow-up.

		## Out of scope

		HTTP response body shape. Workhorse owns the HTTP response. For `format=llm`, the body is currently `{result: "<goon-text>", query_type, raw_query_strings, row_count}` — the goon string lives in the `result` field of a JSON envelope. Returning the goon text as a plain `text/plain` body (so a viewer renders real newlines instead of `\n` escapes) requires a change in `workhorse/internal/orbit/sendquery.go`, not in GKG. That change is not part of this ADR.

		## References

		- Issue: [#271](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/issues/271)
		- Implementation MR: [!1289](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/1289) (merged)
		- Audit follow-up MR: [!1291](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/1291)
		- Property grouping (changed aggregation wire shape): [!1287](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/1287)
		- Benchmark harness: [`gkg-evals-harness!1`](https://gitlab.com/gitlab-org/orbit/gkg-evals-harness/-/merge_requests/1)
		- Benchmark result note: [MR note 3331102607](https://gitlab.com/gitlab-org/orbit/gkg-evals-harness/-/merge_requests/1#note_3331102607)
		- Eval archive: `gitlab-org/orbit/orbit-evals-results-archive`
		- ADR 003 (API Design): [`003_api_design.md`](003_api_design.md)
		- ADR 004 (Unified Query Response Schema): [`004_unified_response_schema.md`](004_unified_response_schema.md)
		- ADR 008 (Workhorse Query Acceleration): [`008_workhorse_query_acceleration.md`](008_workhorse_query_acceleration.md)
		- Fatemi, Halcrow, Perozzi. "Talk like a Graph: Encoding Graphs for Large Language Models." ICLR 2024.
		- [TOON specification](https://github.com/toon-format/spec/blob/main/SPEC.md)
		- [Orbit API Design (snippet)](https://gitlab.com/gitlab-org/gitlab/-/snippets/4929205)
		- Encoder source: `crates/query-engine/formatters/src/goon/`
		- Wire version constant: `config/GOON_OUTPUT_FORMAT_VERSION`
		- Response JSON Schema: `config/schemas/query_response.json`
		- Proto: `crates/gkg-server/proto/gkg.proto` (`ResponseFormat::RESPONSE_FORMAT_LLM`)