Orbit: `Definition.commit_sha` and `start_line`/`end_line` are null on all returned records despite being documented properties
## Summary

`Definition.commit_sha` and `Definition.start_line` (along with `end_line`, `start_byte`, `end_byte`, `start_char`, `end_char`) are documented properties on `Definition` nodes (exposed in the schema, accepted as columns by the query DSL, and accepted as filter targets) but are **null on every returned record** when querying `gitlab-org/gitlab` (project ID 278964).

This blocks any UC-11 (Code & Data Lineage) scenario that needs to identify the commit a symbol was last touched in. It would also block UC-2 (Blast Radius, historical deps), UC-9 (Incident Root Cause when symbol-level granularity is needed), and any agent workflow that wants to link a Definition back to its source commit or file location.

Adjacent to but distinct from gitlab-org/orbit/knowledge-graph#582 ("queries that should return results silently return empty"). That issue is about **multi-node aggregations returning zero rows**. This issue is about **scalar properties on returned rows being null** when they should carry source-control / source-location data.

## Reproducer

```json
{
  "query": {
    "query_type": "traversal",
    "node": {
      "id": "d",
      "entity": "Definition",
      "filters": {"project_id": {"op": "eq", "value": 278964}}
    },
    "limit": 50
  }
}
```

```console
$ glab orbit remote query /tmp/q.json --format raw | jq '{
    total: ([.result.nodes[] | select(.type == "Definition")] | length),
    with_commit_sha: ([.result.nodes[] | select(.type == "Definition" and .commit_sha != null and .commit_sha != "")] | length),
    with_start_line: ([.result.nodes[] | select(.type == "Definition" and .start_line != null)] | length)
  }'
{
  "total": 50,
  "with_commit_sha": 0,
  "with_start_line": 0
}
```

Every one of the 50 sampled Definitions has `commit_sha: null` and `start_line: null`. The same result holds when filtering to specific files (e.g. `app/models/user.rb` Definitions all return null on these properties).
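The jq filter above only counts `commit_sha` and `start_line`. To check all seven properties in one pass, a small Python sketch along the following lines could be run against the saved raw output. It assumes the response has the same `.result.nodes[]` shape with a `type` field that the jq filter relies on. The function names and the `/tmp/out.json` path are illustrative and not part of Orbit or `glab`.

```python
import json

# Property names copied from the schema listing in this issue.
LOCATION_PROPERTIES = [
    "commit_sha", "start_line", "end_line",
    "start_byte", "end_byte", "start_char", "end_char",
]


def populated_counts(payload, properties=LOCATION_PROPERTIES):
    """Count Definition nodes that carry a non-null, non-empty value per property."""
    # Assumed response shape: {"result": {"nodes": [{"type": ..., ...}]}},
    # mirroring the `.result.nodes[]` path used by the jq filter.
    nodes = payload.get("result", {}).get("nodes", [])
    definitions = [n for n in nodes if n.get("type") == "Definition"]
    counts = {"total": len(definitions)}
    for prop in properties:
        # A value of 0 (e.g. start_byte at the top of a file) counts as populated;
        # only None and the empty string are treated as missing.
        counts[prop] = sum(
            1 for n in definitions if n.get(prop) not in (None, "")
        )
    return counts


def load_and_count(path):
    """Hypothetical helper: read a saved raw query response and summarize it."""
    with open(path) as fh:
        return populated_counts(json.load(fh))
```

With the raw response saved to a file (for example by redirecting the `glab orbit remote query /tmp/q.json --format raw` output to `/tmp/out.json`), `load_and_count("/tmp/out.json")` returns one count per property alongside `total`. If the behavior reported here holds for all seven properties, every per-property count would be 0.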
## Schema documents these properties

From `glab orbit remote schema Definition`:

```json
{
  "name": "Definition",
  "properties": [
    {"name": "id", ...},
    {"name": "project_id", ...},
    {"name": "branch", ...},
    {"name": "commit_sha", "data_type": "String", "nullable": true, ...},
    {"name": "file_path", ...},
    {"name": "fqn", ...},
    {"name": "name", ...},
    {"name": "definition_type", ...},
    {"name": "start_line", "data_type": "Int", "nullable": true, ...},
    {"name": "end_line", "data_type": "Int", "nullable": true, ...},
    {"name": "start_byte", "data_type": "Int", "nullable": true, ...},
    {"name": "end_byte", "data_type": "Int", "nullable": true, ...},
    {"name": "start_char", ...},
    {"name": "end_char", ...},
    {"name": "content", ...}
  ]
}
```

All seven location/source-control properties (`commit_sha`, `start_line`, `end_line`, `start_byte`, `end_byte`, `start_char`, `end_char`) are declared with `nullable: true`, which technically permits the observed behavior. The practical consequence, however, is that an agent reading the schema would expect these fields to *typically* be populated (the whole point of a code graph is anchoring symbols to source) and instead gets uniform null.

## Why this matters

For UC-11 specifically, **the entire "where did this code come from?" question depends on `commit_sha`**. Without it the workflow degrades to:

- Find the Definition (works)
- Read its `commit_sha` → **null**
- Fall back to grep'ing the file externally, using git blame outside Orbit

That's not "Orbit + REST as a two-tool dance" (which UC-9 and UC-10 demonstrated as a working pattern); it's "Orbit can't contribute at all to this question." The graph identifies the symbol but anchors it to nothing in time or space.

For UC-2 (Blast Radius) and any blame-style or evolution-style query, the same gap blocks the natural workflow.

## Adjacent but distinct findings

- gitlab-org/orbit/knowledge-graph#582: silent empty results in aggregations. Different shape (multi-node aggregation returning zero rows).
  Both surface a pattern of "schema looks right but data isn't there."
- gitlab-org/gitlab#600162: Ruby DSL declarations invisible. Different layer (indexer-level missing relationships). Same UAT-readiness concern.
- gitlab-org/gitlab#600140: `source_code` domain lacks `IN_PROJECT` edge. Different shape (schema-design gap).

All four findings together describe a `source_code` domain that is structurally incomplete for UC-2 / UC-10 / UC-11.

## Suggested fix paths

1. **Populate `commit_sha` and `start_line`/`end_line` from the indexer's existing data.** The indexer must already know which commit a Definition was extracted from (it walks a specific tree-sha snapshot). Surfacing that data into the property fields is the cheapest fix.
2. **If the data genuinely isn't available, mark the properties as deprecated or remove them from the schema.** A documented-but-null field is worse than a missing field because the agent will repeatedly request it expecting a value.
3. **Document the limitation explicitly.** If the data won't be populated soon, the schema description should call out that these properties are reserved for future use.

## Environment

- `glab` version: `1.94.0 (aa456f48)`
- Endpoint: production Orbit (`POST /api/v4/orbit/query` on gitlab.com)
- Tested 2026-05-14 against `gitlab-org/gitlab` (project ID 278964)
- Sample sizes: 50 Definitions project-wide, 8+ Method Definitions in `app/models/user.rb`; all return null on the listed properties

## Suggested severity

`severity::3`: does not block use entirely, but materially blocks UC-11's symbol-level provenance workflow and any blame-style / evolution-style query on the `source_code` domain.

## References

- Parent customer-zero issue: gitlab-org/orbit/knowledge-graph#602
- Surfaced during UC-11 S2 testing under gitlab-org/orbit/knowledge-graph#607
- Customer Zero bug-reporting epic: gitlab-org&21852
- Related: gitlab-org/orbit/knowledge-graph#582, gitlab-org/gitlab#600162, gitlab-org/gitlab#600140