Data correctness issues found via MCP endpoint testing
Summary
Systematic comparison of MCP endpoint responses (POST /api/v4/orbit/mcp) against the GitLab REST API on staging revealed 5 data correctness issues. Testing used tools/call with query_graph and compared results field-by-field against equivalent REST API calls.
Environment: staging.gitlab.com, tested 2026-03-25
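
For anyone re-running the comparison, the request body for a tools/call invocation of query_graph can be sketched as below. The JSON-RPC envelope follows the MCP tools/call convention; passing the query object directly as the tool arguments is an assumption (the tool may expect it under a named key), and `build_tools_call` is a hypothetical helper name, not part of any GitLab client.

```python
import json


def build_tools_call(query: dict, request_id: int = 1) -> dict:
    """Wrap a graph query in a JSON-RPC 2.0 tools/call request (sketch)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "query_graph",
            # Assumption: the query object is passed as-is as the tool arguments.
            "arguments": query,
        },
    }


# The Bug 1 reproduction query from this report.
query = {
    "query_type": "search",
    "node": {
        "id": "p",
        "entity": "Project",
        "columns": ["name", "full_path"],
        "node_ids": [278964],
    },
    "limit": 1,
}

payload = build_tools_call(query)
body = json.dumps(payload)  # body to POST to /api/v4/orbit/mcp
```

The serialized `body` would be sent with whatever authenticated HTTP client is at hand; authentication and transport are left out because they are not described in this report.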
Finished Checklist:
- Bug 1: full_path contains slug instead of namespaced path (Project + Group) (@michaelusa)
- Bug 2: MEMBER_OF edges severely incomplete (~93% missing) (@michaelangeloio)
- Bug 3: Job failure_reason defaults to "unknown" instead of null (@michaelusa)
- Bug 4: Milestone due_date always null (@michaelangeloio)
- Bug 5: Ungrouped aggregation results invisible in LLM format (@michaelusa, !704 (merged))
Bug 1: full_path contains slug instead of namespaced path (Project + Group)
Severity: High — full_path is a default column and the primary way users identify projects/groups.
Root cause: config/ontology/nodes/core/project.yaml:28 and group.yaml:28 both have source: path, which maps to the project/group slug. The ontology field is named full_path but it pulls the wrong source column.
Reproduction:
// MCP query
{"query_type":"search","node":{"id":"p","entity":"Project","columns":["name","full_path"],"node_ids":[278964]},"limit":1}

| Source | Value |
|---|---|
| MCP full_path | "gitlab" |
| REST path_with_namespace | "gitlab-org/gitlab" |
| REST path (slug) | "gitlab" |
Same for subgroups:
| Group ID | MCP full_path | REST full_path |
|---|---|---|
| 1540914 | "gitter" | "gitlab-org/gitter" |
| 1602322 | "database-specialists" | "gitlab-org/database-specialists" |
| 1755573 | "frontend" | "gitlab-org/frontend" |
Fix: Change source: path to the correct Siphon source column that contains the namespaced path (e.g., full_path or path_with_namespace depending on what Siphon exposes) in both project.yaml and group.yaml.
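
A field-by-field check like the one used for this comparison can be sketched as follows, and could double as a regression check once the source column is changed. `diff_fields` and `FIELD_MAP` are hypothetical names for illustration; the sample values are the ones from the Project table above.

```python
def diff_fields(mcp_row: dict, rest_row: dict, field_map: dict) -> dict:
    """Return {mcp_field: (mcp_value, rest_value)} for each mapped field that differs."""
    mismatches = {}
    for mcp_field, rest_field in field_map.items():
        mcp_value = mcp_row.get(mcp_field)
        rest_value = rest_row.get(rest_field)
        if mcp_value != rest_value:
            mismatches[mcp_field] = (mcp_value, rest_value)
    return mismatches


# MCP field name -> REST field expected to hold the same value.
FIELD_MAP = {"full_path": "path_with_namespace"}

mcp_project = {"full_path": "gitlab"}
rest_project = {"path": "gitlab", "path_with_namespace": "gitlab-org/gitlab"}

mismatches = diff_fields(mcp_project, rest_project, FIELD_MAP)
# With the values above, full_path is reported as ("gitlab", "gitlab-org/gitlab").
```

After the fix, the same call should return an empty dict for project 278964.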
Bug 2: MEMBER_OF edges severely incomplete (~93% missing)
Severity: High — membership queries return a small fraction of actual members.
Reproduction:
// MCP: count members of gitlab-org/gitlab
{"query_type":"traversal","nodes":[{"id":"p","entity":"Project","node_ids":[278964]},{"id":"u","entity":"User","columns":["username"]}],"relationships":[{"type":"MEMBER_OF","from":"u","to":"p"}],"limit":1000}

| Source | Member count |
|---|---|
| MCP (MEMBER_OF edges) | 41 |
| REST (/projects/278964/members/all) | 609 |
The two sets have zero overlap in their first pages. MCP returns users like 58926 (Haydn Mackay, Reporter access) which ARE valid members per REST, but the core maintainers/owners (user 1, 444, etc.) are absent from MCP.
Likely cause: Inherited group memberships may not be flowing through Siphon CDC, or the indexer only captures direct project memberships.
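
The ~93% figure and a per-user breakdown can be derived with a simple set comparison. This is a minimal sketch: `membership_coverage` is a hypothetical helper, the three user IDs are only the ones named above (not the full result sets), and the percentage assumes all 41 MCP users are valid REST members.

```python
def membership_coverage(mcp_user_ids, rest_user_ids) -> dict:
    """Compare user IDs from MEMBER_OF edges against REST /members/all."""
    mcp, rest = set(mcp_user_ids), set(rest_user_ids)
    missing = rest - mcp  # members REST knows about but MCP does not return
    extra = mcp - rest    # users MCP returns that REST does not list
    missing_ratio = len(missing) / len(rest) if rest else 0.0
    return {"missing": missing, "extra": extra, "missing_ratio": missing_ratio}


# Illustrative sample using only the IDs mentioned in this report.
sample = membership_coverage(mcp_user_ids=[58926], rest_user_ids=[1, 444, 58926])

# Headline number from the counts in the table: 41 of 609 members present.
missing_share = 1 - 41 / 609
missing_percent = round(missing_share * 100, 1)
```

Running the helper on the full paginated ID lists from both sources would also show whether the missing users are predominantly inherited group members, which would support or rule out the likely cause above.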
Bug 3: Job failure_reason defaults to "unknown" instead of null
Severity: Low — cosmetic, but could confuse LLM consumers into thinking jobs failed.
Reproduction:
// MCP: get jobs in gitlab-org/gitlab
{"query_type":"traversal","nodes":[{"id":"p","entity":"Project","node_ids":[278964]},{"id":"j","entity":"Job","columns":"*"}],"relationships":[{"type":"IN_PROJECT","from":"j","to":"p"}],"limit":3}

Job 40962453 ("jest 4/5", status=canceled):

| Source | failure_reason |
|---|---|
| MCP | "unknown" |
| REST (/projects/278964/jobs/40962453) | null |
Fix: The indexer or ClickHouse default should use null/empty instead of "unknown" when no failure reason exists.
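
Until the default is changed at the source, a consumer could normalize the value on its side. The sketch below is an assumption-heavy workaround, not the proposed fix: `normalize_failure_reason` is a hypothetical helper, and it assumes that "unknown" on a job whose status is not "failed" always means that no failure reason exists.

```python
from typing import Optional


def normalize_failure_reason(status: str, failure_reason: Optional[str]) -> Optional[str]:
    """Map the placeholder "unknown" to None for jobs that did not fail."""
    # Assumption: a non-failed job cannot carry a meaningful failure reason.
    if status != "failed" and failure_reason == "unknown":
        return None
    return failure_reason


# Job 40962453 from the table above: status=canceled, MCP reports "unknown".
normalized = normalize_failure_reason("canceled", "unknown")
```

Values on failed jobs are passed through untouched, so a genuine "unknown" on a failed job is not hidden.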
Bug 4: Milestone due_date always null
Severity: Medium — milestone dates are important for planning queries.
Reproduction:
// MCP: search milestones
{"query_type":"search","node":{"id":"m","entity":"Milestone","columns":"*"},"limit":5}

Milestone 349690 ("10.1"):

| Source | due_date |
|---|---|
| MCP | null |
| REST (/groups/9970/milestones/349690) | "2017-10-22" |
All 20 milestones tested had due_date: null in MCP. The field exists in the ontology schema but values aren't being indexed.
Bug 5: Ungrouped aggregation results invisible in LLM format
Severity: Medium — scalar counts (the most common aggregation) return no usable data to MCP/LLM consumers.
Reproduction:
// MCP: count all projects
{"query_type":"aggregation","nodes":[{"id":"p","entity":"Project"}],"aggregations":[{"function":"count","target":"p","alias":"total"}]}

Response: row_count: 1 but the result body is {"query_type":"aggregation","nodes":[],"edges":[]}. The actual count value is absent.
The SQL is correct (SELECT COUNT(p.id) AS total) and returns one row. But the GOON formatter maps results into nodes and edges — a scalar aggregation has neither, so the value is silently dropped.
Grouped aggregations work fine. For example, counting jobs per project returns job_count as a property on each Project node. The bug only affects ungrouped (scalar) aggregations.
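
The failure mode can be illustrated with a toy formatter. This is not the actual formatter code and does not describe what !704 changed: `format_rows`, `format_rows_keep_scalars`, the `p_id` column name, the top-level `aggregations` key, and the sample values are all hypothetical, chosen only to show why a row without a node identity disappears and one possible way to surface it.

```python
def format_rows(rows, node_id_key="p_id"):
    """Toy node/edge-only formatter: rows without a node id have nowhere to go."""
    result = {"query_type": "aggregation", "nodes": [], "edges": []}
    for row in rows:
        if node_id_key not in row:
            continue  # scalar row: no node to attach the value to, silently dropped
        props = {k: v for k, v in row.items() if k != node_id_key}
        result["nodes"].append({"id": row[node_id_key], **props})
    return result


def format_rows_keep_scalars(rows, node_id_key="p_id"):
    """Same formatter, but scalar rows are surfaced under a separate key."""
    result = format_rows(rows, node_id_key)
    scalars = [row for row in rows if node_id_key not in row]
    if scalars:
        result["aggregations"] = scalars  # hypothetical top-level key
    return result


grouped = format_rows([{"p_id": 278964, "job_count": 5}])  # illustrative values
dropped = format_rows([{"total": 123}])                    # illustrative value
kept = format_rows_keep_scalars([{"total": 123}])
```

In this sketch `grouped` carries job_count as a node property, `dropped` reproduces the empty nodes/edges body seen in the response, and `kept` exposes the scalar under the extra key.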
What's working correctly
- SQL injection attempts are handled safely (parameterized queries)
- Schema validation rejects invalid entities, relationships, limits (0, 1001), empty arrays
- Unknown tool names return proper JSON-RPC -32602 errors
- Star counts, MR fields (title, state, source_branch, iid), job names/statuses all match REST
- Path finding, neighbors, contains/gt/in filters all work
- Security redaction is enforced (results scoped to authorized traversal paths)
- Grouped aggregations return correct counts (e.g., 10.9M jobs for gitlab-org/gitlab)