Orbit: Ruby DSL declarations and `prepend_mod_with` are invisible to the source_code indexer (DeclarativePolicy policies, ActiveRecord scopes, CE/EE composition)
## Summary

The Orbit `source_code` domain indexer captures conventional Ruby/JS code structure well (`def` methods, `class A < B`, `include Module`) but does not walk Ruby DSL-shaped declarations. For a Rails codebase the size and shape of `gitlab-org/gitlab`, the dominant dependency-shaped relationships are expressed through DSL patterns, none of which the indexer captures today.

Three concrete coverage gaps surfaced during UC-10 customer-zero testing (gitlab-org/orbit/knowledge-graph#606), all sharing the same root cause:

1. **Call edges through DSL block bodies are not extracted.** `condition(:guest) { team_member? }` makes `team_member?` a logical dependency, but the indexer does not register a CALLS edge from the enclosing class to it. The DeclarativePolicy DSL (`condition`, `rule`, `policy`, `enable`, `prevent`), ActiveRecord scopes (`scope :X, -> { ... }`), validations (`validates`), callbacks (`before_save`), and route helpers all fall in this bucket.
2. **Ruby `include Module` is not captured as `ImportedSymbol`.** ImportedSymbol coverage is Go (`workhorse/`) and JavaScript (`spec/frontend/`) only. Querying for `ImportedSymbol.identifier_name = "Gitlab::InternalEventsTracking"` returns zero results despite real `include Gitlab::InternalEventsTracking` statements in at least 8 source files.
3. **The `prepend_mod_with(...)` macro is invisible to the `EXTENDS` edge.** This is the GitLab-specific dynamic-prepend mechanism used across the entire `gitlab-org/gitlab` codebase to mix EE modules into CE classes. CE `Project` has 0 incoming EXTENDS edges from `EE::Project`, and `EE::Project` has 0 outgoing EXTENDS edges to CE `Project`, even though `EE::Project` overrides CE `Project` at runtime.

These are filed as one consolidated issue because they share a root cause (Ruby DSL-shaped declarations and GitLab-specific macros not walked by the indexer), and because addressing them together is likely a single workstream on the indexer side.
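
For readers less familiar with these patterns, here is a minimal, self-contained sketch of the three shapes described above. Everything in it is a hypothetical stand-in so that it runs without Rails or GitLab loaded: `PolicyDSL`, the local `InternalEventsTracking` module, and the one-line `prepend_mod_with` are simplified assumptions, not the real implementations.

```ruby
# Hypothetical stand-ins so the snippet runs without Rails or GitLab loaded.
module PolicyDSL
  def conditions
    @conditions ||= {}
  end

  # Stores the block; its body is only evaluated later, against a policy instance.
  def condition(name, &block)
    conditions[name] = block
  end
end

module InternalEventsTracking # stands in for Gitlab::InternalEventsTracking
  def track_internal_event(name)
    "tracked #{name}"
  end
end

module EE
  module Project
    def licensed?
      true
    end
  end
end

class Module
  # Simplified assumption: the real macro does more than this single lookup.
  def prepend_mod_with(name)
    prepend(EE.const_get(name))
  end
end

class Project
  include InternalEventsTracking # gap 2: a mixin, not an import statement

  def licensed?
    false
  end
end
Project.prepend_mod_with('Project') # gap 3: no literal `prepend EE::Project`

class ProjectPolicy
  extend PolicyDSL

  def team_member?
    true
  end

  condition(:guest) { team_member? } # gap 1: the call lives inside a DSL block
end
```

In this sketch, `EE::Project` ends up first in `Project.ancestors` and `team_member?` is only reachable through the stored block, which is why an indexer that looks only at `def`, `class`, and literal `include`/`prepend` statements would miss both relationships.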
## Positive partial finding (worth keeping)

The `EXTENDS` edge captures more than the schema describes. The schema says *"inheritance, interface implementation, struct embedding"*, but the edge also captures Ruby `include Module` mixin relationships. CE `Project` has 48 outgoing EXTENDS edges, including dozens of `include`-d concerns (`Routable`, `Sortable`, `EachBatch`, etc.).

This is meaningfully more coverage than the schema description implies, and is the partial mitigation for gap #2 above. The schema docs should be updated to reflect this: agents reading the schema literally would not expect `include` relationships in `EXTENDS`.

## Reproducers

### Gap 1: DSL block body call extraction

```bash
# CE app/policies/project_policy.rb is 1049 lines with extensive DSL:
#   66 condition declarations
#   98 rule blocks
#   56 enable :X grants
#   251 prevent :X denials
#   11 def methods

# Orbit captures only the 5 `def`-declared helpers + 1 class:
glab orbit remote query (File DEFINES Definition for app/policies/project_policy.rb)
→ 6 Definitions total. 471 DSL-shaped declarations invisible.

# And calls through DSL bodies don't register either:
glab orbit remote query (neighbors, incoming CALLS edges to team_member?)
→ 1 trivial self-edge (ProjectPolicy → team_member?)

# REST ground truth: team_member? is referenced from at least 6 condition blocks
# across CE + EE policies. None captured.
```

### Gap 2: include Module as ImportedSymbol

```bash
glab orbit remote query (ImportedSymbol lookup, identifier_name eq "InternalEventsTracking")
→ 0 results

glab orbit remote query (ImportedSymbol lookup, any, project_id = 278964)
→ returns only Go imports (workhorse/) and JS imports (spec/frontend/)

# REST ground truth: ≥ 8 files include Gitlab::InternalEventsTracking
# (app/models/ci/pipeline.rb, ee/app/graphql/resolvers/..., lib/gitlab/auth.rb, etc.)

# Substitute path that works: CALLS edges to the module's primary method
glab orbit remote query (neighbors, incoming CALLS to Gitlab::InternalEventsTracking::track_internal_event)
→ 100 CALLS edges, 101 caller Definitions across real service/controller/worker files
```

### Gap 3: prepend_mod_with invisible to EXTENDS

```bash
# CE Project gets EE Project prepended via:
#   app/models/project.rb:4151 → Project.prepend_mod_with('Project')
# Not a literal `prepend EE::Project`; the macro resolves dynamically in EE builds.

glab orbit remote query (neighbors, incoming EXTENDS to CE Project)
→ 2 edges only (QA::Resource::Fork, API::Entities::ProjectWithAccess)
  Not present: EE::Project

glab orbit remote query (neighbors, outgoing EXTENDS from EE::Project)
→ 0 edges

# Both Definitions exist in the graph; their relationship does not.
```

## Impact

- **UC-10 (Dependency Analysis / Full Stack)** is the most directly affected. Three of its four scenarios surface these gaps. The "1-2 min full-stack dependency map" impact claim is not achievable on Rails codebases without addressing these.
- **UC-4 (Faster Code Review via Dependency Mapping)** is likely affected: review-time dependency analysis on Rails code hits the same gaps.
- **UC-2 (Blast Radius Analysis)** for code-level dependencies hits gaps #1 and #3.
- **UC-7 (Team Expertise / Bus Factor)** hits gap #1 when expertise is concentrated in policy/routing/scope authorship.

For `gitlab-org/gitlab` specifically (the project the public-beta UAT is explicitly testing), these gaps cover the bulk of the codebase's dependency structure. Most Ruby files in CE + EE are heavily DSL-driven.

## Suggested fixes (in order of impact)

1. **Extend the indexer to walk Ruby DSL block bodies for CALLS extraction.** Specifically: `condition`, `rule`, `policy`, `enable`, `prevent` for DeclarativePolicy; `scope` for ActiveRecord; `validates`, `before_*`, `after_*` for ActiveRecord validations and callbacks; route DSL helpers. Each of these takes a block whose body is real Ruby, so the indexer just needs to walk it.
2. **Capture Ruby `include`, `extend`, `prepend` as ImportedSymbols (or document the EXTENDS-as-include mitigation).** Either approach works; the agent needs some path to "which files compose this module." The EXTENDS-edge mitigation already exists for `include`; documenting it would be a quick win.
3. **Recognize GitLab's `prepend_mod_with` macro.** This is project-specific code, but it is the dominant CE/EE composition mechanism in `gitlab-org/gitlab`. Either the indexer special-cases `prepend_mod_with('X')` and resolves it to `prepend EE::X`, or the GitLab codebase exposes the relationship in a more graph-friendly form. Both are workable; without one of them, CE/EE dependency analysis is structurally blocked.
4. **Update the EXTENDS edge schema description** to call out that Ruby `include Module` is captured. The schema currently reads "inheritance, interface implementation, struct embedding", so agents reading it literally would not query EXTENDS for mixin relationships.

## Environment

- `glab` version: `1.94.0 (aa456f48)`
- Endpoint: production Orbit (`POST /api/v4/orbit/query` on gitlab.com)
- Tested 2026-05-14 against `gitlab-org/gitlab` (project ID 278964)

## Suggested severity

`severity::2`. These gaps materially block UC-10 testing for the public-beta UAT scope. The blast radius of "you can't ask Orbit what depends on a Ruby ability or what overrides a CE class" on `gitlab-org/gitlab` is large enough that fixing this is a beta-readiness concern, not a polish concern.
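
As a rough illustration of suggested fixes 1 and 3, the sketch below uses Ruby's standard-library `Ripper` parser to pull receiverless calls out of DSL bodies and a regular expression to map `prepend_mod_with('X')` to `EE::X`. It is not how the Orbit indexer is implemented. The `DSL_METHODS` list, the function names, and the `[dsl_method, callee]` edge shape are all assumptions made for the example, and calls with an explicit receiver (such as `user.admin?`) are deliberately ignored.

```ruby
require "ripper"

# Assumed, incomplete list of DSL entry points (illustrative only).
DSL_METHODS = %w[condition rule scope].freeze

# Returns the DSL method name a node invokes, or nil when it is not a DSL call.
def dsl_head(node)
  # Unwrap block/argument wrappers to reach the receiverless call itself.
  node = node[1] while node.is_a?(Array) && %i[method_add_block method_add_arg].include?(node[0])
  return nil unless node.is_a?(Array) && %i[fcall command].include?(node[0])

  ident = node[1]
  ident[1] if ident.is_a?(Array) && ident[0] == :@ident && DSL_METHODS.include?(ident[1])
end

# Collects names of receiverless calls anywhere under node, in source order.
def call_names(node, out = [])
  return out unless node.is_a?(Array)

  if %i[fcall vcall command].include?(node[0]) && node[1].is_a?(Array) && node[1][0] == :@ident
    out << node[1][1]
  end
  node.each { |child| call_names(child, out) }
  out
end

# Walks the tree and emits [dsl_method, callee] pairs for calls inside DSL bodies.
def walk_dsl(node, edges = [])
  return edges unless node.is_a?(Array)

  if (dsl = dsl_head(node))
    names = call_names(node)
    names.delete_at(names.index(dsl)) # drop the DSL call itself
    names.each { |callee| edges << [dsl, callee] }
    return edges
  end
  node.each { |child| walk_dsl(child, edges) }
  edges
end

def dsl_call_edges(source)
  tree = Ripper.sexp(source)
  raise ArgumentError, "source does not parse" if tree.nil?

  walk_dsl(tree)
end

# Assumption taken from fix 3: prepend_mod_with('X') stands for `prepend EE::X`.
def prepend_mod_targets(source)
  source.scan(/\bprepend_mod_with\(\s*['"]([\w:]+)['"]/).flatten.map { |name| "EE::#{name}" }
end

SAMPLE = <<~'RUBY'
  class ProjectPolicy
    condition(:guest) { team_member? }
  end

  class Project
    scope :active, -> { where(active: true) }
  end
  Project.prepend_mod_with('Project')
RUBY

p dsl_call_edges(SAMPLE)
p prepend_mod_targets(SAMPLE)
```

A production indexer would more likely attach each edge to the enclosing class Definition and resolve the callee to a Definition node; the pairs here only show that the information is recoverable from the syntax tree without executing the code.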
## References

- Parent customer-zero issue: gitlab-org/orbit/knowledge-graph#602
- Surfaced during UC-10 testing under gitlab-org/orbit/knowledge-graph#606 (S1, S2, S3)
- Customer Zero bug-reporting epic: gitlab-org&21852
- Related (different root cause, same issue family):
  - gitlab-org/orbit/knowledge-graph#577 (Definition `definition_type` case-sensitivity)
  - gitlab-org/orbit/knowledge-graph#582 (queries silently returning empty)
  - gitlab-org/gitlab#600140 (`source_code` nodes lack `IN_PROJECT` edge)