GitLab Knowledge Graph Rails Integration
**Parent Epic**: [GitLab Knowledge Graph Core Development gitlab-org/rust&38](https://gitlab.com/groups/gitlab-org/rust/-/epics/38)

**Top-Level Epic**: [GitLab Knowledge Graph Second Iteration - GKG as a Service &19744](https://gitlab.com/groups/gitlab-org/-/epics/19744)

---

## Summary

This epic tracks the integration layer that connects the GitLab Knowledge Graph service to GitLab Rails for authorization, authentication, and data enrichment. Rails serves as the primary authentication and authorization gateway.

## Architecture Overview

All access to the Knowledge Graph is proxied through GitLab Rails. This diagram illustrates the request flow:

```mermaid
sequenceDiagram
    actor Client
    participant Rails
    participant WebServer as GKG Web Server
    participant AuthEngine as Query Pipeline

    Client->>Rails: Send Request
    Rails->>WebServer: Query Knowledge Graph (gRPC bidi stream)
    activate WebServer
    WebServer->>WebServer: Compile Graph Query & Execute on ClickHouse
    WebServer->>AuthEngine: Pass result set
    activate AuthEngine
    AuthEngine->>AuthEngine: Identify redactable columns from ontology
    AuthEngine->>AuthEngine: Group rows by entity type + permission + resource IDs
    Note over AuthEngine, Rails: gRPC Bidirectional Streaming RedactionExchange
    AuthEngine->>Rails: RedactionExchange.required (ResourceCheck[])
    Rails->>Rails: Ability.allowed? per resource
    Rails-->>AuthEngine: RedactionExchange.response (ResourceAuthorization[])
    AuthEngine->>AuthEngine: apply_authorizations() - mark unauthorized rows
    AuthEngine->>WebServer: Redacted result set
    deactivate AuthEngine
    deactivate WebServer
    WebServer-->>Rails: ToolResult/QueryResult with redacted payload
    Rails-->>Client: Final redacted data
```

---

## Work Streams

### 1. Namespace Access Control

Enable Knowledge Graph access on a per-namespace basis. The indexing service needs to recognize which namespaces are enabled before processing their data.

**Scope**:

- PostgreSQL table tracking which root namespaces have Knowledge Graph access enabled
- Admin API endpoint for enabling/disabling namespaces programmatically
- Chatops command for operators to manage namespace enrollment
- Feature flag for controlled rollout
- Replicate the table into ClickHouse through siphon

---

### 2. Traversal ID API (Layer 2 Authorization)

Provide the Knowledge Graph service with the user's accessible traversal_ids for query-time filtering. This is the second layer of the three-layer security model.

**Purpose**: Rails computes which groups/projects the user can access and passes this information to the Knowledge Graph service. The query engine then filters the results to include only data within the user's accessible namespace hierarchy.

**Scope**:

- Rails method to retrieve all `traversal_id` prefixes where the user has Reporter+ access
- Trie optimization to avoid redundancy (e.g., if the user has access to `[100]`, don't also include `[100, 200]` since it's already covered)
- JWT token generation containing the user's traversal_ids

**Technical Details**:

- Rails queries the user's group memberships: `user.groups.where('members.access_level >= ?', Gitlab::Access::REPORTER)`
- Traversal IDs passed in JWT payload: `{user_id, username, organization_id, traversal_ids, iat, exp}`
- Token expiry: 5 minutes (short-lived to limit the misuse window)
- Query engine generates ClickHouse SQL: `WHERE arrayExists(prefix -> startsWith(traversal_ids, prefix), allowed_prefixes)`

---

### 3. Layer 3 Redaction Service (via gRPC Bidirectional Streaming)

The final authorization pass: filter query results based on resource-specific permissions using Rails' `DeclarativePolicy` system.

**Purpose**: Traversal IDs handle coarse-grained group/project filtering but cannot account for resource-specific permissions. Layer 3 runs `Ability.allowed?` checks against each returned resource before sending results to the client.
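
As a rough illustration of this pass, the sketch below groups result rows into one check per entity type, asks an authorizer about each resource, and keeps only the rows that were allowed. Everything named here is hypothetical: the `Row` struct, the `read_<entity_type>` ability naming, and the `authorizer` lambda, which stands in for `Ability.allowed?` on the Rails side. For simplicity the sketch drops unauthorized rows outright, whereas the pipeline in the architecture diagram marks them; treat it as a sketch under those assumptions, not the service's implementation.

```ruby
require "set"

# Hypothetical row shape for illustration only.
Row = Struct.new(:entity_type, :id, :payload, keyword_init: true)

# Build one check per entity type, carrying the distinct resource IDs.
# The read_<entity_type> ability name is an assumption for this sketch.
def build_resource_checks(rows)
  rows.group_by(&:entity_type).map do |entity_type, group|
    {
      entity_type: entity_type,
      ability: :"read_#{entity_type}",
      resource_ids: group.map(&:id).uniq
    }
  end
end

# Ask the authorizer about every resource and collect the allowed
# [entity_type, id] pairs. The authorizer stands in for Ability.allowed?.
def authorize(checks, authorizer)
  checks.each_with_object(Set.new) do |check, allowed|
    check[:resource_ids].each do |id|
      if authorizer.call(check[:ability], check[:entity_type], id)
        allowed << [check[:entity_type], id]
      end
    end
  end
end

# Fail closed: a row survives only if it was explicitly allowed.
def redact_rows(rows, allowed)
  rows.select { |row| allowed.include?([row.entity_type, row.id]) }
end

rows = [
  Row.new(entity_type: "issue", id: 1, payload: "public issue"),
  Row.new(entity_type: "issue", id: 2, payload: "confidential issue"),
  Row.new(entity_type: "merge_request", id: 1, payload: "merge request")
]

# Stand-in policy: everything is readable except issue 2.
deny_issue_2 = ->(_ability, entity_type, id) { !(entity_type == "issue" && id == 2) }

checks  = build_resource_checks(rows)
allowed = authorize(checks, deny_issue_2)
p redact_rows(rows, allowed).map(&:payload)
# => ["public issue", "merge request"]
```

Keying the allowed set on the `[entity_type, id]` pair matters because IDs are only unique within an entity type: issue 1 and merge request 1 are different resources.
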

**Why Layer 3 is necessary**:

- Confidential issues (only visible to project members and issue participants)
- Runtime checks (SAML group links, IP restrictions)
- Custom roles or fine-grained permissions
- Future permission model changes apply automatically without GKG service changes

**Scope**:

- gRPC RedactionExchange messages within ExecuteTool/ExecuteQuery bidi streams
- Integration with GitLab's declarative policy system (`Ability.allowed?`)
- Ontology-defined RedactionConfig for compile-time redaction spec derivation
- Batched resource checks grouped by (entity_type, ability, resource_ids)

The redaction exchange occurs within the same gRPC bidirectional stream as the query execution. See the [ADR: gRPC Communication Protocol](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/blob/main/docs/design-documents/adr-grpc-communication.md) for protocol details.

**Implementation reference**:

- GitLab's [SearchService](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/search_service.rb) implements similar `redact_unauthorized_results` logic
- Uses `Ability.allowed?(current_user, :"read_#{object.to_ability_name}", object)`

---

### 4. Service-to-Service Authentication

Implement secure communication between Rails and the Knowledge Graph service using a defense-in-depth approach.

**Purpose**: Ensure that only authenticated requests from GitLab Rails can access the Knowledge Graph service, and that user context is securely transmitted for authorization decisions.
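
To make the JWT half of this work stream concrete, the sketch below signs and verifies an HS256 token using only Ruby's standard library, with the payload fields and 5-minute expiry listed under Work Stream 2. The helper names (`encode_token`, `decode_token`, `base64url`, `secure_equal?`) and the sample claim values are hypothetical, and a real integration would more likely rely on a maintained JWT library and proper secret storage. mTLS is out of scope for this sketch.

```ruby
require "openssl"
require "base64"
require "json"

TOKEN_TTL_SECONDS = 5 * 60 # 5-minute expiry, as stated in Work Stream 2

def base64url(data)
  Base64.urlsafe_encode64(data, padding: false)
end

def hmac_sha256(secret, data)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), secret, data)
end

# Constant-time string comparison to avoid leaking signature bytes via timing.
def secure_equal?(left, right)
  return false unless left.bytesize == right.bytesize

  left.bytes.zip(right.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Build header.payload.signature, adding iat and exp to the caller's claims.
def encode_token(claims, secret, now: Time.now.to_i)
  header  = { alg: "HS256", typ: "JWT" }
  payload = claims.merge(iat: now, exp: now + TOKEN_TTL_SECONDS)
  signing_input = [header, payload].map { |part| base64url(JSON.generate(part)) }.join(".")
  "#{signing_input}.#{base64url(hmac_sha256(secret, signing_input))}"
end

# Verify the signature first, then the algorithm and expiry; return the claims.
def decode_token(token, secret, now: Time.now.to_i)
  parts = token.split(".")
  raise ArgumentError, "malformed token" unless parts.length == 3

  header_b64, payload_b64, signature_b64 = parts
  expected = base64url(hmac_sha256(secret, "#{header_b64}.#{payload_b64}"))
  raise ArgumentError, "invalid signature" unless secure_equal?(expected, signature_b64)

  header = JSON.parse(Base64.urlsafe_decode64(header_b64))
  raise ArgumentError, "unexpected algorithm" unless header["alg"] == "HS256"

  payload = JSON.parse(Base64.urlsafe_decode64(payload_b64))
  raise ArgumentError, "token expired" if payload["exp"] <= now

  payload
end

secret = "dev-only-shared-secret" # placeholder; never hard-code a real secret
token  = encode_token(
  { user_id: 42, username: "alice", organization_id: 1, traversal_ids: [[100], [300, 400]] },
  secret
)
claims = decode_token(token, secret)
p claims["traversal_ids"]
# => [[100], [300, 400]]
```

The verifier pins the algorithm to HS256 rather than trusting whatever the header declares, which guards against tokens that claim a weaker or absent algorithm.
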

**Scope**:

- JWT signing with shared secret (HS256 algorithm)
- JWT payload structure containing user context and permissions
- mTLS configuration for certificate validation and encrypted transport
- Shared secret management for secure storage and rotation

**Security Model**:

- **JWT**: Authenticates requests and carries user context (user_id, organization_id, traversal_ids)
- **mTLS**: Verifies service identity at network level and encrypts all traffic between services
- Pattern similar to existing Zoekt/Exact Code Search integration

---

### 5. Data Enrichment

Investigate request-time data injection as an alternative to storing large text fields in the graph.

**Purpose**: Large text fields (issue descriptions, MR descriptions) may be expensive to store and index in graph tables. This work stream explores whether these fields can be fetched at request time instead.

**Scope**:

- Investigate injecting description, title at request time rather than storing in graph
- Research hybrid fetch strategy from ClickHouse data lake or Elasticsearch
- Evaluate trade-offs between storage cost, query latency, and implementation complexity

---

## Monitoring and Observability

The following metrics should be implemented for Rails integration:

| Metric | Description |
|--------|-------------|
| `gkg.rails.traversal_ids_computed` | Histogram of traversal IDs computed per user |
| `gkg.redaction.checks_performed` | Counter of authorization checks performed |
| `gkg.redaction.resources_denied` | Counter of resources filtered by Layer 3 |
| `gkg.redaction.batch_size` | Histogram of authorization batch sizes |
| `gkg.redaction.latency` | Histogram of Rails authorization check latency |
| `gkg.auth.jwt_verification_failed` | Counter of failed JWT verifications |
| `gkg.auth.jwt_expired` | Counter of expired tokens received |

**Alerts**:

- Warning if `gkg.redaction.resources_denied` rate exceeds 20% (may indicate traversal_id filtering is ineffective)
- Warning if JWT verification failure rate exceeds 1%
- Warning if user has more than 100 distinct traversal ID prefixes (permission explosion)
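
The prefix count watched by the last alert depends on how well redundant prefixes are pruned in Work Stream 2. As a rough sketch of that pruning, the code below removes any traversal ID prefix already covered by a shorter one. It uses a sort-and-scan pass rather than an explicit trie, which should produce the same minimal set for this purpose; the function names and the `PREFIX_WARNING_THRESHOLD` constant are hypothetical, with the threshold of 100 taken from the alert above.

```ruby
PREFIX_WARNING_THRESHOLD = 100 # from the alert on distinct traversal ID prefixes

# Drop every prefix that is already covered by a shorter kept prefix.
# After a lexicographic sort, a prefix is immediately followed by all of its
# extensions, so comparing against the last kept prefix is sufficient.
def prune_covered_prefixes(prefixes)
  prefixes.uniq.sort.each_with_object([]) do |candidate, kept|
    last = kept.last
    covered = last && candidate.first(last.length) == last
    kept << candidate unless covered
  end
end

# True when a user's pruned prefix list would trip the warning.
def permission_explosion?(prefixes)
  prune_covered_prefixes(prefixes).length > PREFIX_WARNING_THRESHOLD
end

p prune_covered_prefixes([[100, 200], [300], [100], [300, 400, 500], [200, 1]])
# => [[100], [200, 1], [300]]
```

Comparing array elements rather than string representations keeps `[1000]` from being mistaken for an extension of `[100]`.
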