Knowledge Graph JWT Passing

Problem to Solve

The GitLab Knowledge Graph (GKG) service needs to receive user authorization context from GitLab Rails to perform query-time filtering. Without this, the GKG service cannot determine which namespaces (groups/projects) a user has Reporter+ access to, making it impossible to enforce GitLab's permission model on graph queries.

The current architecture requires Rails to pass authorization data to the GKG webserver in a JWT. This data must be:

  1. Compact enough to fit within JWT payload size constraints
  2. Structured for efficient prefix-based filtering on traversal_ids
  3. Fast to decode and apply on every request

Proposed Solution

Implement a JWT-based authorization passing mechanism that encodes a user's Reporter+ namespace access using GitLab's existing traversal_ids infrastructure and trie optimization patterns.

Technical design

1. Data structure: traversal IDs

GitLab uses traversal_ids, an ordered array of namespace IDs from the root to the current namespace, stored in PostgreSQL as bigint[]:

GitLab (id: 1) > Engineering (id: 2) > Manage (id: 3) > Access (id: 4)
traversal_ids = [1, 2, 3, 4]

This enables efficient hierarchy queries:

  • Ancestors: [1], [1, 2], [1, 2, 3]
  • Descendants: [1, 2, 3, 4, *]
  • Root: traversal_ids[1] (PostgreSQL arrays are 1-indexed)
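The hierarchy queries above can be sketched on plain Ruby arrays. This is only an illustration: ancestor_prefixes and descendant_or_self? are hypothetical helper names, and the arrays stand in for the bigint[] column. Note that Ruby arrays are 0-indexed, so the root is traversal_ids.first here, whereas PostgreSQL arrays start at 1.

```ruby
# Hypothetical helpers; plain Ruby arrays stand in for the bigint[] column.

# Every proper prefix of the array identifies one ancestor namespace.
def ancestor_prefixes(traversal_ids)
  (1...traversal_ids.size).map { |len| traversal_ids.first(len) }
end

# A namespace is `ancestor` itself or lies below it when its array
# starts with the ancestor's array.
def descendant_or_self?(candidate, ancestor)
  candidate.first(ancestor.size) == ancestor
end

access = [1, 2, 3, 4]
p ancestor_prefixes(access)                     # [[1], [1, 2], [1, 2, 3]]
p descendant_or_self?([1, 2, 3, 4, 9], access)  # true
p access.first                                  # 1 (the root namespace)
```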

2. Reporter+ access level

From lib/gitlab/access.rb:

REPORTER       = 20   # Reporter threshold
DEVELOPER      = 30
MAINTAINER     = 40
OWNER          = 50

Reporter+ means: access_level >= 20
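As a rough sketch of how that threshold might be applied, the snippet below filters a list of memberships down to the traversal_ids with Reporter+ access. The Membership struct and reporter_plus method are hypothetical stand-ins, not GitLab models or finders.

```ruby
REPORTER = 20 # mirrors the constant quoted above

# Hypothetical stand-in for a membership row.
Membership = Struct.new(:traversal_ids, :access_level)

# Keep only the namespaces where the user meets the Reporter threshold.
def reporter_plus(memberships)
  memberships.select { |m| m.access_level >= REPORTER }.map(&:traversal_ids)
end

memberships = [
  Membership.new([1, 2], 30),    # Developer
  Membership.new([5, 6, 7], 20), # Reporter, exactly at the threshold
  Membership.new([9], 10)        # below the threshold, dropped
]
p reporter_plus(memberships) # [[1, 2], [5, 6, 7]]
```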

3. Trie optimization

GitLab's Namespaces::Traversal::TrieNode (from lib/namespaces/traversal/trie_node.rb) provides efficient authorization checks:

# Build trie from user's authorized namespace traversal_ids
trie = Namespaces::Traversal::TrieNode.build(authorized_groups.map(&:traversal_ids))

# Check if a document's namespace is covered by user's permissions
trie.covered?([1, 2, 3, 4])  # Returns true if user has access to [1,2] or [1,2,3] etc.

Key operations:

  • covered? - Checks if any ancestor in the trie covers the given traversal_ids
  • prefix_search - Returns all descendant paths matching a prefix
  • On insert, child nodes are cleared when a parent is added (broader permissions subsume narrower ones)
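To make the behaviour in the list above concrete, here is a simplified, self-contained trie. It is a sketch written for this description, not the actual Namespaces::Traversal::TrieNode; the class name SketchTrie and its to_a helper are assumptions, and prefix_search is left out.

```ruby
# Simplified, hypothetical trie; not GitLab's TrieNode implementation.
class SketchTrie
  def initialize
    @children = {}
    @terminal = false # true when a granted traversal_ids path ends here
  end

  def self.build(paths)
    new.tap { |root| paths.each { |path| root.insert(path) } }
  end

  def insert(path)
    node = self
    path.each do |id|
      return if node.terminal? # an ancestor is already granted
      node = node.child!(id)
    end
    node.mark_terminal!
  end

  # True when the path itself or one of its ancestors was granted.
  def covered?(path)
    node = self
    path.each do |id|
      node = node.child(id)
      return false unless node
      return true if node.terminal?
    end
    false
  end

  # Lists the granted paths that remain after subsumption.
  def to_a(prefix = [])
    return [prefix] if @terminal
    @children.flat_map { |id, node| node.to_a(prefix + [id]) }
  end

  protected

  def terminal?
    @terminal
  end

  def child(id)
    @children[id]
  end

  def child!(id)
    @children[id] ||= SketchTrie.new
  end

  def mark_terminal!
    @terminal = true
    @children.clear # a broader grant subsumes narrower ones
  end
end

trie = SketchTrie.build([[1, 2, 3], [1, 2], [5, 6, 7]])
p trie.covered?([1, 2, 3, 4]) # true, covered by [1, 2]
p trie.covered?([1])          # false, access to [1, 2] does not grant [1]
p trie.to_a                   # [[1, 2], [5, 6, 7]]
```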

4. Traversal ID compaction

For users with many authorized namespaces, Gitlab::Utils::TraversalIdCompactor reduces payload size:

# Input: 8 traversal_ids
[
  [1, 21],
  [1, 2, 3],
  [1, 2, 4],
  [1, 2, 5],
  [1, 2, 12, 13],
  [1, 6, 7],
  [1, 6, 8],
  [9, 10, 11]
]

# Compacted to 4 entries:
Gitlab::Utils::TraversalIdCompactor.compact(traversal_ids, 4)
# => [[1, 2], [1, 6], [9, 10, 11], [1, 21]]
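One way to think about the compaction step is as repeated collapsing of sibling entries into their shared parent until the list fits the limit. The sketch below uses a simple greedy rule (collapse the parent with the most children first, preferring deeper parents on ties). This rule is an assumption chosen for illustration and is not taken from Gitlab::Utils::TraversalIdCompactor; compact_sketch and drop_covered are hypothetical names. On the input above it happens to yield the same set of entries, in a different order.

```ruby
# Hypothetical greedy compaction; not GitLab's actual algorithm.

# Remove entries that already sit below another entry in the list.
def drop_covered(ids)
  ids.reject do |t|
    ids.any? { |other| other.size < t.size && t.first(other.size) == other }
  end
end

def compact_sketch(traversal_ids, limit)
  ids = drop_covered(traversal_ids.uniq)
  while ids.size > limit
    # Group entries by parent prefix; root-level entries cannot be collapsed.
    groups = ids.select { |t| t.size > 1 }.group_by { |t| t[0...-1] }
    break if groups.empty?

    # Collapse the parent with the most children, deepest parent on ties.
    parent, _children = groups.max_by { |prefix, kids| [kids.size, prefix.size] }
    ids = drop_covered(ids + [parent])
  end
  ids
end

input = [
  [1, 21], [1, 2, 3], [1, 2, 4], [1, 2, 5],
  [1, 2, 12, 13], [1, 6, 7], [1, 6, 8], [9, 10, 11]
]
p compact_sketch(input, 4).sort # [[1, 2], [1, 6], [1, 21], [9, 10, 11]]
```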

5. JWT payload structure

{
  "sub": "user:12345",
  "iat": 1706200000,
  "exp": 1706200300,
  "iss": "gitlab",
  "aud": "gkg-webserver",
  "admin": false,
  "organization_id": 1,
  "min_access_level": 20,
  "group_traversal_ids": ["1-2-", "5-6-7-"],
  "project_ids": [101, 102, 103]
}

Fields:

  • admin - If true, skip authorization checks entirely
  • organization_id - Tenant isolation (Layer 1)
  • min_access_level - The access level used to compute this token (REPORTER=20)
  • group_traversal_ids - Compacted set of namespaces where the user has Reporter+ access (via group membership), formatted as prefix strings
  • project_ids - Direct project memberships with Reporter+ access that aren't covered by group_traversal_ids
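A minimal sketch of how these claims might be assembled is shown below. The build_claims method and the shape of its projects argument (hashes with :id and the project namespace's :traversal_ids) are assumptions for illustration; only the claim names and the 300-second lifetime come from the payload above.

```ruby
# Hypothetical builder; claim names follow the payload above,
# the method name and argument shapes are assumed.
def build_claims(user_id:, organization_id:, admin:, group_traversal_ids:,
                 projects:, now: Time.now.to_i)
  prefixes = group_traversal_ids.map { |ids| "#{ids.join('-')}-" }

  # Keep only projects whose namespace path is not matched by a group prefix.
  uncovered = projects.reject do |project|
    path = "#{project[:traversal_ids].join('-')}-"
    prefixes.any? { |prefix| path.start_with?(prefix) }
  end

  {
    "sub" => "user:#{user_id}",
    "iat" => now,
    "exp" => now + 300, # 5-minute lifetime, as in the example payload
    "iss" => "gitlab",
    "aud" => "gkg-webserver",
    "admin" => admin,
    "organization_id" => organization_id,
    "min_access_level" => 20,
    "group_traversal_ids" => prefixes,
    "project_ids" => uncovered.map { |project| project[:id] }
  }
end

claims = build_claims(
  user_id: 12345, organization_id: 1, admin: false,
  group_traversal_ids: [[1, 2], [5, 6, 7]],
  projects: [
    { id: 101, traversal_ids: [8, 9] },   # not under any group prefix, kept
    { id: 200, traversal_ids: [1, 2, 3] } # already covered by "1-2-", dropped
  ],
  now: 1706200000
)
p claims["group_traversal_ids"] # ["1-2-", "5-6-7-"]
p claims["project_ids"]         # [101]
```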

6. Format for pre-filtering

This follows the Elasticsearch/Zoekt pattern in ee/lib/search/elastic/concerns/authorization_utils.rb:

def format_traversal_ids(traversal_ids)
  traversal_ids.map { |id_array| "#{id_array.join('-')}-" }
end
# [1, 2, 3] becomes "1-2-3-"

The trailing - terminates every ID, so a prefix can only match at a whole-ID boundary. Without it, the prefix "1-2" would also match the unrelated path "1-23-".
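The effect of the delimiter can be checked directly with String#start_with?. The variable names below are illustrative only.

```ruby
def format_traversal_ids(traversal_ids)
  traversal_ids.map { |id_array| "#{id_array.join('-')}-" }
end

granted_without_dash = "1-2"
granted_with_dash = format_traversal_ids([[1, 2]]).first # "1-2-"
unrelated_doc = format_traversal_ids([[1, 23]]).first    # "1-23-"
child_doc = format_traversal_ids([[1, 2, 3]]).first      # "1-2-3-"

p unrelated_doc.start_with?(granted_without_dash) # true, a false positive
p unrelated_doc.start_with?(granted_with_dash)    # false
p child_doc.start_with?(granted_with_dash)        # true
```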

7. GKG webserver pre-filtering

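import "strings"

// isAuthorized reports whether the document's traversal ID path (for example
// "1-2-3-") starts with any of the authorized prefixes carried in the JWT.
func isAuthorized(docTraversalIDs string, jwtTraversalIDs []string) bool {
    for _, authPrefix := range jwtTraversalIDs {
        if strings.HasPrefix(docTraversalIDs, authPrefix) {
            return true
        }
    }
    return false
}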

Implementation components

Rails side (this issue)

  1. KnowledgeGraph::JwtAuth module (ee/lib/knowledge_graph/jwt_auth.rb)
    • JWT token generation with HS256 signing
    • 5-minute token expiry (matching Zoekt pattern)
    • Shared secret management
  2. KnowledgeGraph::AuthorizationContext (ee/lib/knowledge_graph/authorization_context.rb)
    • Compute Reporter+ groups for user using Search::GroupsFinder
    • Build trie from authorized groups' traversal_ids
    • Compact traversal_ids to fit payload constraints
    • Format as prefix strings
  3. Internal API endpoint (ee/lib/api/internal/knowledge_graph.rb)
    • POST /api/internal/knowledge_graph/authorize - Generate JWT for user
    • Returns signed JWT with authorization context

GKG webserver side

  1. JWT verification middleware
  2. Traversal ID prefix filter injection into ClickHouse queries
  3. Admin bypass logic
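The order of checks on the webserver side might look like the sketch below. It is written in Ruby only to stay consistent with the other examples here; the Doc struct, its field names, and the visible? method are hypothetical, and an actual implementation would push the prefix filter into the ClickHouse query rather than test rows one at a time.

```ruby
# Hypothetical document shape: organization_id, traversal_path ("1-2-3-"), project_id.
Doc = Struct.new(:organization_id, :traversal_path, :project_id)

def visible?(doc, claims)
  # Admin bypass: skip authorization checks entirely.
  return true if claims["admin"]
  # Tenant isolation: the document must belong to the token's organization.
  return false unless doc.organization_id == claims["organization_id"]
  # Group access: any authorized prefix matches the document's path.
  return true if claims["group_traversal_ids"].any? { |prefix| doc.traversal_path.start_with?(prefix) }

  # Direct project membership not covered by a group prefix.
  claims["project_ids"].include?(doc.project_id)
end

claims = {
  "admin" => false, "organization_id" => 1,
  "group_traversal_ids" => ["1-2-"], "project_ids" => [101]
}
p visible?(Doc.new(1, "1-2-3-", 500), claims) # true, under prefix "1-2-"
p visible?(Doc.new(1, "1-23-", 500), claims)  # false, no prefix or project match
p visible?(Doc.new(1, "8-9-", 101), claims)   # true, direct project membership
```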

Key files to reference

File                                                   Purpose
lib/namespaces/traversal/trie_node.rb                  Core trie implementation
lib/gitlab/utils/traversal_id_compactor.rb             Payload size reduction
ee/lib/search/elastic/concerns/authorization_utils.rb  Elasticsearch auth patterns
ee/lib/search/zoekt/jwt_auth.rb                        Zoekt JWT pattern to follow
ee/lib/search/zoekt/access_branch_builder.rb           Authorization branch construction
ee/app/finders/search/groups_finder.rb                 Reporter+ group filtering
lib/gitlab/access.rb                                   Access level constants
Edited by Michael Angelo Rivera