Draft: Add GitLab Knowledge Graph (GKG) StatefulSet support

Summary

This MR implements GitLab Knowledge Graph (GKG) support in the gitlab-zoekt Helm chart. GKG provides AI-powered code understanding and semantic search capabilities, deployed as a multi-container StatefulSet alongside the existing Zoekt deployment.

Implementation Details

Architecture

The GKG StatefulSet consists of 4 containers working together:

  1. gkg-proxy (renamed from zoekt-orchestrator)

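    • Coordinates GKG operations as a proxy in front of gkg-indexer and gkg-webserver
    • Communicates with GitLab Rails for indexing tasks
    • Uses the standard gitlab-zoekt image, run in indexer mode
    • Environment: GITLAB_ZOEKT_GKG_INDEX_URL, GITLAB_ZOEKT_GKG_QUERY_URL, GITLAB_ZOEKT_MODE=indexer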
  2. gkg-internal-gateway

    • Nginx-based gateway for internal routing
    • Supports TLS and authentication
    • Inherits global gateway configuration
  3. gkg-indexer

    • Handles GKG indexing operations
    • Uses gitlab-gkg CNG image
    • Listens on port 3333
    • Environment: GITLAB_GKG_MODE=indexer
  4. gkg-webserver

    • Serves GKG query requests
    • Uses gitlab-gkg CNG image
    • Listens on port 3334
    • Supports direct JWT authentication from Rails
    • Environment: GITLAB_GKG_MODE=webserver
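
The container layout above can be checked against a running pod. The sketch below is one possible way to do that, assuming the container names in the pod spec match the four names listed above and that the first pod is called gitlab-zoekt-gkg-0 (the name used in the Testing section). The function name check_gkg_containers is hypothetical.

```shell
# Compare the container names of a GKG pod with the four expected ones.
# Assumption: the pod spec uses exactly the container names from the list above.
check_gkg_containers() {
  pod="${1:-gitlab-zoekt-gkg-0}"
  expected="gkg-proxy gkg-internal-gateway gkg-indexer gkg-webserver"
  # jsonpath prints the container names separated by spaces.
  actual=$(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}') || return 1
  for name in $expected; do
    case " $actual " in
      *" $name "*) echo "found: $name" ;;
      *) echo "missing: $name" >&2; return 1 ;;
    esac
  done
}
```

Calling check_gkg_containers with no argument inspects gitlab-zoekt-gkg-0 and returns a non-zero status as soon as one expected container is absent.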

Key Features

  • Persistent Storage: Uses volumeClaimTemplates for GKG data (default 1Gi)
  • Service Discovery: Headless service enables StatefulSet pod communication
  • Security: JWT authentication, TLS support, configurable security contexts
  • Configuration: Extensive customization options through values.yaml
  • Health Checks: Liveness/readiness probes for all containers
  • Resource Management: Configurable resource limits/requests per container
  • Inheritance: Falls back to global chart settings for common configurations
  • Certificate Support: Integrates with GitLab certificate management
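
As an illustration of the service discovery point, a headless service gives each StatefulSet pod a stable DNS name of the form pod.service.namespace.svc.cluster.local. The sketch below prints those names. The service name and namespace passed in the example call are assumptions (this description does not state them), and cluster.local is the Kubernetes default cluster domain, which a cluster may override.

```shell
# Print the stable DNS name of each pod in a StatefulSet behind a headless service.
# Arguments: StatefulSet name, headless service name, namespace, replica count.
gkg_pod_dns_names() {
  sts_name="$1"
  svc_name="$2"
  namespace="$3"
  replicas="$4"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # StatefulSet pods are named <statefulset>-<ordinal>, starting at 0.
    echo "${sts_name}-${i}.${svc_name}.${namespace}.svc.cluster.local"
    i=$((i + 1))
  done
}

# Illustrative values only: the service name and namespace are assumed.
gkg_pod_dns_names gitlab-zoekt-gkg gitlab-zoekt-gkg default 2
```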

Files Added/Modified

New Templates:

  • templates/statefulset-gkg.yaml - Multi-container StatefulSet definition
  • templates/svc-gkg.yaml - Headless service for StatefulSet

Modified Templates:

  • templates/_helpers.tpl - Added GKG helper templates (fullname, labels, selectors)

Modified Values:

  • values.yaml - Added comprehensive gkg.* configuration section

Configuration

Basic Usage

Enable GKG in your values.yaml:

gkg:
  enabled: true
  replicas: 1
  storage: 1Gi
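
The basic values above can also live in a separate override file instead of the chart's values.yaml. The following is a minimal sketch of that workflow. The file name gkg-values.yaml and the function name install_gkg are hypothetical, and a real install may also need the indexer.internalApi.* settings shown in the Testing section.

```shell
# Write the basic GKG values to a separate override file.
cat > gkg-values.yaml <<'EOF'
gkg:
  enabled: true
  replicas: 1
  storage: 1Gi
EOF

# Install or upgrade the release from the chart directory with that override file.
# Defined as a function so nothing touches the cluster until it is called.
install_gkg() {
  helm upgrade --install gitlab-zoekt . -f gkg-values.yaml
}
```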

Advanced Configuration

gkg:
  enabled: true
  replicas: 2
  storage: 10Gi
  storageClassName: "fast-ssd"
  
  # Proxy configuration (manages indexing)
  proxy:
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
    environment:
      GITLAB_ZOEKT_GKG_INDEX_URL: "http://localhost:3333"
      GITLAB_ZOEKT_GKG_QUERY_URL: "http://localhost:3334"
  
  # Indexer configuration
  indexer:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: add-gitlab-gkg-image
    resources:
      requests:
        cpu: 1
        memory: 2Gi
  
  # Webserver configuration
  webserver:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: add-gitlab-gkg-image
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
  
  # Gateway configuration (inherits from global)
  gateway:
    tls:
      certificate:
        enabled: true
        secretName: gkg-tls-cert

Testing

Tested on Minikube with the following steps:

# Install with GKG enabled
helm install gitlab-zoekt . \
  --set gkg.enabled=true \
  --set indexer.internalApi.secretName=gitlab-zoekt-internal-api \
  --set indexer.internalApi.secretKey=.gitlab_shell_secret \
  --set indexer.internalApi.gitlabUrl=http://gitlab.example.com

# Verify deployment
kubectl get statefulset gitlab-zoekt-gkg
kubectl get pods -l app.kubernetes.io/component=gkg

# Check all containers are running
kubectl get pod gitlab-zoekt-gkg-0
# Output: 4/4 Running

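# Test health endpoints (indexer on 3333, webserver on 3334)
# Containers in a pod share one network namespace, so both ports are reachable from gkg-proxy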
kubectl exec gitlab-zoekt-gkg-0 -c gkg-proxy -- curl -s http://localhost:3333/health
# Output: {"status":"OK"}

kubectl exec gitlab-zoekt-gkg-0 -c gkg-proxy -- curl -s http://localhost:3334/health
# Output: {"status":"OK"}
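
The two health checks above can be wrapped in a small helper that fails on the first unexpected response. This is a sketch only: the function name check_gkg_health is hypothetical, and it relies on the same assumptions as the manual commands (curl is available in the gkg-proxy container and a healthy endpoint returns {"status":"OK"}).

```shell
# Query the /health endpoint of gkg-indexer (3333) and gkg-webserver (3334)
# from inside the gkg-proxy container of the given pod.
check_gkg_health() {
  pod="$1"
  for port in 3333 3334; do
    body=$(kubectl exec "$pod" -c gkg-proxy -- \
      curl -s "http://localhost:${port}/health") || return 1
    case "$body" in
      *'"status":"OK"'*) echo "port ${port}: OK" ;;
      *) echo "port ${port}: unexpected response: ${body}" >&2; return 1 ;;
    esac
  done
}
```

A call such as check_gkg_health gitlab-zoekt-gkg-0 returns a non-zero status if either endpoint is unreachable or reports anything other than OK.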

Verification Results

  • All 4 containers start successfully
  • Health checks pass for indexer and webserver
  • Persistent volume claims created correctly
  • Environment variables properly configured
  • JWT secret mounted correctly
  • Processes running with correct arguments:
    /bin/gkg-server-deployed -m indexer --bind 0.0.0.0:3333 --secret-path /.gitlab_shell_secret --data-dir /data/gkg
    /bin/gkg-server-deployed -m webserver --bind 0.0.0.0:3334 --secret-path /.gitlab_shell_secret --data-dir /data/gkg

Environment Variables

Validated Against the GDK Implementation

The environment variables used in this implementation were validated against the GDK implementation in gitlab-org/gitlab-development-kit!5334 (merged):

gkg-proxy:

  • GITLAB_ZOEKT_GKG_INDEX_URL - Points to gkg-indexer (http://localhost:3333)
  • GITLAB_ZOEKT_GKG_QUERY_URL - Points to gkg-webserver (http://localhost:3334)
  • GITLAB_ZOEKT_MODE=indexer - Runs in indexer mode with GKG support

gkg-indexer & gkg-webserver:

  • GITLAB_GKG_MODE - Operating mode (indexer/webserver)
  • GITLAB_GKG_SECRET_PATH - JWT secret file path (/.gitlab_shell_secret)
  • GITLAB_GKG_DATA_DIR - Data directory (/data/gkg)
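
To confirm that a container actually received the variables listed above, its environment can be printed and filtered. The sketch below assumes the container image ships an env binary, which this description does not confirm. The function name show_gkg_env is hypothetical.

```shell
# Print the GKG- and Zoekt-related environment variables of one container, sorted.
# Assumption: the image provides an `env` executable.
show_gkg_env() {
  pod="$1"
  container="$2"
  kubectl exec "$pod" -c "$container" -- env \
    | grep -E '^GITLAB_(GKG|ZOEKT)_' \
    | sort
}
```

For instance, show_gkg_env gitlab-zoekt-gkg-0 gkg-indexer should list the GITLAB_GKG_* variables if they are set as described.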

Architecture Decisions

Direct Rails → gkg-webserver Communication

GitLab Rails can call gkg-webserver directly and authenticates with a JWT. This follows the updated architecture, which relies on the JWT support built into the gitlab-gkg CNG image.

Container Naming

The zoekt-orchestrator container is renamed to gkg-proxy to better reflect its role as a proxy and coordinator for GKG operations.

Image Strategy

  • gkg-proxy: Reuses the standard gitlab-zoekt image (indexer mode)
  • gkg-indexer and gkg-webserver: Use the dedicated gitlab-gkg CNG image, which runs in either mode depending on GITLAB_GKG_MODE

Migration Path

GKG is a new feature, so no migration is required. It is disabled by default (gkg.enabled: false) and can be enabled when ready.
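
On an existing release, turning the feature on or off comes down to one upgrade with gkg.enabled overridden. The following is a minimal sketch, assuming it runs from the chart directory and that the release's current values are kept in a file. The function name set_gkg_enabled is hypothetical.

```shell
# Upgrade an existing release with gkg.enabled set to true or false.
# Arguments: release name, values file for the release, true|false.
set_gkg_enabled() {
  release="$1"
  values_file="$2"
  enabled="$3"
  case "$enabled" in
    true|false) ;;
    *) echo "usage: set_gkg_enabled <release> <values-file> true|false" >&2; return 2 ;;
  esac
  helm upgrade "$release" . -f "$values_file" --set "gkg.enabled=${enabled}"
}
```

The sketch passes the values file explicitly rather than using --reuse-values, because reused values from an older release can miss the defaults of chart keys that were added later, such as the new gkg.* section.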

Checklist

  • StatefulSet template supports multi-container GKG deployment
  • Uses actual gitlab-gkg CNG image
  • Renamed zoekt-orchestrator to gkg-proxy
  • Configured validated environment variables
  • Health checks use /health endpoints on correct ports
  • JWT authentication configured for webserver
  • Persistent storage configured for GKG data
  • Headless service enables pod-to-pod communication
  • Configuration options allow customization of all components
  • Security features (TLS, JWT) are configurable
  • Tested with actual GKG container images
  • All containers running and healthy in test deployment
