Add support for GKG in the Helm Chart
Summary
This issue tracks the implementation of a StatefulSet for GitLab Knowledge Graph (GKG) in the gitlab-zoekt Helm chart. It builds on the POC work done in gitlab-org/cloud-native/charts/gitlab-zoekt!126 and on the new CNG image developed in gitlab-org/build/CNG!2652 (closed).
Background
The GitLab Knowledge Graph requires a multi-container deployment with persistent storage for indexing and serving knowledge graph data. A StatefulSet is needed to provide stable network identities and persistent storage for the GKG components.
Implementation Details from POC
The POC merge request demonstrates a complete StatefulSet implementation with the following components:
StatefulSet Configuration (templates/statefulset-gkg.yaml)
- Multi-container pod with 4 containers:
  - gkg-proxy (renamed from zoekt-orchestrator): Manages indexing operations and communicates with GitLab Rails
  - gkg-internal-gateway: Nginx gateway for internal routing and authentication
  - gkg-indexer: Handles knowledge graph indexing (now available as real CNG image)
  - gkg-webserver: Serves knowledge graph queries (now available as real CNG image)
Key Features Implemented
- Persistent Storage: Uses volumeClaimTemplates for GKG data persistence
- Service Discovery: Headless service for StatefulSet pod communication
- Security: Supports TLS certificates and JWT authentication
- Configuration: Extensive configuration options through values.yaml
- Health Checks: Liveness and readiness probes for all containers
- Resource Management: Configurable resource limits and requests
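The persistent storage and service discovery features listed above can be sketched as follows. This is a minimal illustration and not the POC template itself: the resource name `gkg`, the `app: gkg` label, the `gkg-data` volume name, and the 1Gi request are assumptions made for the example, while the image, the `/data/gkg` path, and port 3334 come from the CNG image details in this issue.

```yaml
# Headless Service: clusterIP None gives each StatefulSet pod a stable DNS name.
apiVersion: v1
kind: Service
metadata:
  name: gkg                 # hypothetical name
spec:
  clusterIP: None
  selector:
    app: gkg                # hypothetical label
  ports:
    - name: gkg-webserver
      port: 3334
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gkg                 # hypothetical name
spec:
  serviceName: gkg          # must reference the headless Service above
  replicas: 1
  selector:
    matchLabels:
      app: gkg
  template:
    metadata:
      labels:
        app: gkg
    spec:
      containers:
        - name: gkg-webserver
          image: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg:v0.17.0
          volumeMounts:
            - name: gkg-data
              mountPath: /data/gkg
  # One PersistentVolumeClaim is created per pod and survives pod restarts.
  volumeClaimTemplates:
    - metadata:
        name: gkg-data      # hypothetical volume name
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

Only one container is shown to keep the sketch short; the real pod would carry all four containers described above.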
Helm Templates Added
- templates/statefulset-gkg.yaml: Main StatefulSet definition (262 lines)
- templates/svc-gkg.yaml: Headless service for StatefulSet
- templates/_helpers.tpl: Helper templates for GKG labels and naming
New CNG Image Details
The CNG MR !2652 introduces the actual gitlab-gkg container image that replaces the placeholder images from the POC:
Image Architecture
- Base: Multi-stage build using gitlab-rust → gitlab-base
- Language: Rust 1.89.0 (supports 2024 edition required by knowledge-graph)
- Binary: /bin/gkg-server-deployed (built from knowledge-graph repository)
- Dual-mode operation: Single image supports both indexer and webserver modes
Container Configuration
# Replace placeholder images with actual CNG image
gkg:
  indexer:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: v0.17.0 # Current version from CNG MR
      pullPolicy: IfNotPresent
    listen:
      port: 3333 # Default indexer port
  webserver:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: v0.17.0 # Current version from CNG MR
      pullPolicy: IfNotPresent
    listen:
      port: 3334 # Default webserver port
Environment Variables
| Variable | Description | Default | Required |
|---|---|---|---|
| GITLAB_GKG_MODE | Operating mode (indexer or webserver) | - | |
| GITLAB_GKG_SECRET_PATH | JWT secret file path | /.gitlab_shell_secret | |
| GITLAB_GKG_DATA_DIR | Data directory | /data/gkg | |
| PORT | Override default port | 3333/3334 by mode | |
Health Checks
- Indexer: GET http://127.0.0.1:3333/health
- Webserver: GET http://127.0.0.1:3334/health
- Built-in: Container includes /scripts/healthcheck script
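As an alternative to HTTP probes, the built-in script could be wired into an exec probe. The fragment below is a sketch under two assumptions that this issue does not confirm: that /scripts/healthcheck takes no arguments, and that it exits non-zero when the service is unhealthy. The timing values are illustrative.

```yaml
# Container-level fragment; assumes /scripts/healthcheck needs no arguments
# and signals failure through a non-zero exit code.
livenessProbe:
  exec:
    command: ["/scripts/healthcheck"]
  initialDelaySeconds: 10   # illustrative value
  periodSeconds: 10         # illustrative value
```

An httpGet probe against /health avoids spawning a process on every check, so the exec form is mainly useful if the script checks more than the HTTP endpoint.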
Updated StatefulSet Container Definitions
Based on the CNG image, the StatefulSet containers should be updated:
# gkg-indexer container (updated from POC)
- name: gkg-indexer
  image: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg:v0.17.0
  env:
    - name: GITLAB_GKG_MODE
      value: "indexer"
    - name: GITLAB_GKG_SECRET_PATH
      value: "/.gitlab_shell_secret"
    - name: GITLAB_GKG_DATA_DIR
      value: "/data/gkg"
  ports:
    - containerPort: 3333
      name: gkg-indexer
  livenessProbe:
    httpGet:
      path: /health
      port: 3333
  readinessProbe:
    httpGet:
      path: /health
      port: 3333
# gkg-webserver container (updated from POC)
- name: gkg-webserver
  image: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg:v0.17.0
  env:
    - name: GITLAB_GKG_MODE
      value: "webserver"
    - name: GITLAB_GKG_SECRET_PATH
      value: "/.gitlab_shell_secret"
    - name: GITLAB_GKG_DATA_DIR
      value: "/data/gkg"
  ports:
    - containerPort: 3334
      name: gkg-webserver
  livenessProbe:
    httpGet:
      path: /health
      port: 3334
  readinessProbe:
    httpGet:
      path: /health
      port: 3334
Updated Architecture Decisions
Authentication Architecture
- ✅ Current approach: Direct Rails → gkg-webserver communication with JWT authentication
- ❌ Previous approach: Rails → gkg-proxy → gkg-webserver proxy (no longer used)
- Implementation: JWT authentication handled natively by the CNG image via GITLAB_GKG_SECRET_PATH
- Benefits: Simpler architecture, better performance, leverages built-in JWT support
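For the image to read the JWT secret, the file has to exist at the path named by GITLAB_GKG_SECRET_PATH. One way to do that, sketched here, is to mount a Kubernetes Secret as a single file. The Secret name `gitlab-shell-secret`, its key `secret`, and the volume name `gkg-jwt-secret` are hypothetical; the chart may source the secret differently.

```yaml
# Pod spec fragment. Secret name, key, and volume name are hypothetical.
containers:
  - name: gkg-webserver
    env:
      - name: GITLAB_GKG_SECRET_PATH
        value: "/.gitlab_shell_secret"
    volumeMounts:
      - name: gkg-jwt-secret
        mountPath: /.gitlab_shell_secret
        subPath: .gitlab_shell_secret   # mount one file, not a directory
        readOnly: true
volumes:
  - name: gkg-jwt-secret
    secret:
      secretName: gitlab-shell-secret   # hypothetical Secret name
      items:
        - key: secret                   # hypothetical key
          path: .gitlab_shell_secret
```

Note that subPath mounts do not pick up Secret updates automatically, so rotating the secret would require a pod restart with this layout.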
Component Responsibilities
- gkg-proxy (renamed from zoekt-orchestrator): Manages indexing operations and communicates with GitLab Rails for indexing tasks
- gkg-webserver: Handles query requests directly from GitLab Rails via JWT authentication
- gkg-indexer: Processes indexing operations coordinated by gkg-proxy
- gkg-internal-gateway: Nginx gateway for internal routing and authentication
Environment Variables Validation (from GDK Implementation)
Based on the GDK MR !5334, the actual environment variables being used are:
✅ Confirmed Environment Variables for gkg-proxy:
# From GDK Procfile template:
GITLAB_ZOEKT_GKG_INDEX_URL=<socket_path_to_indexer>
GITLAB_ZOEKT_GKG_QUERY_URL=<socket_path_to_webserver>
For Kubernetes/Helm deployment, these should be:
# gkg-proxy environment variables
environment:
  GITLAB_ZOEKT_GKG_INDEX_URL: "http://localhost:3333" # Points to gkg-indexer
  GITLAB_ZOEKT_GKG_QUERY_URL: "http://localhost:3334" # Points to gkg-webserver
Note: The GDK implementation confirms that both GITLAB_ZOEKT_GKG_INDEX_URL and GITLAB_ZOEKT_GKG_QUERY_URL are still needed for the proxy, even though queries will eventually go directly to the webserver. This suggests a transitional architecture where the proxy still handles both flows initially.
Container Communication
- Indexing flow: Rails → gkg-proxy → gkg-indexer
- Query flow: Rails → gkg-webserver (direct with JWT) OR Rails → gkg-proxy → gkg-webserver (transitional)
- gkg-proxy runs in "indexer" mode with GITLAB_ZOEKT_MODE=indexer
- GKG-specific behavior controlled by GITLAB_ZOEKT_GKG_INDEX_URL and GITLAB_ZOEKT_GKG_QUERY_URL environment variables
- Heartbeat mechanism will indicate GKG nodes vs normal Zoekt nodes
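Put together, the gkg-proxy container definition might carry the following environment. This is a sketch assembled from the variables named in this issue, not the rendered POC template; the image is left out because the proposed values reuse the Zoekt indexer image by default.

```yaml
# gkg-proxy container fragment (image omitted; it reuses the Zoekt indexer image)
- name: gkg-proxy
  env:
    - name: GITLAB_ZOEKT_MODE
      value: "indexer"
    - name: GITLAB_ZOEKT_GKG_INDEX_URL
      value: "http://localhost:3333"   # gkg-indexer in the same pod
    - name: GITLAB_ZOEKT_GKG_QUERY_URL
      value: "http://localhost:3334"   # gkg-webserver in the same pod
```

The localhost URLs work because all containers in a pod share one network namespace.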
Updated Configuration Options (values.yaml)
gkg:
  enabled: false # Feature flag
  replicas: 1
  storage: 1Gi
  storageClassName: ""
  # Component-specific configurations
  proxy: # Renamed from orchestrator
    # Reuses indexer.image by default
    resources: {}
    environment:
      GITLAB_ZOEKT_GKG_INDEX_URL: "http://localhost:3333"
      GITLAB_ZOEKT_GKG_QUERY_URL: "http://localhost:3334" # Still needed for transitional architecture
    googleCloudProfiler:
      enabled: false
  indexer:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: v0.17.0
      pullPolicy: IfNotPresent
    listen:
      port: 3333
    healthPath: "/health"
    resources: {}
    securityContext: {}
    environment:
      GITLAB_GKG_MODE: "indexer"
      GITLAB_GKG_SECRET_PATH: "/.gitlab_shell_secret"
      GITLAB_GKG_DATA_DIR: "/data/gkg"
  webserver:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: v0.17.0
      pullPolicy: IfNotPresent
    listen:
      port: 3334
    healthPath: "/health"
    resources: {}
    securityContext: {}
    environment:
      GITLAB_GKG_MODE: "webserver"
      GITLAB_GKG_SECRET_PATH: "/.gitlab_shell_secret"
      GITLAB_GKG_DATA_DIR: "/data/gkg"
  # JWT authentication configuration
  jwt:
    enabled: true
    secretPath: "/.gitlab_shell_secret"
  gateway:
    # Inherits from global gateway config
    tls:
      certificate:
        enabled: # defaults to global setting
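As a usage illustration, an operator could enable GKG with a small override file. The keys follow the values proposed in this issue and may change before release; the file name, storage size, storage class, and resource figures are illustrative assumptions.

```yaml
# gkg-values.yaml (hypothetical file name)
gkg:
  enabled: true                 # turn the feature flag on
  replicas: 1
  storage: 10Gi                 # illustrative size
  storageClassName: "standard"  # illustrative; depends on the cluster
  indexer:
    resources:
      requests:
        cpu: 500m               # illustrative
        memory: 1Gi             # illustrative
  webserver:
    resources:
      requests:
        cpu: 250m               # illustrative
        memory: 512Mi           # illustrative
```

Such a file would be passed to Helm with the standard `-f gkg-values.yaml` option on install or upgrade.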
Next Steps
- ✅ Container images available: CNG image gitlab-gkg:v0.17.0 ready for use
- ✅ Environment variables confirmed: GDK implementation validates the required env vars
- Update StatefulSet template: Replace placeholder images with actual CNG image
- Update container names: Rename zoekt-orchestrator to gkg-proxy in templates
- Update environment variables: Configure proper GKG-specific environment variables
- Update health checks: Use /health endpoints on ports 3333/3334
- Implement JWT authentication: Configure direct Rails → gkg-webserver communication
- Testing: Validate the StatefulSet deployment with real GKG containers and JWT auth
- Documentation: Update helm chart documentation with GKG configuration options
- Production readiness: Add monitoring, logging, and operational considerations
Related Issues and MRs
- POC MR: gitlab-org/cloud-native/charts/gitlab-zoekt!126
- CNG Image MR: gitlab-org/build/CNG!2652 (closed)
- GDK Implementation MR: gitlab-org/gitlab-development-kit!5334 (merged)
- BasicAuth Removal MR: gitlab-org/cloud-native/charts/gitlab-zoekt!127
- Related to epic: GitLab Knowledge Graph Epic
- Mentioned in: gitlab-org/gitlab#568348 (closed)
- Closes: gitlab-org/rust/knowledge-graph#184 (closed)
Acceptance Criteria
- StatefulSet template supports multi-container GKG deployment
- Updated: Use actual gitlab-gkg CNG image instead of placeholder images
- Updated: Rename zoekt-orchestrator to gkg-proxy in all templates
- ✅ Validated: Configure proper environment variables for dual-mode operation (GITLAB_ZOEKT_GKG_INDEX_URL, GITLAB_ZOEKT_GKG_QUERY_URL)
- Updated: Health checks use /health endpoints on correct ports (3333/3334)
- Updated: Implement JWT authentication for direct Rails → gkg-webserver communication
- Persistent storage configured for GKG data (/data/gkg volume)
- Headless service enables pod-to-pod communication
- Configuration options allow customization of all GKG components
- Security features (TLS, JWT) are configurable
- Documentation updated with GKG deployment instructions
- Integration tested with actual GKG container images and JWT authentication