Draft: Add GitLab Knowledge Graph (GKG) StatefulSet support
Summary
This MR implements GitLab Knowledge Graph (GKG) support in the gitlab-zoekt Helm chart. GKG provides AI-powered code understanding and semantic search capabilities, deployed as a multi-container StatefulSet alongside the existing Zoekt deployment.
Implementation Details
Architecture
Each GKG StatefulSet pod runs four containers that work together:
- gkg-proxy (renamed from zoekt-orchestrator)
  - Manages indexing operations
  - Communicates with GitLab Rails for indexing tasks
  - Uses the standard `gitlab-zoekt` image
  - Environment: `GITLAB_ZOEKT_GKG_INDEX_URL`, `GITLAB_ZOEKT_GKG_QUERY_URL`
- gkg-internal-gateway
  - Nginx-based gateway for internal routing
  - Supports TLS and authentication
  - Inherits global gateway configuration
- gkg-indexer
  - Handles GKG indexing operations
  - Uses the `gitlab-gkg` CNG image
  - Listens on port 3333
  - Environment: `GITLAB_GKG_MODE=indexer`
- gkg-webserver
  - Serves GKG query requests
  - Uses the `gitlab-gkg` CNG image
  - Listens on port 3334
  - Supports direct JWT authentication from Rails
  - Environment: `GITLAB_GKG_MODE=webserver`
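The indexer and webserver entries above could translate into container specs roughly like the following. This is a sketch of an excerpt of the pod template (`spec.template.spec`), assuming the `gitlab-gkg` image reads its mode, secret path and data directory from the `GITLAB_GKG_*` variables listed under Environment Variables. The volume names (`gkg-data`, `gitlab-shell-secret`), the port names and the `subPath` wiring of the secret file are illustrative assumptions, not the chart's actual output; gkg-proxy and gkg-internal-gateway are left out for brevity.

```yaml
# Sketch only: excerpt of spec.template.spec for the gitlab-zoekt-gkg StatefulSet.
# Volume names, port names and the secret wiring are hypothetical.
containers:
  - name: gkg-indexer
    image: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg:add-gitlab-gkg-image
    env:
      - name: GITLAB_GKG_MODE
        value: indexer
      - name: GITLAB_GKG_SECRET_PATH
        value: /.gitlab_shell_secret
      - name: GITLAB_GKG_DATA_DIR
        value: /data/gkg
    ports:
      - name: gkg-indexer
        containerPort: 3333
    # Both probes hit the /health endpoint exercised in the Testing section.
    livenessProbe:
      httpGet:
        path: /health
        port: 3333
    readinessProbe:
      httpGet:
        path: /health
        port: 3333
    volumeMounts:
      - name: gkg-data                 # hypothetical claim from volumeClaimTemplates
        mountPath: /data/gkg
      - name: gitlab-shell-secret      # hypothetical secret volume
        mountPath: /.gitlab_shell_secret
        subPath: .gitlab_shell_secret  # mount the single secret key as a file
        readOnly: true
  - name: gkg-webserver
    image: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg:add-gitlab-gkg-image
    env:
      - name: GITLAB_GKG_MODE
        value: webserver
      - name: GITLAB_GKG_SECRET_PATH
        value: /.gitlab_shell_secret
      - name: GITLAB_GKG_DATA_DIR
        value: /data/gkg
    ports:
      - name: gkg-webserver
        containerPort: 3334
    livenessProbe:
      httpGet:
        path: /health
        port: 3334
    readinessProbe:
      httpGet:
        path: /health
        port: 3334
    volumeMounts:
      - name: gkg-data
        mountPath: /data/gkg
      - name: gitlab-shell-secret
        mountPath: /.gitlab_shell_secret
        subPath: .gitlab_shell_secret
        readOnly: true
volumes:
  - name: gitlab-shell-secret
    secret:
      secretName: gitlab-zoekt-internal-api  # secret name used in the Testing install command
```

Sharing one data mount between the two containers mirrors the fact that both processes were observed running with `--data-dir /data/gkg`.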
Key Features
- ✅ Persistent Storage: Uses `volumeClaimTemplates` for GKG data (default 1Gi)
- ✅ Service Discovery: Headless service enables StatefulSet pod communication
- ✅ Security: JWT authentication, TLS support, configurable security contexts
- ✅ Configuration: Extensive customization options through `values.yaml`
- ✅ Health Checks: Liveness/readiness probes for all containers
- ✅ Resource Management: Configurable resource limits/requests per container
- ✅ Inheritance: Falls back to global chart settings for common configurations
- ✅ Certificate Support: Integrates with GitLab certificate management
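Persistent storage relies on the StatefulSet's `volumeClaimTemplates`. A minimal sketch of how `gkg.storage` and `gkg.storageClassName` could feed into it in `templates/statefulset-gkg.yaml`; the claim name `gkg-data` is hypothetical, and the `default "1Gi"` fallback only mirrors the documented default.

```yaml
# Sketch only: persistence excerpt of templates/statefulset-gkg.yaml.
# The claim name "gkg-data" is hypothetical.
spec:
  # other StatefulSet fields omitted from this excerpt
  volumeClaimTemplates:
    - metadata:
        name: gkg-data
      spec:
        accessModes: ["ReadWriteOnce"]
        {{- with .Values.gkg.storageClassName }}
        # Rendered only when gkg.storageClassName is set (e.g. "fast-ssd").
        storageClassName: {{ . | quote }}
        {{- end }}
        resources:
          requests:
            storage: {{ .Values.gkg.storage | default "1Gi" }}
```

Kubernetes creates one PersistentVolumeClaim per replica, named `<claim>-<statefulset>-<ordinal>`, so under these assumptions the first pod would get `gkg-data-gitlab-zoekt-gkg-0`.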
Files Added/Modified
New Templates:
- `templates/statefulset-gkg.yaml` - Multi-container StatefulSet definition
- `templates/svc-gkg.yaml` - Headless service for the StatefulSet

Modified Templates:
- `templates/_helpers.tpl` - Added GKG helper templates (fullname, labels, selectors)

Modified Values:
- `values.yaml` - Added a comprehensive `gkg.*` configuration section
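A headless service is a Service with `clusterIP: None`; when a StatefulSet names it in `spec.serviceName`, each pod gets its own stable DNS record. A sketch of what `templates/svc-gkg.yaml` might render to, assuming the service shares the StatefulSet name and selects on the `app.kubernetes.io/component=gkg` label used in the Testing section. The exposed port is an assumption; the real template may expose the internal gateway instead.

```yaml
# Sketch only: possible rendered output of templates/svc-gkg.yaml.
# The port choice is hypothetical; real labels come from the GKG helpers in _helpers.tpl.
apiVersion: v1
kind: Service
metadata:
  name: gitlab-zoekt-gkg
  labels:
    app.kubernetes.io/component: gkg
spec:
  clusterIP: None          # headless: DNS resolves directly to pod IPs, no virtual IP
  selector:
    app.kubernetes.io/component: gkg
  ports:
    - name: gkg-webserver
      port: 3334
      targetPort: 3334
```

With this setup, and assuming the default `cluster.local` domain, the first replica would be addressable as `gitlab-zoekt-gkg-0.gitlab-zoekt-gkg.<namespace>.svc.cluster.local`.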
Configuration
Basic Usage
Enable GKG in your `values.yaml`:

```yaml
gkg:
  enabled: true
  replicas: 1
  storage: 1Gi
```
Advanced Configuration
```yaml
gkg:
  enabled: true
  replicas: 2
  storage: 10Gi
  storageClassName: "fast-ssd"

  # Proxy configuration (manages indexing)
  proxy:
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
    environment:
      GITLAB_ZOEKT_GKG_INDEX_URL: "http://localhost:3333"
      GITLAB_ZOEKT_GKG_QUERY_URL: "http://localhost:3334"

  # Indexer configuration
  indexer:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: add-gitlab-gkg-image
    resources:
      requests:
        cpu: 1
        memory: 2Gi

  # Webserver configuration
  webserver:
    image:
      repository: registry.gitlab.com/gitlab-org/build/cng/gitlab-gkg
      tag: add-gitlab-gkg-image
    resources:
      requests:
        cpu: 500m
        memory: 1Gi

  # Gateway configuration (inherits from global)
  gateway:
    tls:
      certificate:
        enabled: true
        secretName: gkg-tls-cert
```
Testing
Tested successfully on Minikube with the following verification:
```shell
# Install with GKG enabled
helm install gitlab-zoekt . \
  --set gkg.enabled=true \
  --set indexer.internalApi.secretName=gitlab-zoekt-internal-api \
  --set indexer.internalApi.secretKey=.gitlab_shell_secret \
  --set indexer.internalApi.gitlabUrl=http://gitlab.example.com

# Verify deployment
kubectl get statefulset gitlab-zoekt-gkg
kubectl get pods -l app.kubernetes.io/component=gkg

# Check all containers are running
kubectl get pod gitlab-zoekt-gkg-0
# Output: 4/4 Running

# Test health endpoints
kubectl exec gitlab-zoekt-gkg-0 -c gkg-proxy -- curl -s http://localhost:3333/health
# Output: {"status":"OK"}
kubectl exec gitlab-zoekt-gkg-0 -c gkg-proxy -- curl -s http://localhost:3334/health
# Output: {"status":"OK"}
```
Verification Results
- ✅ All 4 containers start successfully
- ✅ Health checks pass for indexer and webserver
- ✅ Persistent volume claims created correctly
- ✅ Environment variables properly configured
- ✅ JWT secret mounted correctly
- ✅ Processes running with correct arguments:
  - `/bin/gkg-server-deployed -m indexer --bind 0.0.0.0:3333 --secret-path /.gitlab_shell_secret --data-dir /data/gkg`
  - `/bin/gkg-server-deployed -m webserver --bind 0.0.0.0:3334 --secret-path /.gitlab_shell_secret --data-dir /data/gkg`
Environment Variables
Validated from GDK Implementation
The environment variables used in this implementation have been validated against the GDK implementation (gitlab-org/gitlab-development-kit!5334 (merged)):
gkg-proxy:
- `GITLAB_ZOEKT_GKG_INDEX_URL` - Points to gkg-indexer (`http://localhost:3333`)
- `GITLAB_ZOEKT_GKG_QUERY_URL` - Points to gkg-webserver (`http://localhost:3334`)
- `GITLAB_ZOEKT_MODE=indexer` - Runs in indexer mode with GKG support

gkg-indexer & gkg-webserver:
- `GITLAB_GKG_MODE` - Operating mode (`indexer`/`webserver`)
- `GITLAB_GKG_SECRET_PATH` - JWT secret file path (`/.gitlab_shell_secret`)
- `GITLAB_GKG_DATA_DIR` - Data directory (`/data/gkg`)
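Because all four containers share the pod's network namespace, gkg-proxy can reach the indexer and webserver on `localhost`. A sketch of how the proxy's env block might be templated, assuming `GITLAB_ZOEKT_MODE` is fixed in the template and the two URLs come from the `gkg.proxy.environment` map shown under Advanced Configuration; this illustrates the wiring and is not the chart's confirmed template.

```yaml
# Sketch only: gkg-proxy container excerpt of templates/statefulset-gkg.yaml.
# Assumes extra variables are supplied through the gkg.proxy.environment map.
- name: gkg-proxy
  env:
    - name: GITLAB_ZOEKT_MODE
      value: indexer
    {{- range $name, $value := .Values.gkg.proxy.environment }}
    # e.g. GITLAB_ZOEKT_GKG_INDEX_URL -> http://localhost:3333 (gkg-indexer)
    - name: {{ $name }}
      value: {{ $value | quote }}
    {{- end }}
```

Go templates iterate over map keys in sorted order, so the rendered env entries keep a stable order across `helm upgrade` runs and do not change the pod spec on their own.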
Related Work
- POC: gitlab-org/cloud-native/charts/gitlab-zoekt!126
- CNG Image: gitlab-org/build/CNG!2652 (closed)
- GDK Implementation: gitlab-org/gitlab-development-kit!5334 (merged)
- Epic: GitLab Knowledge Graph Epic
- Related Issues: gitlab-org/gitlab#568348 (closed), gitlab-org/rust/knowledge-graph#184
Architecture Decisions
Direct Rails → gkg-webserver Communication
The current implementation supports direct communication from GitLab Rails to the gkg-webserver with JWT authentication, following the updated architecture that leverages the built-in JWT support in the CNG image.
Container Naming
Renamed zoekt-orchestrator to gkg-proxy to better reflect its role in the GKG architecture as a proxy/coordinator for GKG operations.
Image Strategy
- gkg-proxy: Reuses the standard `gitlab-zoekt` image (indexer mode)
- gkg-indexer/webserver: Use the dedicated `gitlab-gkg` CNG image with dual-mode support
Migration Path
Since GKG is a new feature, there is no migration required. The feature is disabled by default (gkg.enabled: false) and can be enabled when ready.
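A common way to keep an opt-in feature inert by default in a Helm chart is to wrap each new template in a guard on its enable flag. A sketch of the top of `templates/statefulset-gkg.yaml` under that assumption; the helper names `gitlab-zoekt.gkg.fullname`, `gitlab-zoekt.gkg.labels` and `gitlab-zoekt.gkg.selectorLabels` are hypothetical stand-ins for the GKG helpers added to `_helpers.tpl`.

```yaml
# Sketch only: with the default gkg.enabled: false this template renders nothing,
# so existing installs get no new resources on upgrade. Helper names are hypothetical.
{{- if .Values.gkg.enabled }}
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "gitlab-zoekt.gkg.fullname" . }}
  labels:
    {{- include "gitlab-zoekt.gkg.labels" . | nindent 4 }}
spec:
  serviceName: {{ include "gitlab-zoekt.gkg.fullname" . }}
  replicas: {{ .Values.gkg.replicas | default 1 }}
  selector:
    matchLabels:
      {{- include "gitlab-zoekt.gkg.selectorLabels" . | nindent 6 }}
  # pod template and volumeClaimTemplates omitted from this excerpt
{{- end }}
```

The same guard at the top of `templates/svc-gkg.yaml` would keep the headless service out of the release until GKG is enabled.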
Checklist
- StatefulSet template supports multi-container GKG deployment
- Uses actual `gitlab-gkg` CNG image
- Renamed `zoekt-orchestrator` to `gkg-proxy`
- Configured validated environment variables
- Health checks use `/health` endpoints on correct ports
- JWT authentication configured for webserver
- Persistent storage configured for GKG data
- Headless service enables pod-to-pod communication
- Configuration options allow customization of all components
- Security features (TLS, JWT) are configurable
- Tested with actual GKG container images
- All containers running and healthy in test deployment