Add gitlab:zoekt:health rake task
What does this MR do and why?
Add comprehensive health check for Zoekt exact code search infrastructure
This MR introduces a new gitlab:zoekt:health rake task that gives administrators a single place to validate the health of their Zoekt exact code search infrastructure. Currently, troubleshooting Zoekt issues requires manual investigation across multiple areas, with no centralized health validation.
Problem
Administrators need to:
- Manually check node connectivity and status across multiple systems
- Investigate configuration issues without clear validation
- Troubleshoot connectivity problems with limited diagnostic tools
- Validate JWT authentication and API endpoint functionality
Solution
The new health check provides comprehensive validation through three main sections:
1. Node Status Validation
- Checks online/offline node status and duration since last contact
- Validates storage utilization with watermark-based warnings
- Provides actionable recommendations for detected issues
2. Configuration Validation
- Verifies core Zoekt settings (indexing enabled, search enabled, paused status)
- Validates namespace and repository indexing status and completion rates
- Provides guidance for configuration improvements
3. Connectivity Testing
- Tests JWT token generation and authentication
- Validates network connectivity to each online node using real search requests
- Measures response times and identifies unreachable nodes
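The connectivity step above can be sketched as a small probe that times a request and records a failure instead of raising. This is a minimal illustration, not the code in this MR: `NodeResult` and `probe_node` are hypothetical names, and the block passed to `probe_node` stands in for the real search request made through the Zoekt client.

```ruby
# Hypothetical result record for one node probe (not part of the MR).
NodeResult = Struct.new(:name, :reachable, :elapsed_ms, :error)

# Runs the given block (a stand-in for a real search request) and
# reports whether it succeeded and how long it took in milliseconds.
def probe_node(name)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  NodeResult.new(name, true, elapsed.round(1), nil)
rescue StandardError => e
  # Any failure marks the node unreachable and keeps the error message
  # so it can be surfaced as a recommendation.
  NodeResult.new(name, false, nil, e.message)
end
```

Using a monotonic clock keeps the measured duration unaffected by system clock adjustments, which matters when the check runs repeatedly in watch mode.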
Features
- Colored output with clear status indicators (✓/⚠️/✗)
- Actionable recommendations instead of just status observations
- Watch mode support for continuous monitoring
- Exit codes for automation integration (0=healthy, 1=degraded, 2=unhealthy)
- Extensible architecture for adding future health checks
- Reuses existing Zoekt client for accurate connectivity testing
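As an illustration of the exit-code contract listed above, the following sketch maps per-section statuses to a process exit code. The 0/1/2 values come from this description; the "worst section wins" rule and the names `EXIT_CODES`, `overall_status`, and `exit_code_for` are assumptions made for the example, not necessarily how the task combines its sections.

```ruby
# Exit codes as stated in this MR description.
EXIT_CODES = { healthy: 0, degraded: 1, unhealthy: 2 }.freeze

# Assumption: the overall status is the worst status of any section.
# An empty list is treated as healthy.
def overall_status(section_statuses)
  section_statuses.max_by { |status| EXIT_CODES.fetch(status) } || :healthy
end

# Translates the combined status into the exit code for the process.
def exit_code_for(section_statuses)
  EXIT_CODES.fetch(overall_status(section_statuses))
end
```

For example, `exit_code_for([:healthy, :degraded, :healthy])` evaluates to `1` under this rule.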
Usage
# Single health check
bin/rails gitlab:zoekt:health
# Continuous monitoring (refreshes every 10 seconds)
bin/rails "gitlab:zoekt:health[10]"
Example Output
The health check provides clear, colored output showing:
- Node Status section: Online/offline node counts, storage utilization warnings
- Configuration section: Core settings validation, namespace/repository indexing status
- Connectivity section: JWT token generation, node reachability testing, search API validation
- Overall Status: Combined health assessment (HEALTHY, DEGRADED, or UNHEALTHY)
- Recommendations: Actionable guidance for resolving detected issues
Architecture
- Follows existing GitLab Zoekt patterns (InfoService, RakeTask structure)
- Modular health check services for easy extension
- Proper error handling and graceful degradation
- Uses existing Zoekt client for realistic connectivity testing
How to set up and validate locally
- Set up test environment with Zoekt nodes:
# Check current Zoekt status
Search::Zoekt::Node.count
ApplicationSetting.current.zoekt_indexing_enabled
- Run health check and verify output:
# Basic health check
bin/rails gitlab:zoekt:health
# Watch mode
bin/rails "gitlab:zoekt:health[5]"
- Test different scenarios:
# Test with indexing disabled
ApplicationSetting.current.update!(zoekt_indexing_enabled: false)
bin/rails gitlab:zoekt:health # Should show configuration warnings
# Test with no enabled namespaces
Search::Zoekt::EnabledNamespace.delete_all
bin/rails gitlab:zoekt:health # Should show namespace warnings
- Verify exit codes for automation:
bin/rails gitlab:zoekt:health; echo "Exit code: $?"
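For automation written in Ruby rather than shell, the same exit-code check might look like the sketch below. `HEALTH_LABELS` and `run_health_check` are hypothetical helper names for this example; only the command and the 0/1/2 meanings are taken from this description.

```ruby
# Meaning of each exit code, as stated in this MR description.
HEALTH_LABELS = { 0 => :healthy, 1 => :degraded, 2 => :unhealthy }.freeze

# Runs the health check command and returns a symbol for its outcome.
# Returns :not_run if the command could not be executed at all, and
# :unknown for any exit code outside the documented range.
def run_health_check(command = ["bin/rails", "gitlab:zoekt:health"])
  ran = system(*command)
  return :not_run if ran.nil?

  HEALTH_LABELS.fetch($?.exitstatus, :unknown)
end

# Example usage from a monitoring script:
#   status = run_health_check
#   warn "Zoekt is #{status}" unless status == :healthy
```

Passing the command as an argument keeps the helper testable without a running GitLab instance.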
MR acceptance checklist
This MR has been evaluated against the MR acceptance checklist.
Related to #567723 (closed)