Zoekt: Add gitlab:zoekt:health rake task
Background
Currently, troubleshooting Zoekt search issues requires manual investigation across multiple areas. Users encounter generic "syntax error" messages in the UI when networking issues occur, and there's no centralized way to validate the health of the Zoekt search infrastructure. The existing gitlab:zoekt:info task provides status information but doesn't perform active health validation or connectivity tests.
Administrators need a comprehensive health check tool that can:
- Quickly identify common configuration and connectivity issues
- Validate communication between GitLab and Zoekt nodes
- Provide actionable guidance for resolving detected problems
Proposal
Create a new gitlab:zoekt:health rake task that performs comprehensive health checks for the Zoekt search infrastructure.
Health Checks
1. Node Status Validation
- Check if any Zoekt nodes are online
- Identify offline nodes and duration since last contact
- Validate node storage utilization and warn on high usage
2. Connectivity Testing
- Send test search requests to each online node
- Validate JWT token generation and authentication
- Test network connectivity and response times
- Verify load balancer configuration (if applicable)
3. Configuration Validation
- Check if indexing is enabled when expected
- Validate feature flag configurations
- Verify namespace and repository indexing status
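As a rough illustration of the node status checks listed above, the following is a minimal sketch in plain Ruby. It does not use GitLab's actual Zoekt models: the `Node` struct, its field names, the 60-second online window, and the 90% storage threshold are all assumptions chosen for the example.

```ruby
# Hypothetical stand-in for a Zoekt node record; the field names are assumptions.
Node = Struct.new(:name, :last_seen_at, :used_bytes, :total_bytes, keyword_init: true)

ONLINE_WINDOW_SECONDS = 60  # assumed: a node counts as online if seen within the last minute
STORAGE_WARN_RATIO = 0.90   # assumed: warn when 90% or more of storage is used

# Classifies nodes as online/offline, flags high storage usage,
# and derives a single status symbol for the "Node Status" section.
def node_status_report(nodes, now: Time.now)
  online, offline = nodes.partition { |n| now - n.last_seen_at <= ONLINE_WINDOW_SECONDS }

  high_storage = nodes.select do |n|
    n.total_bytes.positive? && n.used_bytes.fdiv(n.total_bytes) >= STORAGE_WARN_RATIO
  end

  status =
    if online.empty?
      :unhealthy # no node can serve searches
    elsif offline.any? || high_storage.any?
      :warning
    else
      :healthy
    end

  {
    status: status,
    online: online.map(&:name),
    offline: offline.map(&:name),
    high_storage: high_storage.map(&:name)
  }
end
```

In this sketch, any offline node or high-usage node downgrades the section to a warning, and having no online nodes at all is treated as unhealthy. The real task may want different thresholds or severity rules.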
Example Output
=== Zoekt Health Check ===
Node Status: ⚠ WARNING
✓ 2 of 14 nodes online
⚠ WARNING: 12 nodes offline (last seen: 2025-09-03 11:47:08 UTC)
⚠ WARNING: High storage usage on nodes: 97.93% used
Connectivity: ✗ ISSUES DETECTED
✓ Node zoekt-01.example.com (200ms response)
✗ Node zoekt-02.example.com (timeout after 5s)
✓ JWT token generation successful
✓ Search API endpoint reachable
Configuration: ✓ HEALTHY
✓ Indexing enabled
✓ 8 namespaces ready for indexing
✓ 12 repositories indexed
Recommendations:
- Investigate offline nodes (12 detected)
- Monitor storage usage - consider cleanup or expansion
- Check network connectivity to zoekt-02.example.com
Overall Status: ⚠ DEGRADED - Some issues detected
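One way to derive the "Overall Status" line from the per-section results is to take the most severe section status. The sketch below assumes three status symbols (`:healthy`, `:warning`, `:unhealthy`) and the label strings shown; only the DEGRADED wording comes from the example output, the rest is hypothetical.

```ruby
# Assumed severity ordering; higher numbers are worse.
SEVERITY = { healthy: 0, warning: 1, unhealthy: 2 }.freeze

# Labels for the final summary line. Only the DEGRADED text appears in the
# example output; the other two labels are assumptions.
OVERALL_LABEL = {
  healthy: "✓ HEALTHY",
  warning: "⚠ DEGRADED - Some issues detected",
  unhealthy: "✗ UNHEALTHY"
}.freeze

# Returns the worst status among all sections, or :healthy if none were run.
def overall_status(section_statuses)
  section_statuses.values.max_by { |status| SEVERITY.fetch(status) } || :healthy
end

def overall_line(section_statuses)
  "Overall Status: #{OVERALL_LABEL.fetch(overall_status(section_statuses))}"
end
```

For example, `overall_line(node_status: :healthy, connectivity: :warning, configuration: :healthy)` produces the DEGRADED line, because a single degraded section is enough to downgrade the whole report.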
Implementation Details
The rake task should:
- Leverage existing Zoekt integration code to perform actual search requests
- Include timeout handling for unresponsive nodes
- Provide different verbosity levels (--verbose flag for detailed output)
- Return appropriate exit codes for automation/monitoring integration
- Log detailed errors while showing user-friendly summaries
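The timeout handling and exit code requirements could look roughly like the sketch below. It uses only Ruby's standard `Timeout` module; `probe_node`, `ProbeResult`, and the exit code values (0, 1, 2) are hypothetical names and conventions, and the block passed to `probe_node` stands in for whatever search request the existing Zoekt integration code would send.

```ruby
require "timeout"

# Hypothetical result record for a single connectivity probe.
ProbeResult = Struct.new(:node, :ok, :elapsed_ms, :error, keyword_init: true)

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Runs the given block (the actual request) with a time limit and
# converts timeouts and other errors into a result instead of raising.
def probe_node(node_name, timeout_seconds: 5)
  started = monotonic_now
  Timeout.timeout(timeout_seconds) { yield }
  elapsed = ((monotonic_now - started) * 1000).round
  ProbeResult.new(node: node_name, ok: true, elapsed_ms: elapsed, error: nil)
rescue Timeout::Error
  elapsed = ((monotonic_now - started) * 1000).round
  ProbeResult.new(node: node_name, ok: false, elapsed_ms: elapsed,
                  error: "timeout after #{timeout_seconds}s")
rescue StandardError => e
  elapsed = ((monotonic_now - started) * 1000).round
  ProbeResult.new(node: node_name, ok: false, elapsed_ms: elapsed,
                  error: "#{e.class}: #{e.message}")
end

# Assumed exit code convention for monitoring integration.
EXIT_CODES = { healthy: 0, warning: 1, unhealthy: 2 }.freeze

def exit_code_for(status)
  EXIT_CODES.fetch(status)
end
```

A rake task body could then finish with `exit(exit_code_for(status))`, so that monitoring tools can tell a healthy run from a degraded or failed one without parsing the output. A request library's own connect and read timeouts are generally preferable to `Timeout.timeout` where available; the wrapper here is only meant to show the shape of the result handling.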