
Zoekt: Add gitlab:zoekt:health rake task

Background

Troubleshooting Zoekt search issues currently requires manual investigation across multiple areas. When networking issues occur, users see a generic "syntax error" message in the UI, and there is no centralized way to validate the health of the Zoekt search infrastructure. The existing gitlab:zoekt:info task reports status information but does not actively validate health or test connectivity.

Administrators need a comprehensive health check tool that can:

  • Quickly identify common configuration and connectivity issues
  • Validate communication between GitLab and Zoekt nodes
  • Provide actionable guidance for resolving detected problems

Proposal

Create a new gitlab:zoekt:health rake task that performs comprehensive health checks for the Zoekt search infrastructure.

Health Checks

1. Node Status Validation

  • Check if any Zoekt nodes are online
  • Identify offline nodes and duration since last contact
  • Validate node storage utilization and warn on high usage
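
As a rough illustration of the node status checks above, here is a minimal plain-Ruby sketch. The `Node` struct, its attribute names, and both thresholds (`ONLINE_WINDOW`, `STORAGE_WARN_PERCENT`) are assumptions made for the example; they are not taken from the existing Zoekt models.

```ruby
# Hypothetical stand-in for a Zoekt node record. The real model and its
# attribute names are not specified in this issue.
Node = Struct.new(:name, :last_seen_at, :used_bytes, :total_bytes, keyword_init: true)

ONLINE_WINDOW = 60          # seconds since last contact; assumed threshold
STORAGE_WARN_PERCENT = 90.0 # assumed warning threshold

def node_status_report(nodes, now: Time.now)
  # A node counts as online if it was seen within the window.
  online, offline = nodes.partition { |n| now - n.last_seen_at <= ONLINE_WINDOW }

  high_storage = nodes.select do |n|
    n.total_bytes.positive? && (100.0 * n.used_bytes / n.total_bytes) >= STORAGE_WARN_PERCENT
  end

  status =
    if online.empty?
      :unhealthy # no node can serve searches
    elsif offline.any? || high_storage.any?
      :warning
    else
      :healthy
    end

  {
    status: status,
    online: online.map(&:name),
    # Duration since last contact, in whole seconds.
    offline: offline.map { |n| { name: n.name, offline_for: (now - n.last_seen_at).round } },
    high_storage: high_storage.map(&:name)
  }
end
```

Treating "no nodes online" as the only hard failure is one possible policy; the actual task may want a stricter rule.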

2. Connectivity Testing

  • Send test search requests to each online node
  • Validate JWT token generation and authentication
  • Test network connectivity and response times
  • Verify load balancer configuration (if applicable)
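
The per-node connectivity test could be sketched as below. The `probe` block is a placeholder for whatever existing Zoekt client call performs the test search (not shown in this issue), and `check_node`, `ProbeResult`, and the 5 second default are hypothetical names and values chosen for illustration.

```ruby
require "timeout"

ProbeResult = Struct.new(:node, :ok, :elapsed_ms, :error, keyword_init: true)

# Milliseconds elapsed since a monotonic-clock reading.
def elapsed_ms_since(started)
  ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
end

# Runs the given probe block against one node and records the outcome.
# The block stands in for the real test search request.
def check_node(node_name, timeout_s: 5, &probe)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Timeout.timeout(timeout_s) { probe.call(node_name) }
  ProbeResult.new(node: node_name, ok: true, elapsed_ms: elapsed_ms_since(started), error: nil)
rescue Timeout::Error
  ProbeResult.new(node: node_name, ok: false, elapsed_ms: elapsed_ms_since(started),
                  error: "timeout after #{timeout_s}s")
rescue StandardError => e
  # Authentication and network failures surface here with their class name.
  ProbeResult.new(node: node_name, ok: false, elapsed_ms: elapsed_ms_since(started),
                  error: "#{e.class}: #{e.message}")
end
```

`Timeout.timeout` is used here only to keep the sketch short. If the underlying HTTP client exposes its own open and read timeouts, those are generally the safer way to bound a request.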

3. Configuration Validation

  • Check if indexing is enabled when expected
  • Validate feature flag configurations
  • Verify namespace and repository indexing status

Example Output

=== Zoekt Health Check ===

Node Status: ⚠ WARNING
 ✓ 2 of 14 nodes online
 ⚠ WARNING: 12 nodes offline (last seen: 2025-09-03 11:47:08 UTC)
 ⚠ WARNING: High storage usage on nodes: 97.93% used

Connectivity: ✗ ISSUES DETECTED
 ✓ Node zoekt-01.example.com (200ms response)
 ✗ Node zoekt-02.example.com (timeout after 5s)
 ✓ JWT token generation successful
 ✓ Search API endpoint reachable

Configuration: ✓ HEALTHY
 ✓ Indexing enabled
 ✓ 8 namespaces ready for indexing
 ✓ 12 repositories indexed

Recommendations:
 - Investigate offline nodes (12 detected)
 - Monitor storage usage - consider cleanup or expansion
 - Check network connectivity to zoekt-02.example.com

Overall Status: ⚠ DEGRADED - Some issues detected
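
One way to produce section output in this shape is to derive each section's headline from its most severe check. The sketch below assumes that policy; `Check`, `worst_level`, and `render_section` are hypothetical names, not existing code.

```ruby
SEVERITY = { ok: 0, warn: 1, fail: 2 }.freeze
HEADLINES = { ok: "✓ HEALTHY", warn: "⚠ WARNING", fail: "✗ ISSUES DETECTED" }.freeze
PREFIXES = { ok: "✓ ", warn: "⚠ WARNING: ", fail: "✗ " }.freeze

Check = Struct.new(:level, :message)

# Most severe level among the checks; an empty section counts as :ok.
def worst_level(checks)
  checks.map(&:level).max_by { |level| SEVERITY.fetch(level) } || :ok
end

# Renders a headline followed by one indented line per check.
def render_section(title, checks)
  lines = ["#{title}: #{HEADLINES.fetch(worst_level(checks))}"]
  checks.each { |c| lines << " #{PREFIXES.fetch(c.level)}#{c.message}" }
  lines.join("\n")
end

puts render_section("Connectivity", [
  Check.new(:ok, "Node zoekt-01.example.com (200ms response)"),
  Check.new(:fail, "Node zoekt-02.example.com (timeout after 5s)")
])
```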

Implementation Details

The rake task should:

  1. Leverage existing Zoekt integration code to perform actual search requests
  2. Include timeout handling for unresponsive nodes
  3. Provide different verbosity levels (--verbose flag for detailed output)
  4. Return appropriate exit codes for automation/monitoring integration
  5. Log detailed errors while showing user-friendly summaries
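
For points 3 and 4, a minimal sketch of the exit-code mapping and verbosity switch might look like this. The specific codes and the `VERBOSE` environment variable are assumptions. Rake parses `--verbose` as one of its own options and does not pass custom flags to tasks, so an environment variable or a task argument is a common alternative.

```ruby
# Assumed convention: 0 = healthy, 1 = degraded, 2 = unhealthy or unknown.
EXIT_CODES = { healthy: 0, degraded: 1, unhealthy: 2 }.freeze

# Maps an overall status to a process exit code; unknown statuses are
# treated as failures so monitoring does not silently pass.
def exit_code_for(status)
  EXIT_CODES.fetch(status) { 2 }
end

# Reads the verbosity switch from an environment-like hash.
def verbose?(env = ENV)
  %w[1 true yes].include?(env.fetch("VERBOSE", "").downcase)
end
```

Inside the task body this could end with `exit exit_code_for(overall_status)`, so that a monitoring wrapper can alert on any non-zero result.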