
Add gitlab:zoekt:health rake task

What does this MR do and why?

Add comprehensive health check for Zoekt exact code search infrastructure

This MR introduces a new gitlab:zoekt:health rake task that gives administrators a single command for validating the health of their Zoekt exact code search infrastructure. Currently, troubleshooting Zoekt issues requires manual investigation across multiple areas, with no centralized health validation.

#567723 (closed)

Problem

Without a dedicated health check, administrators currently have to:

  • Manually check node connectivity and status across multiple systems
  • Investigate configuration issues without clear validation
  • Troubleshoot connectivity problems with limited diagnostic tools
  • Validate JWT authentication and API endpoint functionality by hand

Solution

The new health check provides comprehensive validation through three main sections:

1. Node Status Validation

  • Checks online/offline node status and duration since last contact
  • Validates storage utilization with watermark-based warnings
  • Provides actionable recommendations for detected issues
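
As an illustration of what a watermark-based storage warning can look like, here is a minimal sketch. The threshold values, the constant names, and the `storage_status` helper are assumptions made for this example; the MR description does not state the watermarks or the method names the task actually uses.

```ruby
# Hypothetical thresholds, chosen only for illustration; the real
# watermark values used by the rake task are not stated in this MR.
LOW_WATERMARK = 0.70
HIGH_WATERMARK = 0.85

# Classifies a node's storage utilization (used / total) against the
# two watermarks. Returns :ok, :warning, :critical, or :unknown.
def storage_status(used_bytes, total_bytes)
  return :unknown if total_bytes.nil? || total_bytes <= 0

  ratio = used_bytes.to_f / total_bytes
  if ratio >= HIGH_WATERMARK
    :critical
  elsif ratio >= LOW_WATERMARK
    :warning
  else
    :ok
  end
end
```

Under these assumed thresholds, a node at 75% utilization would be reported as a warning and a node at 90% as critical.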

2. Configuration Validation

  • Verifies core Zoekt settings (indexing enabled, search enabled, paused status)
  • Validates namespace and repository indexing status and completion rates
  • Provides guidance for configuration improvements

3. Connectivity Testing

  • Tests JWT token generation and authentication
  • Validates network connectivity to each online node using real search requests
  • Measures response times and identifies unreachable nodes
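
Measuring response time while detecting unreachable nodes could be sketched roughly as follows. `ProbeResult`, `timed_probe`, and `elapsed_ms_since` are hypothetical names for this example, and the block passed to `timed_probe` stands in for whatever search request the real task sends through the Zoekt client.

```ruby
# Hypothetical result type for one connectivity probe.
ProbeResult = Struct.new(:reachable, :elapsed_ms, :error)

# Milliseconds elapsed since a monotonic timestamp.
def elapsed_ms_since(started)
  ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(1)
end

# Runs the given block (a stand-in for a real search request against a
# node), timing it and turning any StandardError into an unreachable result.
def timed_probe
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  ProbeResult.new(true, elapsed_ms_since(started), nil)
rescue StandardError => e
  ProbeResult.new(false, elapsed_ms_since(started), e.message)
end
```

A monotonic clock is used here so that the measurement is not affected by wall-clock adjustments during the probe.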

Features

  • Colored output with clear status indicators (✓/⚠️/✗)
  • Actionable recommendations instead of just status observations
  • Watch mode support for continuous monitoring
  • Exit codes for automation integration (0=healthy, 1=degraded, 2=unhealthy)
  • Extensible architecture for adding future health checks
  • Reuses existing Zoekt client for accurate connectivity testing

Usage

# Single health check
bin/rails gitlab:zoekt:health

# Continuous monitoring (refreshes every 10 seconds)
bin/rails "gitlab:zoekt:health[10]"

Example Output

The health check provides clear, colored output showing:

  • Node Status section: Online/offline node counts, storage utilization warnings
  • Configuration section: Core settings validation, namespace/repository indexing status
  • Connectivity section: JWT token generation, node reachability testing, search API validation
  • Overall Status: Combined health assessment (HEALTHY, DEGRADED, or UNHEALTHY)
  • Recommendations: Actionable guidance for resolving detected issues
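
One plausible way to derive the overall status from the three sections is to report the worst status any section returned. This is an assumption made for illustration; the MR description does not say how the sections are combined, and `SEVERITY` and `overall_status` are hypothetical names.

```ruby
# Hypothetical severity ordering: higher means worse.
SEVERITY = { healthy: 0, degraded: 1, unhealthy: 2 }.freeze

# Returns the worst status among the section results.
# An empty list is treated as :healthy in this sketch.
def overall_status(section_statuses)
  return :healthy if section_statuses.empty?

  section_statuses.max_by { |status| SEVERITY.fetch(status) }
end
```

With this rule, a healthy Node Status section combined with a degraded Configuration section would yield DEGRADED overall.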

[Screenshot: example health check output]

Architecture

  • Follows existing GitLab Zoekt patterns (InfoService, RakeTask structure)
  • Modular health check services for easy extension
  • Proper error handling and graceful degradation
  • Uses existing Zoekt client for realistic connectivity testing

How to set up and validate locally

  1. Set up a test environment with Zoekt nodes:

    # In a Rails console: check the current Zoekt status
    Search::Zoekt::Node.count
    ApplicationSetting.current.zoekt_indexing_enabled
  2. Run health check and verify output:

    # Basic health check
    bin/rails gitlab:zoekt:health
    
    # Watch mode
    bin/rails "gitlab:zoekt:health[5]"
  3. Test different scenarios:

    # Test with indexing disabled (first line in a Rails console, second in a shell)
    ApplicationSetting.current.update!(zoekt_indexing_enabled: false)
    bin/rails gitlab:zoekt:health  # Should show configuration warnings
    
    # Test with no enabled namespaces (deletes all EnabledNamespace records; development instances only)
    Search::Zoekt::EnabledNamespace.delete_all
    bin/rails gitlab:zoekt:health  # Should show namespace warnings
  4. Verify exit codes for automation:

    bin/rails gitlab:zoekt:health; echo "Exit code: $?"
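
For automation written in Ruby, the exit codes listed under Features (0=healthy, 1=degraded, 2=unhealthy) can be translated into a readable status. The sketch below is a minimal example; `EXIT_CODE_MEANINGS` and `describe_exit_code` are hypothetical names, not part of the rake task.

```ruby
# Exit codes as documented in this MR's Features section.
EXIT_CODE_MEANINGS = { 0 => "healthy", 1 => "degraded", 2 => "unhealthy" }.freeze

# Maps an exit code to its documented meaning, with a fallback for
# codes this MR does not describe.
def describe_exit_code(code)
  EXIT_CODE_MEANINGS.fetch(code, "unknown (exit code #{code})")
end

# Example usage from a wrapper script:
#   system("bin/rails", "gitlab:zoekt:health")
#   puts describe_exit_code($?.exitstatus)
```

The fallback branch matters in practice because a crash or a missing task would produce an exit code outside the three documented values.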

MR acceptance checklist

This MR has been evaluated against the MR acceptance checklist.

Related to #567723 (closed)

Edited by Dmitry Gruzd
