Skip to content

Improve NTP detection using system defined one and fallback to the predefined otherwise in gitlab:geo:check

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Background

GitLab Geo requires synchronized clocks across all nodes to function properly. The gitlab:geo:check rake task includes a health check that verifies NTP connectivity by attempting to reach pool.ntp.org.

Problem

The current implementation only checks the hardcoded pool.ntp.org server (or a user-defined NTP_HOST environment variable), which causes false warnings in:

  • Firewalled environments that block external NTP traffic
  • Air-gapped installations with no internet access
  • Corporate environments using internal NTP servers

These environments may have properly synchronized clocks via internal NTP servers defined in system configuration files, but the health check fails because it doesn't detect or use these configured servers.

Proposal

Implement a cascading NTP server detection that checks multiple sources before reporting a warning:

  1. User-defined: Check NTP_HOST environment variable (already implemented)
  2. System-defined: Parse system NTP configuration files to find locally configured servers (new)
  3. Fallback: Use pool.ntp.org (already implemented)
  4. Improved warning: If all checks fail, display an enhanced message. Look at this example.

This approach will reduce false positives while maintaining the ability to detect genuine clock synchronization issues.

Implementation Guide

  • Modify ee/lib/geo/system_check/clocks_synchronization_check.rb to check for system-defined NTP implementations.

  • Full flow should be:

    • User-defined: Check NTP_HOST environment variable if set
    • System-defined: Check system NTP configs (provided below)
    • Fallback: Use pool.ntp.org
    • If all fail: Show improved warning message with JWT explanation
  • The 4 major implementations are ntpd, Chrony, systemd-timesyncd, and OpenNTPD; they cover practically all Unix-based operating systems.

  • The configuration files that should be looked for are:

    configs = [
        # Chrony configs (most common on modern Linux)
        ['/etc/chrony.conf', :chrony],
        ['/etc/chrony/chrony.conf', :chrony],
        ['/usr/local/etc/chrony.conf', :chrony],
        
        # systemd-timesyncd configs
        ['/etc/systemd/timesyncd.conf', :timesyncd],
        ['/etc/systemd/timesyncd.conf.d/*.conf', :timesyncd],
        ['/run/systemd/timesyncd.conf.d/*.conf', :timesyncd],
        
        # ntpd configs
        ['/etc/ntp.conf', :ntpd],
        ['/etc/inet/ntp.conf', :ntpd],
        ['/private/etc/ntp.conf', :ntpd],
        ['/usr/local/etc/ntp.conf', :ntpd],
        
        # OpenNTPD configs
        ['/etc/openntpd/ntpd.conf', :openntpd],
        ['/etc/ntpd.conf', :openntpd]
      ]
  • Each NTP implementation has a different structure so the server address will be in different fields.

    • For Chrony and ntpd you're looking for the fields server and pool.
    • For timesyncd, NTP and FallbackNTP in the [Time] section.
    • For OpenNTPD, the key is servers
  • Here's a starter code to help you extract the right fields:

    def parse_ntp_config(file_path, parser_type)
      content = File.read(file_path)
      servers = []
      
      case parser_type
      when :chrony
        # Extract from: server <hostname/ip> and pool <hostname/ip>
        content.each_line do |line|
          line = line.sub(/#.*$/, '').strip
          if line =~ /^(?:server|pool)\s+([^\s]+)/
            servers << $1
          end
        end
        
      when :timesyncd
        # Extract from: NTP= and FallbackNTP= (space-separated, in [Time] section)
        in_time_section = false
        content.each_line do |line|
          line = line.strip
          
          # Track if we're in [Time] section
          if line.start_with?('[')
            in_time_section = (line == '[Time]')
            next
          end
          
          next unless in_time_section
          
          # Extract space-separated servers
          if line =~ /^(?:NTP|FallbackNTP)\s*=\s*(.+)$/
            servers.concat($1.split(/\s+/))
          end
        end
        
      when :ntpd
        # Extract from: server <hostname/ip> and pool <hostname/ip>
        content.each_line do |line|
          line = line.sub(/#.*$/, '').strip
          if line =~ /^(?:server|pool)\s+([^\s]+)/
            server = $1
            # Skip hardware reference clocks
            next if server.start_with?('127.127.')
            servers << server
          end
        end
        
      when :openntpd
        # Extract from: server <hostname/ip> and servers <hostname/ip>
        content.each_line do |line|
          line = line.sub(/#.*$/, '').strip
          if line =~ /^servers?\s+([^\s]+)/
            servers << $1
          end
        end
      end
      
      # Filter out localhost and empty values
      servers.reject { |s| s.empty? || s == '127.0.0.1' || s == '::1' || s == 'localhost' }.uniq
    end
  • Parsing rules:

    • Strip comments (everything after #)
    • Skip localhost (127.0.0.1, ::1)
    • Extract just the hostname/IP, ignore options (iburst, prefer, etc.)
    • Stop at the first config file containing valid servers
Edited by 🤖 GitLab Bot 🤖