Zoekt: Indices have missing repo records

Summary

We encountered this problem on 2025-02-03 in Zoekt indices have missing repo rec... (gitlab-com/gl-infra/production#19227 - closed), but it appears to still be occurring.

Steps to reproduce

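Run the following script in a GitLab Rails console. `execute` returns one entry for each ready replica whose Zoekt repository record count differs from the number of projects under its namespace.

```ruby
require 'logger'

# Refer to https://gitlab.com/dgruzd/progress_tracker#script for the latest version
class ProgressTracker
  def initialize(total_count, log_delay_seconds: 30, prefix: nil, logger: nil)
    @current = 0
    @total_count = total_count
    @delay = log_delay_seconds.to_i
    @prefix = prefix
    @logger = logger
    @last_log_at = Time.at(0)
  end

  # Advances the counter by `num` and logs progress at most once every
  # `log_delay_seconds`, plus once when the final item is reached.
  def iteration(num = 1)
    @current += num
    return unless Time.now - @last_log_at > @delay || @current == @total_count

    @last_log_at = Time.now
    progress = "#{@current}/#{@total_count}".ljust(@total_count.digits.length * 2 + 1)
    percent = format('%05.2f', (@current / @total_count.to_f) * 100)
    output = "#{@prefix}#{progress} (#{percent}%)"
    @logger ? @logger.info(output) : puts(output)
  end
end

# Returns one entry per ready replica whose number of Zoekt repository
# records differs from the number of projects under its root namespace.
def execute
  mismatches = []
  replicas = Search::Zoekt::Replica.ready
  progress_tracker = ProgressTracker.new(replicas.count, logger: Logger.new($stdout))

  replicas.find_each do |replica|
    # Zoekt repository records across all indices of the replica
    repo_count = replica.indices.sum { |index| index.zoekt_repositories.count }

    # Projects whose project namespace has the replica's namespace as its root
    # (PostgreSQL arrays are 1-indexed, so traversal_ids[1] is the root namespace ID)
    projects_count = 0
    Namespace.where('traversal_ids[1] = ?', replica.namespace_id).where(type: 'Project').each_batch do |project_namespaces|
      projects_count += Project.where(project_namespace_id: project_namespaces.select(:id)).count
    end

    progress_tracker.iteration

    next if repo_count == projects_count

    mismatches << {
      replica_id: replica.id,
      namespace_id: replica.namespace_id,
      repo_count: repo_count,
      projects_count: projects_count
    }
  end

  mismatches
end
```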
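The array returned by `execute` can get long on large instances. A minimal pure-Ruby sketch for condensing it, assuming only the four keys the script emits (`replica_id`, `namespace_id`, `repo_count`, `projects_count`); `summarize_mismatches` is a hypothetical helper, and the sample rows below are made up for illustration.

```ruby
# Hypothetical helper: condenses the rows returned by `execute`.
# Each row is assumed to carry :replica_id, :namespace_id, :repo_count and :projects_count.
def summarize_mismatches(rows, top: 5)
  # Positive gap = projects without a repository record; negative = surplus records
  with_gap = rows.map { |row| row.merge(gap: row[:projects_count] - row[:repo_count]) }

  {
    replicas_affected: with_gap.size,
    missing_records: with_gap.sum { |row| [row[:gap], 0].max },
    surplus_records: with_gap.sum { |row| [-row[:gap], 0].max },
    # Largest shortfall first, so the worst replicas can be inspected first
    worst: with_gap.sort_by { |row| -row[:gap] }.first(top)
  }
end

# Illustrative input only, not real data
sample_rows = [
  { replica_id: 1, namespace_id: 10, repo_count: 8, projects_count: 10 },
  { replica_id: 2, namespace_id: 20, repo_count: 5, projects_count: 4 }
]
summary = summarize_mismatches(sample_rows)
puts summary[:missing_records] # prints 2
puts summary[:surplus_records] # prints 1
```

Splitting the gap into missing and surplus counts keeps the two failure modes apart instead of letting them cancel out in a single total.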

What is the current bug behavior?

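Some projects have no associated Zoekt repository records, so for some ready replicas the number of repository records differs from the number of projects under the replica's namespace.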

What is the expected correct behavior?

Every project should have a Zoekt repository record created.
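Comparing counts can hide offsetting errors: one missing record plus one extra record still yields equal totals. A stricter check of the expected behavior compares project IDs directly. The following is a minimal pure-Ruby sketch; the helper names are hypothetical, and it assumes the project IDs under a replica's namespace and the project IDs referenced by that replica's repository records have already been loaded (how to load them is not covered here).

```ruby
require 'set'

# Hypothetical helper: project IDs that have no Zoekt repository record.
def missing_repository_project_ids(project_ids, repository_project_ids)
  present = repository_project_ids.to_set
  project_ids.reject { |id| present.include?(id) }
end

# Hypothetical helper: repository records that point at projects outside the expected set.
def unexpected_repository_project_ids(project_ids, repository_project_ids)
  expected = project_ids.to_set
  repository_project_ids.reject { |id| expected.include?(id) }.uniq
end

# Counts match (3 == 3), yet project 3 has no record and 99 is unexpected
project_ids = [1, 2, 3]
repository_project_ids = [1, 2, 99]
p missing_repository_project_ids(project_ids, repository_project_ids)    # prints [3]
p unexpected_repository_project_ids(project_ids, repository_project_ids) # prints [99]
```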

Relevant logs and/or screenshots

Possible fixes

Edited by Dmitry Gruzd