Zoekt: Indices have missing repo records
Summary
We encountered this problem on 2025-02-03: Zoekt indices have missing repo rec... (gitlab-com/gl-infra/production#19227 - closed), but it appears to still be ongoing.
Steps to reproduce
class ProgressTracker # Refer to https://gitlab.com/dgruzd/progress_tracker#script for the latest version
  def initialize(total_count, log_delay_seconds: 30, prefix: nil, logger: nil)
    @i = 0; @tc = total_count; @delay = log_delay_seconds.to_i; @mp = prefix; @l = logger; @last_log_at = Time.at(0)
  end

  def iteration(num = 1)
    @i += num; (@last_log_at = Time.now; p = "#{@i}/#{@tc}"; output = "#{@mp}#{p.ljust(@tc.digits.length * 2 + 1)} (#{format "%05.2f", (@i/@tc.to_f) * 100}%)"; (@l ? @l.info(output) : puts(output))) if Time.now - @last_log_at > @delay || @i == @tc
  end
end

# Returns one hash per ready replica whose Zoekt repository count
# differs from the number of projects under its root namespace.
def execute
  arr = []
  replicas = Search::Zoekt::Replica.ready
  total_count = replicas.count
  progress_tracker = ProgressTracker.new(total_count, logger: Logger.new($stdout))

  replicas.find_each do |r|
    # Zoekt repository records across all indices of this replica
    repo_count = r.indices.sum { |i| i.zoekt_repositories.count }

    # Projects anywhere under the replica's root namespace, counted in batches
    projects_count = 0
    Namespace.where('traversal_ids[1] = ?', r.namespace_id).where(type: 'Project').each_batch do |project_namespaces|
      projects_count += Project.where(project_namespace_id: project_namespaces.select(:id)).count
    end

    progress_tracker.iteration
    next if repo_count == projects_count

    arr << {
      replica_id: r.id,
      namespace_id: r.namespace_id,
      repo_count: repo_count,
      projects_count: projects_count
    }
  end

  arr
end
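
The array returned by `execute` can be long, so it may help to condense it before sharing. The following is a minimal sketch in plain Ruby, not part of the original script: `summarize_mismatches` is a hypothetical helper name, and the sample rows are made-up illustration data, not real results. It assumes only the hash keys that `execute` emits (`replica_id`, `namespace_id`, `repo_count`, `projects_count`).

```ruby
# Hypothetical helper: condenses the rows returned by `execute`.
# `missing` is projects_count - repo_count; a negative value means the
# replica has more repository records than projects.
def summarize_mismatches(rows, top: 3)
  with_missing = rows.map do |row|
    row.merge(missing: row[:projects_count] - row[:repo_count])
  end

  {
    replicas_affected: with_missing.size,
    # Only positive gaps count as missing repository records
    total_missing: with_missing.sum { |row| [row[:missing], 0].max },
    # Replicas with the largest gap first
    worst: with_missing.sort_by { |row| -row[:missing] }.first(top)
  }
end

# Made-up sample data in the same shape as the output of `execute`
sample_rows = [
  { replica_id: 1, namespace_id: 10, repo_count: 8, projects_count: 12 },
  { replica_id: 2, namespace_id: 20, repo_count: 5, projects_count: 6 },
  { replica_id: 3, namespace_id: 30, repo_count: 7, projects_count: 5 }
]

summary = summarize_mismatches(sample_rows)
puts "replicas affected: #{summary[:replicas_affected]}"
puts "missing repository records: #{summary[:total_missing]}"
summary[:worst].each { |row| puts row.inspect }
```

In a Rails console, `summarize_mismatches(execute)` would apply the same summary to the real data.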
What is the current bug behavior?
Not all projects have associated Zoekt repository records.
What is the expected correct behavior?
All projects should have Zoekt repository records created.
Relevant logs and/or screenshots
Possible fixes
Edited by Dmitry Gruzd