
Add a Zoekt SchedulingWorker

Ravi Kumar requested to merge 432693-create-a-zoekt-scheduling-worker into master

What does this MR do and why?

This MR adds the worker Search::Zoekt::SchedulingWorker, which performs scheduling tasks. The default task is :initiate, which enqueues the worker once for each supported task. Currently the only supported task is :node_assignment. This task iterates over every Search::Zoekt::EnabledNamespace record that has no corresponding Search::Zoekt::Index record and assigns it to a node, trying nodes in descending order of free space. When checking whether a node has room, the task reserves a buffer of 3x the namespace's total repository size and applies a watermark limit of 80%. When no node can be assigned to a namespace, the worker adds an entry to zoekt.log.
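The node assignment rule can be sketched in plain Ruby as follows. This is a minimal, framework-free illustration under stated assumptions, not the worker's actual code: Struct classes stand in for the Search::Zoekt::Node and Search::Zoekt::EnabledNamespace models, repository_size is a hypothetical attribute that is nil when RootStorageStatistics is missing, the eligibility check is assumed to be "used bytes plus 3x the repository size must stay at or below 80% of total bytes", and the "no node" log message is invented for the example.

```ruby
# Hypothetical stand-ins for the Search::Zoekt::Node and
# Search::Zoekt::EnabledNamespace ActiveRecord models.
ZoektNode = Struct.new(:id, :total_bytes, :used_bytes) do
  def free_bytes
    total_bytes - used_bytes
  end
end

# repository_size is nil when RootStorageStatistics is missing (assumption).
ZoektEnabledNamespace = Struct.new(:id, :repository_size)

BUFFER_FACTOR = 3      # reserve 3x the namespace's total repository size
WATERMARK_LIMIT = 0.8  # assumed: never fill a node beyond 80% of its capacity

# Returns { enabled_namespace_id => node_id } and appends one log line for
# every namespace that could not be placed.
def assign_nodes(namespaces, nodes, log)
  # Same order as ORDER BY total_bytes - used_bytes DESC.
  candidates = nodes.sort_by { |node| -node.free_bytes }

  namespaces.each_with_object({}) do |namespace, assignments|
    if namespace.repository_size.nil?
      log << "RootStorageStatistics is not available for namespace #{namespace.id}"
      next
    end

    required = namespace.repository_size * BUFFER_FACTOR
    # First node (most free space first) that stays under the watermark.
    node = candidates.find do |candidate|
      candidate.used_bytes + required <= candidate.total_bytes * WATERMARK_LIMIT
    end

    if node
      assignments[namespace.id] = node.id
    else
      # Hypothetical message; the real worker's log format may differ.
      log << "No node has enough space for namespace #{namespace.id}"
    end
  end
end

nodes = [ZoektNode.new(1, 1_000, 700), ZoektNode.new(2, 1_000, 100)]
namespaces = [
  ZoektEnabledNamespace.new(10, 100), # needs 300 bytes: fits on node 2
  ZoektEnabledNamespace.new(11, nil), # no storage statistics: logged
  ZoektEnabledNamespace.new(12, 300)  # needs 900 bytes: fits nowhere
]
log = []
assignments = assign_nodes(namespaces, nodes, log)
```

With these inputs, assignments is {10 => 2} and log holds two entries. The sketch sorts the nodes once and does not deduct space already promised to earlier namespaces in the same run; a real implementation may want to account for that.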

This is a cron worker that runs every 10 minutes.

This feature is behind the zoekt_scheduling_worker feature flag.
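The task fan-out and the feature flag gate can be sketched as a small, framework-free Ruby class. The class name SchedulingWorkerSketch is illustrative only, and the enqueue and flag_enabled lambdas are hypothetical stand-ins for self.class.perform_async and Feature.enabled?(:zoekt_scheduling_worker). The task argument is converted with to_sym because Sidekiq serializes job arguments as JSON, so a symbol passed to perform_async arrives as a string.

```ruby
# Framework-free sketch; not the real Search::Zoekt::SchedulingWorker.
class SchedulingWorkerSketch
  TASKS = %i[node_assignment].freeze

  attr_reader :executed

  # flag_enabled: stand-in for Feature.enabled?(:zoekt_scheduling_worker)
  # enqueue:      stand-in for self.class.perform_async(task)
  def initialize(flag_enabled:, enqueue:)
    @flag_enabled = flag_enabled
    @enqueue = enqueue
    @executed = []
  end

  def perform(task = :initiate)
    return unless @flag_enabled.call # no-op while the flag is off

    # Sidekiq delivers JSON arguments, so a symbol arrives as a string.
    task = task.to_sym
    return initiate if task == :initiate
    raise ArgumentError, "Unknown task: #{task}" unless TASKS.include?(task)

    send(task)
  end

  private

  # Schedule one job per supported task.
  def initiate
    TASKS.each { |task| @enqueue.call(task) }
  end

  # Placeholder for the real node assignment logic.
  def node_assignment
    @executed << :node_assignment
  end
end

enqueued = []
worker = SchedulingWorkerSketch.new(flag_enabled: -> { true }, enqueue: ->(task) { enqueued << task })
worker.perform                    # enqueues :node_assignment
worker.perform("node_assignment") # runs the task directly
```

Keeping :initiate as a pure fan-out means the cron schedule only has to trigger one job, while each task runs, retries, and fails independently.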

Notes for database reviewers

SELECT
    "zoekt_enabled_namespaces".*
FROM
    "zoekt_enabled_namespaces"
    LEFT OUTER JOIN "zoekt_indices" ON "zoekt_indices"."zoekt_enabled_namespace_id" = "zoekt_enabled_namespaces"."id"
WHERE
    "zoekt_indices"."zoekt_enabled_namespace_id" IS NULL

Query plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25195/commands/80040
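The first query is an anti-join: the LEFT OUTER JOIN combined with IS NULL keeps only the enabled namespaces that no zoekt_indices row references. The in-memory Ruby below illustrates the same selection. It is not how the with_missing_indices scope works (that query runs in PostgreSQL), and the Struct row types are hypothetical. In ActiveRecord, a scope like this is commonly written with left_joins on the association plus a where condition that the joined foreign key is nil; the association name used by the real model is not shown in this MR.

```ruby
require "set"

# Hypothetical stand-ins for zoekt_enabled_namespaces and zoekt_indices rows.
EnabledNamespaceRow = Struct.new(:id)
IndexRow = Struct.new(:id, :zoekt_enabled_namespace_id)

# Keep only enabled namespaces that no index row points back to,
# mirroring WHERE zoekt_indices.zoekt_enabled_namespace_id IS NULL.
def with_missing_indices(enabled_namespaces, indices)
  indexed_ids = indices.map(&:zoekt_enabled_namespace_id).to_set
  enabled_namespaces.reject { |namespace| indexed_ids.include?(namespace.id) }
end

enabled = [EnabledNamespaceRow.new(1), EnabledNamespaceRow.new(2), EnabledNamespaceRow.new(3)]
# Namespace 2 has two indices; it must be excluded, not duplicated.
indices = [IndexRow.new(100, 2), IndexRow.new(101, 2)]
missing = with_missing_indices(enabled, indices)
```

Here missing contains the rows with ids 1 and 3, just as the SQL would return each unmatched namespace exactly once.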

SELECT
    "zoekt_nodes".*
FROM
    "zoekt_nodes"
ORDER BY
    total_bytes - used_bytes DESC

Query plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25195/commands/80048

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.


How to set up and validate locally

  1. Make sure Zoekt is set up.
  2. Make sure there are no pending database migrations:
bin/rails db:migrate
  3. Open the Rails console:
bin/rails c
  4. Run the following code and verify that it returns at least one record:
Search::Zoekt::EnabledNamespace.with_missing_indices
  5. If you don't get any records, create some, for example:
Namespace.last(3).map(&:root_ancestor).uniq.each { |root| Search::Zoekt::EnabledNamespace.find_or_create_by!(root_namespace_id: root.id) }
  6. Verify again with step 4 that these records are now returned.
  7. For testing only, make the worker run each task synchronously instead of enqueueing it:
--- a/ee/app/workers/search/zoekt/scheduling_worker.rb
+++ b/ee/app/workers/search/zoekt/scheduling_worker.rb
@@ -37,7 +37,7 @@ def supported_tasks
       end
 
       def initiate
-        TASKS.each { |task| with_context(related_class: self.class) { self.class.perform_async(task) } }
+        TASKS.each { |task| with_context(related_class: self.class) { self.class.new.perform(task) } }
       end
  8. Tail zoekt.log in a new terminal tab:
tail -f log/zoekt.log
  9. Enable the zoekt_scheduling_worker feature flag:
Feature.enable(:zoekt_scheduling_worker)
  10. Run the worker:
Search::Zoekt::SchedulingWorker.new.perform
  11. Verify with step 4 that no records are returned.
  12. If records are still returned, check the log; it should contain an entry explaining why.
  13. If the log entry says RootStorageStatistics is not available, you need to create the missing RootStorageStatistics records.
  14. Run the following code:
Namespace.last(3).each { |n| Namespace::RootStorageStatistics.find_or_create_by! namespace_id: n.root_ancestor.id }
  15. Rerun the worker from step 10.
  16. Verify with step 4 that no records are returned.

Related to #432693 (closed)

