Create a simple scheduling worker that assigns newly added namespaces to existing nodes based on namespace_statistics

Problem to solve

Some Zoekt nodes may run out of memory when repositories are not distributed evenly across the nodes.

Proposal

We can write a simple cron scheduling worker that assigns the repositories (namespaces) evenly across the nodes.

  1. Iterate over each Search::Zoekt::EnabledNamespace record, ordered by id, that doesn't have a join record in Search::Zoekt::Index.
  2. Pick the node with the most free storage.
  3. Check the storage requirement for the namespace and take three times that amount. For example, if it's 100GiB, we take 300GiB (x3). The storage requirement can be found in NamespaceStatistics.
  4. Check whether assigning this namespace to the node keeps the node under the watermark limit (80%) of storage utilization.
  5. If yes, create a Search::Zoekt::Index record.
  6. If not, make a log entry and repeat the steps for the next namespace.
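
The steps above can be sketched in plain Ruby, without Rails. This is a minimal illustration under stated assumptions, not the actual worker: `Node`, `Namespace`, `assign_namespaces`, and the two constants are hypothetical stand-ins for the real models, and the sketch assumes "under the watermark" means strictly below 80% and that space reserved earlier in a run counts against the node for later namespaces.

```ruby
# Plain-Ruby stand-ins for the real records; every name here is hypothetical.
Node = Struct.new(:id, :total_bytes, :used_bytes) do
  def free_bytes
    total_bytes - used_bytes
  end
end

Namespace = Struct.new(:id, :storage_bytes)

STORAGE_MULTIPLIER = 3 # step 3: take 3x the namespace's storage requirement
WATERMARK = 0.8        # step 4: 80% storage utilization limit

# Returns [assignments, skipped]. assignments maps namespace id => node id;
# skipped lists the ids of namespaces that were logged and left unassigned.
def assign_namespaces(namespaces, nodes)
  assignments = {}
  skipped = []
  return [assignments, namespaces.map(&:id)] if nodes.empty?

  # Step 1: walk the unassigned namespaces in id order.
  namespaces.sort_by(&:id).each do |namespace|
    # Step 2: pick the node with the most free storage.
    node = nodes.max_by(&:free_bytes)
    # Step 3: storage to reserve for this namespace.
    required = namespace.storage_bytes * STORAGE_MULTIPLIER
    # Step 4: utilization of the node if the namespace were assigned to it.
    projected = (node.used_bytes + required).to_f / node.total_bytes

    if projected < WATERMARK
      # Step 5: record the assignment (stands in for creating an index record).
      # Assumption: the reserved space counts against the node immediately.
      node.used_bytes += required
      assignments[namespace.id] = node.id
    else
      # Step 6: log and continue with the next namespace.
      warn "namespace #{namespace.id} not assigned: node #{node.id} " \
           "would reach #{(projected * 100).round}% utilization"
      skipped << namespace.id
    end
  end

  [assignments, skipped]
end

gib = 1024**3
nodes = [Node.new(1, 1000 * gib, 100 * gib), Node.new(2, 1000 * gib, 500 * gib)]
namespaces = [Namespace.new(10, 100 * gib), Namespace.new(11, 300 * gib)]
assignments, skipped = assign_namespaces(namespaces, nodes)
```

With these sample values, namespace 10 needs 300GiB and lands on node 1, taking it to 40% utilization. Namespace 11 needs 900GiB, which would push node 1 (still the node with the most free storage) past the watermark, so it is logged and skipped.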

We can schedule this cron worker to run every 10 minutes. The feature should be implemented behind an ops feature flag.

Edited by Ravi Kumar