
Draft: Send the unique project_id task to the Zoekt indexer in the tasks API

What does this MR do and why?

Add logic so that the tasks sent to the Zoekt indexer are unique by project_identifier. This lets the Zoekt indexer reach its maximum achievable concurrency.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.


How to set up and validate locally

  1. Enable the feature flag zoekt_create_indexing_tasks.
     Feature.enable(:zoekt_create_indexing_tasks)
  2. Enable Zoekt for a project.
  3. Add some commits to this project.
  4. Create a new project in the Zoekt-enabled namespace.
  5. Add some commits to this new project.
  6. Open the Rails console.
  7. Make sure at least one Search::Zoekt::Task record exists.
  8. Run the following commands in the Rails console.
     a = []
     Search::Zoekt::Task.each_task(limit: 10) { |t| a << t }
  9. Make sure all the tasks in a have a unique project_identifier.
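
The uniqueness check in the last step can be scripted instead of inspected by eye. This is a minimal sketch, assuming a holds the yielded tasks and each of them responds to project_identifier. The helper name duplicate_project_identifiers is hypothetical and not part of this MR.

```ruby
# Hypothetical helper, not part of the MR: returns the project_identifier
# values that occur more than once in the given collection of tasks.
def duplicate_project_identifiers(tasks)
  tasks.map(&:project_identifier).tally.select { |_id, count| count > 1 }.keys
end

# In the Rails console, after filling `a` as in the steps above:
#   duplicate_project_identifiers(a)
# An empty array means every task in `a` has a unique project_identifier.
```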

Query plan:

https://console.postgres.ai/shared/a8aaed63-7b5e-42f5-b548-4b8084250a01

SELECT
    "zoekt_tasks"."id",
    "zoekt_tasks"."zoekt_node_id",
    "zoekt_tasks"."zoekt_repository_id",
    "zoekt_tasks"."project_identifier",
    "zoekt_tasks"."perform_at",
    "zoekt_tasks"."created_at",
    "zoekt_tasks"."updated_at",
    "zoekt_tasks"."state",
    "zoekt_tasks"."task_type",
    "zoekt_tasks"."retries_left"
FROM
    "zoekt_tasks"
WHERE
    "zoekt_tasks"."zoekt_node_id" = 1
    AND "zoekt_tasks"."id" IN ( SELECT DISTINCT ON (project_identifier)
            id
        FROM
            "zoekt_tasks"
        WHERE
            "zoekt_tasks"."zoekt_node_id" = 1
            AND "zoekt_tasks"."perform_at" <= '2024-07-22 08:41:03.117117'
            AND "zoekt_tasks"."state" = 0
        ORDER BY
            "zoekt_tasks"."project_identifier" ASC,
            "zoekt_tasks"."perform_at" ASC
        LIMIT 1000)
ORDER BY
    "zoekt_tasks"."perform_at" ASC,
    "zoekt_tasks"."id" ASC
LIMIT 100
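
To make the selection logic of the query easier to follow, here is a rough in-memory sketch of what it appears to do: keep the earliest eligible task per project_identifier, then order those by perform_at and id. The Task struct and the unique_project_tasks method are hypothetical stand-ins written for illustration; they are not the code in this MR, and the sketch only mirrors the SQL shown above.

```ruby
# Hypothetical stand-in for a zoekt_tasks row; only the columns the query filters
# or sorts on are modelled.
Task = Struct.new(:id, :zoekt_node_id, :project_identifier, :perform_at, :state, keyword_init: true)

# Illustrative only: mirrors the SQL above in plain Ruby.
def unique_project_tasks(tasks, node_id:, now:, inner_limit: 1000, limit: 100)
  # WHERE zoekt_node_id = ? AND perform_at <= ? AND state = 0
  eligible = tasks.select do |t|
    t.zoekt_node_id == node_id && t.perform_at <= now && t.state == 0
  end

  # SELECT DISTINCT ON (project_identifier) ... ORDER BY project_identifier, perform_at
  # LIMIT 1000: one task per project, the one with the earliest perform_at.
  # (On perform_at ties PostgreSQL may pick any row; here the first one wins.)
  earliest_per_project = eligible
    .group_by(&:project_identifier)
    .sort_by { |project_identifier, _group| project_identifier }
    .first(inner_limit)
    .map { |_project_identifier, group| group.min_by(&:perform_at) }

  # Outer query: ORDER BY perform_at ASC, id ASC LIMIT 100
  earliest_per_project.sort_by { |t| [t.perform_at, t.id] }.first(limit)
end
```

Because at most one task per project is returned, two tasks for the same project are never handed out in the same batch.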

Related to #470600 (closed)

Edited by Ravi Kumar
