ActiveContext: check for ready repositories
What does this MR do and why?
Adds a new task to the ActiveContext `SchedulingService` that checks for repositories that should be marked as ready.
Repositories can remain in the `embedding_indexing_in_progress` state for a while, because embeddings are populated asynchronously. We therefore need a way to tell when the initial embeddings are finished. To do this, we use the items queued for embedding during initial indexing.
- Adds an `initial_indexing_last_queued_item` field to `repository.metadata`
  - During initial indexing, after enqueueing references for embedding generation, we set this field to the id of the last enqueued item
- Adds a `MarkRepositoryAsReadyEvent` event that runs every hour to check repositories in the `embedding_indexing_in_progress` state
  - Searches for the stored id and checks whether the embedding fields for the embedding model currently being indexed are populated
  - If they are, marks the repository as ready
  - Processes repositories in batches
- Adds an `active_context_code_event_mark_repository_ready` feature flag to control the rollout of the event
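The hourly check described above can be sketched in plain Ruby. This is a minimal, hypothetical sketch, not the MR's implementation: `Repository` is a `Struct` standing in for `Ai::ActiveContext::Code::Repository`, the `documents` hash stands in for the search against the code embeddings index, `embeddings_v1` is an invented field name, and the `enabled:` argument stands in for the feature flag check. It assumes that a non-nil value in the current model's embedding field means the item has been embedded.

```ruby
# Hypothetical in-memory stand-in for Ai::ActiveContext::Code::Repository.
Repository = Struct.new(:id, :state, :metadata, :indexed_at, keyword_init: true)

# Hypothetical sketch of the hourly readiness check (not the GitLab class).
class MarkRepositoryAsReadySketch
  IN_PROGRESS = :embedding_indexing_in_progress

  # documents:       Hash of queued item id => Hash of embedding field values
  #                  (stands in for a search against the vector store)
  # embedding_field: field for the embedding model currently being indexed
  # enabled:         stands in for the feature flag check
  def initialize(documents:, embedding_field:, batch_size: 2, enabled: true, clock: -> { Time.now.utc })
    @documents = documents
    @embedding_field = embedding_field
    @batch_size = batch_size
    @enabled = enabled
    @clock = clock
  end

  # Marks eligible repositories as ready and returns their ids.
  def call(repositories)
    return [] unless @enabled

    marked = []
    in_progress = repositories.select { |repo| repo.state == IN_PROGRESS }
    # Process repositories in fixed-size batches.
    in_progress.each_slice(@batch_size) do |batch|
      batch.each do |repo|
        next unless initial_embeddings_done?(repo)

        repo.state = :ready
        repo.indexed_at = @clock.call
        marked << repo.id
      end
    end
    marked
  end

  private

  # True when the last item queued during initial indexing has an embedding.
  def initial_embeddings_done?(repo)
    last_id = repo.metadata[:initial_indexing_last_queued_item]
    return false if last_id.nil?

    fields = @documents[last_id]
    !fields.nil? && !fields[@embedding_field].nil?
  end
end

docs = {
  "ref-3" => { embeddings_v1: [0.1, 0.2] },
  "ref-7" => { embeddings_v1: nil }
}
repos = [
  Repository.new(id: 1, state: :embedding_indexing_in_progress, metadata: { initial_indexing_last_queued_item: "ref-3" }),
  Repository.new(id: 2, state: :embedding_indexing_in_progress, metadata: { initial_indexing_last_queued_item: "ref-7" }),
  Repository.new(id: 3, state: :pending, metadata: {})
]
checker = MarkRepositoryAsReadySketch.new(documents: docs, embedding_field: :embeddings_v1)
p checker.call(repos) # => [1]
```

Batching with `each_slice` only mirrors the "processes repositories in batches" bullet; in a Rails app one would presumably batch at the query level instead (for example with ActiveRecord's `find_each`).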
In other words, when initial indexing kicks off, the repository state changes to `embedding_indexing_in_progress` and references are added to the queue for embedding. If, when the hourly check runs, the last item queued during initial indexing has all of its embedding fields populated, we know initial indexing is done.
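The timeline above can be simulated with a few plain Ruby methods. This is a hypothetical sketch: `Repo`, `start_initial_indexing`, `process_queue`, and `initial_indexing_done?` are illustrative names, not GitLab code, and the sketch assumes the embedding queue is drained in FIFO order, so that the last queued item being embedded implies the earlier items were processed too.

```ruby
# Hypothetical stand-in for a repository record; not the GitLab class.
Repo = Struct.new(:state, :metadata, keyword_init: true)

# Initial indexing: enqueue references for embedding, then record the last id.
def start_initial_indexing(repo, ref_ids, queue)
  repo.state = :embedding_indexing_in_progress
  ref_ids.each { |id| queue << id }
  repo.metadata[:initial_indexing_last_queued_item] = ref_ids.last
end

# Asynchronous embedding worker: takes up to `limit` items in FIFO order
# and stores a placeholder vector for each.
def process_queue(queue, embeddings, limit)
  queue.shift(limit).each { |id| embeddings[id] = [0.0] }
end

# Initial indexing is done once the last queued item has an embedding.
def initial_indexing_done?(repo, embeddings)
  last = repo.metadata[:initial_indexing_last_queued_item]
  !last.nil? && embeddings.key?(last)
end

repo = Repo.new(state: :pending, metadata: {})
queue = []
embeddings = {}
start_initial_indexing(repo, %w[ref-1 ref-2 ref-3], queue)

process_queue(queue, embeddings, 2)
p initial_indexing_done?(repo, embeddings) # => false (ref-3 is still queued)

process_queue(queue, embeddings, 2)
p initial_indexing_done?(repo, embeddings) # => true
```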
References
- [Index state tracking: Rollout] Mark repository... (#545941 - closed)
- https://docs.gitlab.com/development/event_store/#register-the-subscriber-to-the-event
How to set up and validate locally
- Enable indexing
- Enable the feature flag: `Feature.enable(:active_context_code_event_mark_repository_ready)`
- Create a repository in the pending state: `Ai::ActiveContext::Code::Repository.create!(project: Project.first, active_context_connection: Ai::ActiveContext::Connection.active, enabled_namespace: Ai::ActiveContext::Connection.active.enabled_namespaces.first)`
- Run the `index_repository` task: `Ai::ActiveContext::Code::SchedulingWorker.new.perform("index_repository")`
- Execute the queues until there are no more queued items: `ActiveContext.execute_all_queues!`
- Run the `mark_repository_as_ready` task: `Ai::ActiveContext::Code::SchedulingWorker.new.perform("mark_repository_as_ready")`
- Check the repository record: `state = :ready` and `indexed_at` is set
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #545941 (closed)