Handle force pushes in Code Indexer
What does this MR do and why?
While writing the ActiveContext Code Embeddings runbook, we realized that force-pushes are not yet handled in the Embeddings Indexing pipeline. We eventually agreed that force-pushes should be handled like any other reindex.
In this MR, we update `Ai::ActiveContext::Code::Indexer` to detect force-pushes and then trigger a `force_reindex` when running the `gitlab-elasticsearch-indexer`.
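One way to picture the detection step: a push is a force-push when the previously indexed commit (`last_commit`) is no longer an ancestor of the new head, i.e. it has become unreachable. The sketch below models the commit graph as an in-memory hash of SHA to parent SHAs; `ancestor?` and `force_push?` are hypothetical helpers for illustration, not the actual Indexer code. Against a real repository the same question can be answered by git itself, e.g. with `git merge-base --is-ancestor <old> <new>`.

```ruby
# Hypothetical sketch: detect a force-push by checking whether the
# previously indexed commit is still reachable from the new head.
# The commit graph is modeled in memory as { sha => [parent_shas] }.

# Returns true if +ancestor+ is reachable from +descendant+ by walking parents.
def ancestor?(graph, ancestor, descendant)
  stack = [descendant]
  seen = {}
  until stack.empty?
    sha = stack.pop
    return true if sha == ancestor
    next if seen[sha]

    seen[sha] = true
    stack.concat(graph.fetch(sha, []))
  end
  false
end

# A push is treated as a force-push when the last indexed commit exists
# but is no longer an ancestor of the new head.
# Assumption: with no previously indexed commit there is nothing to compare.
def force_push?(graph, last_commit, new_head)
  return false if last_commit.nil? || last_commit.empty?

  !ancestor?(graph, last_commit, new_head)
end

# Before the push: a <- b <- c (c was indexed as last_commit).
# A normal push adds e on top of c; a force-push rewrites history to a <- b2 <- d.
graph = {
  "a" => [],
  "b" => ["a"],
  "c" => ["b"],
  "e" => ["c"],
  "b2" => ["a"],
  "d" => ["b2"]
}

puts force_push?(graph, "c", "e") # fast-forward push
puts force_push?(graph, "c", "d") # history rewritten by a force-push
```

Because `c` (and any file it introduced) is unreachable from `d`, an incremental diff from `c` would be meaningless, which is why a full reindex is the safer response.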
Solution Summary
- Add a `reindexing` field to the `gitlab_active_context_code` index (MR: !201428 (merged))
- Handle force reindexing in the Go Indexer (MR: gitlab-elasticsearch-indexer!704 (merged))
- In Rails, when there is a force-push, call the Go Indexer with the options `from_sha=""` and `force_reindex=true` (this MR)
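The Rails-side change in the last bullet can be sketched as a small option builder: on a force-push the indexer is called with `from_sha=""` and `force_reindex=true`; otherwise it diffs from the previously indexed `last_commit`. `IndexerOptions` and `build_indexer_options` are hypothetical names, and the force-push flag is passed in rather than computed. The option keys mirror the `from_sha`, `to_sha` and `force_reindex` fields that appear in `log/active_context.log`.

```ruby
# Hypothetical sketch of how Rails could choose the Go Indexer options.
# Keys mirror the from_sha / to_sha / force_reindex log fields; the real
# Indexer code may differ.
IndexerOptions = Struct.new(:from_sha, :to_sha, :force_reindex, keyword_init: true)

def build_indexer_options(last_commit:, to_sha:, force_push:)
  if force_push
    # Force-push: drop the old baseline and reindex everything at to_sha.
    IndexerOptions.new(from_sha: "", to_sha: to_sha, force_reindex: true)
  else
    # Regular push: index only the changes since the last indexed commit.
    IndexerOptions.new(from_sha: last_commit.to_s, to_sha: to_sha, force_reindex: false)
  end
end

# SHAs taken from the validation example in this MR.
opts = build_indexer_options(
  last_commit: "3f039321fa0990b24e390d0f3d39c38fb6ce8fab",
  to_sha: "eca9608a1ae36c701b36ef1aea74727b671061db",
  force_push: true
)
p opts.to_h
```

An empty `from_sha` combined with `force_reindex=true` tells the Go Indexer not to trust any previous state, so documents for files that no longer exist can be cleaned up.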
References
- Related Issue: [Code Embeddings] Handle force pushes (#560713 - closed)
- Go Indexer `force_reindex` issue: [Code indexing pipeline] Handle force_reindex (gitlab-elasticsearch-indexer#172 - closed)
- Further discussions around handling force-pushes: gitlab-elasticsearch-indexer!698 (comment 2677215507)
- Similar/prior art for checking "unreachable commits": `Gitlab::Elastic::Indexer#purge_unreachable_commits?`
  - See !200930 (comment 2714181125) for an explanation of how this is copied
Screenshots or screen recordings
N/A - see validation steps below
How to set up and validate locally
1 - Set up test project
On your local GDK, create a test project and add files.
In this example, I've created the project `gitlab-duo/force-push-test` with the following commits and files:
2 - Initial Indexing
- Ensure that you have enabled indexing for your project:

      Feature.enable(:active_context_code_index_project, Project.find(<id_of_test_project>))
- Follow the setup tasks in #550418 (comment 2610944159) up to the "Run `index_repository` task" step.
  Note: ensure that the namespace you are processing is the same namespace as your test project.
- Verify that the files are indexed correctly in Elasticsearch.
- Verify that the `Ai::ActiveContext::Code::Repository` record has the correct `last_commit`, e.g.:

      Ai::ActiveContext::Code::Repository.find_by(project_id: <id_of_test_project>)
      => <Ai::ActiveContext::Code::Repository:0x0000000100e5a640 id: 6, project_id: 79, connection_id: 1, enabled_namespace_id: 1, metadata: {"initial_indexing_last_queued_item"=>"ead4dd4a7d5fe2e032b7dbf5b4559853079ed286c6689d3a3bb67facb9eac1a7"}, last_commit: "3f039321fa0990b24e390d0f3d39c38fb6ce8fab", state: "embedding_indexing_in_progress", indexed_at: Thu, 28 Aug 2025 01:37:26.138089000 UTC +00:00, created_at: Thu, 28 Aug 2025 01:22:06.779727000 UTC +00:00, updated_at: Thu, 28 Aug 2025 01:37:26.139551000 UTC +00:00, initial_indexing_last_queued_item: "ead4dd4a7d5fe2e032b7dbf5b4559853079ed286c6689d3a3bb67facb9eac1a7", incremental_indexing_last_queued_item: nil, last_error: nil>
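The "correct `last_commit`" check can be made explicit. The sketch below assumes that `last_commit` should equal the SHA at the tip of the branch you pushed and that `last_error` should be `nil`; `RepositoryRecord` and `record_problems` are hypothetical and only model the attributes shown in the record dump above, so in the Rails console you would inspect the real record instead. `state` is deliberately not checked, since the sample record is still `embedding_indexing_in_progress` at this point.

```ruby
# Hypothetical sketch of the "correct last_commit" check. The Struct models
# only the attributes shown in the record dump; it is not the real model.
RepositoryRecord = Struct.new(:project_id, :last_commit, :state, :last_error, keyword_init: true)

# Returns a list of problems; an empty list means the record looks correct.
# Assumption: expected_head_sha is the SHA at the tip of the pushed branch.
def record_problems(record, expected_head_sha)
  problems = []
  if record.last_commit != expected_head_sha
    problems << "last_commit is #{record.last_commit.inspect}, expected #{expected_head_sha.inspect}"
  end
  problems << "last_error is set: #{record.last_error}" unless record.last_error.nil?
  problems
end

# Values copied from the record dump after initial indexing.
record = RepositoryRecord.new(
  project_id: 79,
  last_commit: "3f039321fa0990b24e390d0f3d39c38fb6ce8fab",
  state: "embedding_indexing_in_progress",
  last_error: nil
)

puts record_problems(record, "3f039321fa0990b24e390d0f3d39c38fb6ce8fab").inspect
```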
3 - Test Force Push
You can test the force push in 2 ways:
- by following the validation steps in the Incremental Indexing MR: !201128 (merged) - this will allow the relevant workers to handle the incremental updates once you have done the force push
- by manually running the `IncrementalIndexingService` once you have done the force push - we will go with this simpler test
- Update the test project with a force push (make sure to delete one file so that you can verify its documents are removed)
- Run the `IncrementalIndexingService`. In the Rails console, run:

      r = Ai::ActiveContext::Code::Repository.find_by(project_id: <id_of_test_project>)
      Ai::ActiveContext::Code::IncrementalIndexingService.execute(r)
- Check `active_context.log` to verify that Rails is calling the Go Indexer with the expected parameters `from_sha=""` and `force_reindex=true`:

      # in the gitlab root directory
      tail -f log/active_context.log | grep "Ai::ActiveContext::Code::Indexer"

      # the latest log output should be:
      {"severity":"INFO","time":"2025-08-28T02:12:27.124Z","class":"Ai::ActiveContext::Code::Indexer","message":"Start indexer","ai_active_context_code_repository_id":6,"project_id":79,"from_sha":"","to_sha":"eca9608a1ae36c701b36ef1aea74727b671061db","force_reindex":true}
      {"severity":"INFO","time":"2025-08-28T02:12:27.492Z","class":"Ai::ActiveContext::Code::Indexer","message":"Indexer successful","ai_active_context_code_repository_id":6,"project_id":79,"from_sha":"","to_sha":"eca9608a1ae36c701b36ef1aea74727b671061db","force_reindex":true,"status":0}
- Verify that the files are indexed correctly in Elasticsearch.
  Note that the file deleted in the force push should no longer have any documents in the index.
- Verify that the `Ai::ActiveContext::Code::Repository` record has the correct `last_commit`, e.g.:

      Ai::ActiveContext::Code::Repository.find_by(project_id: <id_of_test_project>)
      => <Ai::ActiveContext::Code::Repository:0x000000015fe3c0a0 id: 6, project_id: 79, connection_id: 1, enabled_namespace_id: 1, metadata: {"initial_indexing_last_queued_item"=>"ead4dd4a7d5fe2e032b7dbf5b4559853079ed286c6689d3a3bb67facb9eac1a7", "incremental_indexing_last_queued_item"=>"6cff0c6fcce7d9d3f7a31c693612d1abd18ed880e1106f23d25733ec9082a52c"}, last_commit: "eca9608a1ae36c701b36ef1aea74727b671061db", state: "ready", indexed_at: Thu, 28 Aug 2025 02:01:32.367499000 UTC +00:00, created_at: Thu, 28 Aug 2025 01:22:06.779727000 UTC +00:00, updated_at: Thu, 28 Aug 2025 02:01:32.367782000 UTC +00:00, initial_indexing_last_queued_item: "ead4dd4a7d5fe2e032b7dbf5b4559853079ed286c6689d3a3bb67facb9eac1a7", incremental_indexing_last_queued_item: "6cff0c6fcce7d9d3f7a31c693612d1abd18ed880e1106f23d25733ec9082a52c", last_error: nil>
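The log check in the validation steps can also be scripted instead of eyeballed. This sketch assumes each line of `log/active_context.log` is a single JSON object, as in the sample output above; `indexer_entries` and `force_reindex_succeeded?` are hypothetical helpers, and the sample lines are copied verbatim from this MR.

```ruby
require "json"

# Parses log lines and keeps only entries emitted by the Indexer class.
# Lines that are not JSON objects are skipped.
def indexer_entries(lines)
  lines.filter_map do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil
    end
    entry if entry.is_a?(Hash) && entry["class"] == "Ai::ActiveContext::Code::Indexer"
  end
end

# True when the most recent "Start indexer" entry requested a full reindex
# (from_sha == "" and force_reindex == true) and an "Indexer successful"
# entry for the same to_sha reports status 0.
def force_reindex_succeeded?(entries)
  start = entries.reverse.find { |e| e["message"] == "Start indexer" }
  return false if start.nil?
  return false unless start["from_sha"] == "" && start["force_reindex"] == true

  entries.any? do |e|
    e["message"] == "Indexer successful" && e["to_sha"] == start["to_sha"] && e["status"] == 0
  end
end

sample_lines = [
  '{"severity":"INFO","time":"2025-08-28T02:12:27.124Z","class":"Ai::ActiveContext::Code::Indexer","message":"Start indexer","ai_active_context_code_repository_id":6,"project_id":79,"from_sha":"","to_sha":"eca9608a1ae36c701b36ef1aea74727b671061db","force_reindex":true}',
  '{"severity":"INFO","time":"2025-08-28T02:12:27.492Z","class":"Ai::ActiveContext::Code::Indexer","message":"Indexer successful","ai_active_context_code_repository_id":6,"project_id":79,"from_sha":"","to_sha":"eca9608a1ae36c701b36ef1aea74727b671061db","force_reindex":true,"status":0}'
]

puts force_reindex_succeeded?(indexer_entries(sample_lines))
```

Against a real GDK log you would pass `File.readlines("log/active_context.log", chomp: true)` instead of `sample_lines`.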
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #560713 (closed)