Reindex and remove leftover merge_request documents from the main index
What does this MR do and why?
This MR finds merge_request documents left over in the main index and calls ProcessBookkeepingService.track! with them so that they are reindexed. It then fires a delete_by_query call to remove those merge_request documents from the main index.
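The flow described above can be sketched in plain Ruby with in-memory stand-ins. This is an illustration under stated assumptions, not the migration's actual code: `Doc`, `FakeMainIndex`, `migrate_batch`, and the `tracked` array are hypothetical names, the array stands in for ProcessBookkeepingService.track!, and the fake delete method stands in for the delete_by_query call.

```ruby
# Hypothetical in-memory stand-ins; none of these are GitLab's real classes.
Doc = Struct.new(:id, :type, :record_id)

class FakeMainIndex
  attr_reader :docs

  def initialize(docs)
    @docs = docs
  end

  # Return up to `size` documents of the given type.
  def search(type:, size:)
    docs.select { |d| d.type == type }.first(size)
  end

  # Stand-in for delete_by_query: delete the given ids, return how many were removed.
  def delete_by_ids(ids)
    before = docs.size
    docs.reject! { |d| ids.include?(d.id) }
    before - docs.size
  end
end

# Process one batch: queue the leftovers for reindexing, then delete them
# from the main index. Returns the number of documents deleted.
def migrate_batch(index, tracked, batch_size: 1000)
  leftovers = index.search(type: 'merge_request', size: batch_size)
  return 0 if leftovers.empty?

  tracked.concat(leftovers.map(&:record_id)) # stands in for track!
  index.delete_by_ids(leftovers.map(&:id))
end

index = FakeMainIndex.new([
  Doc.new('merge_request_1', 'merge_request', 1),
  Doc.new('merge_request_2', 'merge_request', 2),
  Doc.new('issue_1', 'issue', 1)
])
tracked = []

# Keep processing batches until no merge_request documents remain.
loop { break if migrate_batch(index, tracked, batch_size: 1).zero? }

p tracked              # prints [1, 2]
p index.docs.map(&:id) # prints ["issue_1"]
```

Tracking before deleting means every leftover document is queued for reindexing before it disappears from the main index; documents of other types are left untouched.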
How to set up and validate locally
- Make sure Elasticsearch is enabled in GDK.
- Open the Rails console
bundle exec rails c
- Populate merge_request documents in the main index
# Writes a merge_request document for `mr` directly into the main index,
# simulating a leftover document.
def populate_merge_request_in_main_index!(mr)
  client = ::Gitlab::Search::Client.new
  index_name = 'gitlab-development'
  client.index(index: index_name, routing: "project_#{mr.project_id}", id: "merge_request_#{mr.id}",
    refresh: true, body: {
      id: mr.id, iid: mr.iid, target_branch: mr.target_branch, source_branch: mr.source_branch, title: mr.title,
      description: mr.description, state: mr.state, merge_status: mr.merge_status, project_id: mr.project_id,
      source_project_id: mr.source_project_id, target_project_id: mr.target_project_id, author_id: mr.author_id,
      created_at: mr.created_at.strftime('%Y-%m-%dT%H:%M:%S.%3NZ'), visibility_level: mr.project.visibility_level,
      updated_at: mr.updated_at.strftime('%Y-%m-%dT%H:%M:%S.%3NZ'),
      join_field: { name: 'merge_request', parent: "project_#{mr.project_id}" }, type: 'merge_request',
      merge_requests_access_level: mr.project.merge_requests_access_level
    }
  )
end

# find_each loads records in batches instead of all at once.
MergeRequest.find_each { |mr| populate_merge_request_in_main_index!(mr) }
- Ensure there is at least one merge_request in the main index by running the following curl command in bash
curl -XGET "http://localhost:9200/gitlab-development/_count" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "merge_request" } }
      ]
    }
  }
}'
The count in the response should be greater than 0.
- Now run the migration in the Rails console
Elastic::DataMigrationService[20231005103449].send(:migration).migrate
- Run the curl command again and ensure the count is 0
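The count check in the steps above can also be scripted in Ruby using only the standard library. This is a minimal sketch, assuming Elasticsearch listens on localhost:9200 and the index is named gitlab-development, as in the curl command; `build_count_request` and `count_documents` are hypothetical helper names. It sends the query with POST rather than GET, which the `_count` endpoint also accepts.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a _count request for documents of one type.
def build_count_request(type, base_url: 'http://localhost:9200', index: 'gitlab-development')
  uri = URI("#{base_url}/#{index}/_count")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(query: { bool: { filter: [{ term: { type: type } }] } })
  [uri, request]
end

# Send the request and return the "count" field of the JSON response.
# Needs a running Elasticsearch instance at the configured address.
def count_documents(type, **opts)
  uri, request = build_count_request(type, **opts)
  response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
  JSON.parse(response.body).fetch('count')
end
```

Calling `count_documents('merge_request')` should return a value greater than 0 before the migration runs and 0 afterwards.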
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Run time
~82 minutes
[1] pry(main)> no_of_documents = 27014.0
=> 27014.0
[2] pry(main)> batch_size = 1000
=> 1000
[3] pry(main)> throttle_delay = 3.minute
=> 3 minutes
[4] pry(main)> (no_of_documents.to_f / batch_size) * throttle_delay
=> 81.042 minutes
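The same estimate can be reproduced without ActiveSupport durations. The sketch below uses two hypothetical helpers: the first repeats the linear calculation from the console session, and the second rounds the batch count up, on the assumption (not confirmed by this MR) that a partial final batch still occupies a full throttle interval.

```ruby
# Linear estimate: fractional number of batches times the delay per batch.
def estimated_runtime_minutes(documents, batch_size:, throttle_delay_minutes:)
  (documents.to_f / batch_size) * throttle_delay_minutes
end

# Upper bound, assuming the last partial batch costs a full throttle interval.
def runtime_upper_bound_minutes(documents, batch_size:, throttle_delay_minutes:)
  (documents.to_f / batch_size).ceil * throttle_delay_minutes
end

linear = estimated_runtime_minutes(27_014, batch_size: 1000, throttle_delay_minutes: 3)
upper  = runtime_upper_bound_minutes(27_014, batch_size: 1000, throttle_delay_minutes: 3)

puts format('linear estimate: %.3f minutes', linear)
puts "upper bound: #{upper} minutes"
```

With the figures above, the linear estimate is roughly 81 minutes and the rounded-up bound is 84 minutes (28 batches at 3 minutes each).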
Related to #424872 (closed)
Edited by Ravi Kumar