Add migration for backfilling traversal_ids in blobs and wiki blobs
What does this MR do and why?
Backfills the traversal_ids for blobs and wiki blobs in the main index. More details are in issue #351381 (closed).
Estimated time to complete (calculation in internal link): 278 hours. It may take a little longer because the migration has to work through each project. Indexing will not be paused during the migration.
How to set up and validate locally
- Run the following query to get the blobs and wiki_blobs with missing traversal_ids:
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "traversal_ids"
        }
      },
      "must": {
        "terms": {
          "type": [
            "blob",
            "wiki_blob"
          ]
        }
      }
    }
  },
  "aggs": {
    "my-agg-name": {
      "terms": {
        "size": 1000,
        "field": "project_id"
      }
    }
  }
}
- Make sure advanced search is enabled, then run the migration from the Rails console by entering the following lines:
require File.expand_path('ee/elastic/migrate/20221221110300_add_traversal_ids_in_blobs_and_wiki_blobs.rb')
BackfillTraversalIdsToBlobsAndWikiBlobs.new(20221221110300).migrate
- Run the query again in Elasticsearch to verify that there are no records with missing traversal_ids.
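The first and last validation steps can be sketched in Ruby as follows. This is only an illustration, not part of the MR: the helper names (`missing_traversal_ids_query`, `projects_missing_traversal_ids`, `backfill_complete?`) are hypothetical, and the response is assumed to be the search response already parsed into a Hash with string keys. How the request is sent to the cluster is left out on purpose.

```ruby
require 'json'

# Builds the same request body as the query in the first step.
# `bucket_limit` mirrors the "size": 1000 of the terms aggregation.
def missing_traversal_ids_query(bucket_limit: 1000)
  {
    size: 0,
    query: {
      bool: {
        must_not: { exists: { field: 'traversal_ids' } },
        must: { terms: { type: %w[blob wiki_blob] } }
      }
    },
    aggs: {
      'my-agg-name' => {
        terms: { size: bucket_limit, field: 'project_id' }
      }
    }
  }
end

# Returns the project IDs that still have blob or wiki_blob documents
# without traversal_ids, read from the aggregation buckets.
# `response` is assumed to be the parsed search response (hypothetical input).
def projects_missing_traversal_ids(response)
  buckets = response.dig('aggregations', 'my-agg-name', 'buckets') || []
  buckets.map { |bucket| bucket['key'] }
end

# The backfill looks complete when no bucket is returned.
def backfill_complete?(response)
  projects_missing_traversal_ids(response).empty?
end

# Print the request body so it can be pasted into a console of your choice.
puts JSON.pretty_generate(missing_traversal_ids_query)
```

An empty bucket list after the migration corresponds to the "no records with missing traversal_ids" outcome of the last step.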
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #351381 (closed)
Activity
changed milestone to %15.8
assigned to @sdungarwal
added type::bug label and removed type::maintenance label
- A deleted user
added backend label
- Resolved by John Mason
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
| Category | Reviewer | Maintainer |
| --- | --- | --- |
| backend | Joseph Joshua (@joseph) (UTC+0, 1 hour behind @sdungarwal) | Max Woolf (@mwoolf) (UTC+0, 1 hour behind @sdungarwal) |

To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the danger-review job that generated this comment.
Generated by Danger
mentioned in issue #351381 (closed)
added 1 commit
- 09200e41 - Add specs and rename migration acc to class name
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
added 1 commit
- 588b166e - Add specs and rename migration acc to class name
added 1 commit
- e08e3eb4 - Add specs and rename migration acc to class name
- Resolved by Siddharth Dungarwal
marked the checklist item I have evaluated the MR acceptance checklist for this MR. as completed
- Resolved by John Mason
requested review from @imand3r
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
@sdungarwal I left a few questions.
added 2 commits
added 1 commit
- 840f5778 - Merge branch '351381-backfill-blobs-and-wiki-blobs' of...
@imand3r, thanks for approving this merge request.
This is the first time the merge request is approved. To ensure full test coverage, a new pipeline will be started shortly.
added pipeline:mr-approved label
requested review from @terrichu
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by John Mason
@sdungarwal Left you some notes and suggestions to consider, back to you
removed review request for @terrichu
added maintenance::performance type::maintenance labels and removed type::bug label
removed bug::performance label
mentioned in merge request !108135 (merged)
mentioned in commit 193e2861
added 1 commit
- 193e2861 - Add migration for backfilling traversal_ids in a single project
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
thought (non-blocking): I was concerned about running the aggregations for such a large number of documents, but I forgot that the aggregations API only returns the top 10 by default. That makes me worry less about running the aggregation every minute. Also, I updated the description with the estimated time to complete.
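To illustrate the point about the default bucket limit: a terms aggregation without an explicit size returns only the 10 largest buckets. The snippet below is a simplified in-memory model of that behavior, not Elasticsearch code. The function name `top_project_buckets` and the sample data are hypothetical, and the tie-breaking by ascending key is an assumption of this sketch.

```ruby
# Simplified model of a terms aggregation on project_id:
# count documents per project, order by document count (descending),
# and keep only the first `size` buckets. The default of 10 mirrors
# the default bucket limit mentioned in the comment above.
def top_project_buckets(project_ids, size: 10)
  project_ids
    .tally                                                 # { project_id => doc_count }
    .sort_by { |project_id, count| [-count, project_id] }  # largest first, ties by key (assumption)
    .first(size)
    .map { |project_id, count| { 'key' => project_id, 'doc_count' => count } }
end

# Hypothetical sample: project N owns N documents, for N in 1..25.
sample = (1..25).flat_map { |project_id| [project_id] * project_id }

puts top_project_buckets(sample).length              # only 10 buckets come back by default
puts top_project_buckets(sample, size: 1000).length  # all 25 projects with a larger limit
```

With the default limit, each run only has to carry a small, fixed number of buckets regardless of how many projects still need the backfill.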
@terrichu Following up: the GitLab Ultimate customer in ZD 450353 reported they "had to pause indexing and … scale the cluster up [by] copying all the shards to the new cluster" because this migration maxed out their Elasticsearch cluster ("CPU began pegging at near 100% on all of our data nodes, and the data nodes and master node were quickly running out of disk space").
The customer is wondering whether this MR should have been highlighted, according to https://about.gitlab.com/handbook/marketing/blog/release-posts/#important-notes-on-upgrading for example.
@katrinleinweber The batch size ended up being bumped down in two different MRs, !112305 (merged) and !113719 (merged). I thought the bump down to 50_000 made it in %15.9, but I checked and see it's at 100_000 in %15.9 and 10_000 in %15.10.
The changes were due to load on the Elasticsearch cluster observed during the migrations.
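As a rough illustration of the trade-off behind those batch sizes, the sketch below counts how many batches a backfill needs for a given number of documents. It assumes that the batch size means "documents handled per batch", which is an assumption of this sketch and is not confirmed in the thread. The document total and the helper name `batches_needed` are hypothetical.

```ruby
# Batch sizes quoted in the comment above, keyed by milestone.
BATCH_SIZES = { '15.9' => 100_000, '15.10' => 10_000 }.freeze

# Number of batches needed to cover `total_documents`, rounding up
# so that a final partial batch is still counted.
def batches_needed(total_documents, batch_size)
  raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?

  (total_documents + batch_size - 1) / batch_size
end

# Hypothetical index size, for illustration only.
total_documents = 1_000_000

BATCH_SIZES.each do |milestone, batch_size|
  puts "#{milestone}: #{batches_needed(total_documents, batch_size)} batches of up to #{batch_size} documents"
end
```

Under that assumption, a smaller batch size means more, lighter batches: each one puts less load on the cluster, and the backfill needs more iterations to finish.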
@m_lussier We could add a note to https://docs.gitlab.com/ee/update/versions/gitlab_15_changes.html#1590. Opened !132173 (merged).
/cc @cleveland
Edited by Terri Chu
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
- Resolved by Siddharth Dungarwal
removed review request for @terrichu
- Resolved by Siddharth Dungarwal
requested review from @terrichu
mentioned in commit d1bce549
added 1 commit
- d1bce549 - Add migration for backfilling traversal_ids in a single project
mentioned in commit ac497c04
added 1 commit
- ac497c04 - Add migration for backfilling traversal_ids in a single project
assigned to @terrichu
- Resolved by John Mason
@john-mason Would you mind taking over the backend maintainer reviewer here? I am going to take over as a co-author for any changes required (plus do the monitoring in staging/production/kibana once it merges).
question: I think this should be ok with the runtime. WDYT?
thought (non-blocking): elastic_migration_worker is currently disabled in production so I can control when it starts
added workflow::in review label and removed workflow::in dev label
requested review from @john-mason and removed review request for @terrichu
- Resolved by Terri Chu
- Resolved by Terri Chu
- Resolved by John Mason
- Resolved by John Mason
added 1 commit
- 1d459168 - Apply maintainer suggestions and refactor specs a bit
- Resolved by John Mason
- Resolved by John Mason
- Resolved by John Mason
- Resolved by John Mason
enabled an automatic merge when the pipeline for 623ab965 succeeds
mentioned in commit 81aa23cb
mentioned in commit 90827cbb
added workflow::verification label and removed workflow::in review label
added workflow::staging-canary label and removed workflow::verification label
added workflow::canary label and removed workflow::staging-canary label
added workflow::staging label and removed workflow::canary label
added workflow::production label and removed workflow::staging label
added workflow::post-deploy-db-staging label and removed workflow::production label
mentioned in merge request !109379 (merged)
mentioned in commit mehulsharma/gitlab@84a0fbde
Chatops says that gitlab-org/gitlab!107730 has not been included in the stable branch. The MR will not be released in 15.8.
So I'll update the milestone to %15.9.
changed milestone to %15.9
mentioned in merge request !109706 (merged)
added released::candidate label
mentioned in merge request kubitus-project/kubitus-installer!1922 (merged)
added released::published label and removed released::candidate label
added customer label
mentioned in merge request !132173 (merged)
mentioned in issue gitlab-com/www-gitlab-com#13970 (closed)