Add a migration to reindex commits to fix repository_access_level

What does this MR do and why?

This MR adds a migration that reindexes all commit docs to fix their repository_access_level. The approach is:

  • Aggregate the docs that are missing schema_version by the rid field and take 100 project_ids. Docs without schema_version are the targets: every doc that has schema_version is a new doc, so it already has the correct repository_access_level.
  • Run update_by_query on these 100 projects in batches of 10_000 docs.
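
The two steps above can be sketched as plain request bodies. This is a minimal illustration, not the code from the migration: the helper names, the example rid values, and the script are hypothetical, and `max_docs` assumes Elasticsearch 7.3 or later. Only the field names rid and schema_version, the limit of 100, and the batch size of 10_000 come from the description.

```ruby
require 'json'

# Hypothetical helper (not taken from the migration): search body that returns
# up to `limit` distinct rid values among docs that have no schema_version.
def rids_missing_schema_version_body(limit: 100)
  {
    size: 0, # only the aggregation buckets are needed, not the hits
    query: { bool: { must_not: [{ exists: { field: 'schema_version' } }] } },
    aggs: { rids: { terms: { field: 'rid', size: limit } } }
  }
end

# Hypothetical helper: update_by_query body that touches at most one batch of
# docs for the given rids. The script is passed in because the real one is not
# shown in this description.
def update_by_query_body(rids:, script:, batch_size: 10_000)
  {
    max_docs: batch_size, # assumption: Elasticsearch 7.3+ request body parameter
    query: {
      bool: {
        filter: [{ terms: { rid: rids } }],
        must_not: [{ exists: { field: 'schema_version' } }]
      }
    },
    script: script
  }
end

# Placeholder script: it only stamps schema_version. The real migration must
# also correct repository_access_level; that logic is not reproduced here.
placeholder_script = {
  lang: 'painless',
  source: 'ctx._source.schema_version = params.schema_version',
  params: { schema_version: 0 } # placeholder value, not the real version
}

puts JSON.pretty_generate(rids_missing_schema_version_body)
puts JSON.pretty_generate(update_by_query_body(rids: %w[1 2 3], script: placeholder_script))
```

Because the filter excludes docs that already have schema_version, a script that sets schema_version makes each updated doc drop out of the next batch, so the same body could be resent until nothing matches.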

How to set up and validate locally

Make sure Elasticsearch is enabled

  1. Open the Rails console
bundle exec rails c
  2. In a shell, search the commits index in Elasticsearch for docs that are missing schema_version
curl -XGET "http://localhost:9200/gitlab-development-commits/_search" -H "kbn-xsrf: reporting" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "schema_version"
          }
        }
      ]
    }
  }
}' | json_pp
  3. Run the migration from the Rails console
require_relative 'ee/elastic/migrate/20230628112233_reindex_commits_to_fix_permissions.rb' 
ReindexCommitsToFixPermissions.new(20230628112233).migrate
  4. Re-run the search query above. It should now return no results.
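
The final check can also be scripted. The sketch below is an assumption-laden convenience, not part of the MR: it sends the same must_not/exists query to the Elasticsearch `_count` endpoint and reads the `count` field of the response. The method names are hypothetical, and the URL and index name are copied from the curl command above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Same query as the curl command: docs that have no schema_version field.
def missing_schema_version_query
  { query: { bool: { must_not: [{ exists: { field: 'schema_version' } }] } } }
end

# Extracts the number of matching docs from a _count response body.
def remaining_docs(response_body)
  JSON.parse(response_body).fetch('count')
end

# Hypothetical helper: asks Elasticsearch how many commit docs still lack
# schema_version. Defaults assume the local development setup.
def count_missing_schema_version(base_url: 'http://localhost:9200',
                                 index: 'gitlab-development-commits')
  uri = URI("#{base_url}/#{index}/_count")
  response = Net::HTTP.post(uri, JSON.generate(missing_schema_version_query),
                            'Content-Type' => 'application/json')
  remaining_docs(response.body)
end
```

After the migration has finished, `count_missing_schema_version` should return 0.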

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Approximate time to completion

~ 8 days

[17] pry(main)> number_of_documents = 1_250_072_954
=> 1250072954
[18] pry(main)> batch_size = 10_000
=> 10000
[19] pry(main)> throttle_delay = 5.seconds
=> 5 seconds
[20] pry(main)> ((number_of_documents / batch_size) * throttle_delay / 86400.seconds)
=> 7
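
The console session uses integer division, so it prints 7 rather than the fractional value. A plain-Ruby version of the same arithmetic (no Rails needed) shows where the rounded-up figure of about 8 days comes from. It assumes, as the session does, that the throttle delay dominates and the time spent executing each batch is negligible; the variable names other than the first two are introduced here for illustration.

```ruby
number_of_documents = 1_250_072_954
batch_size = 10_000
throttle_delay_seconds = 5
seconds_per_day = 86_400

# Ceiling division: a final partial batch still costs one throttle delay.
batches = (number_of_documents + batch_size - 1) / batch_size
total_seconds = batches * throttle_delay_seconds
days = total_seconds.to_f / seconds_per_day

puts format('%d batches, %.2f days, rounded up to %d', batches, days, days.ceil)
# prints "125008 batches, 7.23 days, rounded up to 8"
```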

Query plan

The migration runs one SQL query: https://gitlab.com/gitlab-org/gitlab/-/blob/caf3eae36a79e423a1ae841829a56c78793f32b7/ee/elastic/migrate/20230703112233_reindex_commits_to_fix_permissions.rb#L40

project_ids_to_work holds at most 100 IDs. Here is the query plan for that case: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/20037/commands/65370

Related to #410777

Edited by Ravi Kumar
