Skip to content

Fix Advanced Search commit migrations

Terri Chu requested to merge tchu-fix-commit-search-migrations into master

What does this MR do and why?

Related #344459 (closed)

History

The DeleteOrphanedCommit migration was originally created via !88197 (merged) to remove commits from the Elasticsearch index which are missing projects in the index to fix data that prevented the PopulateCommitPermissionsInMainIndex migration from finishing. However, the DeleteOrphanedCommit migration is not finishing in production and the number of records which come back for the migration continues to grow.

Kibana logs for Elastic::Migration 20220118150500

image

Through debugging some of the commits, I found that there are commits in the index which are missing the parent project (the project is missing from the index). I believe this is occurring due to other bugs which we do not have root cause or a fix for, but the steps I believe are happening to cause this:

  1. Project is somehow removed completely from the index (including all commits, blobs, issues, etc)
  2. A new commit is pushed to the project
  3. New commit is indexed (and has project visibility fields added)
  4. Project is still missing from the index
  5. Commit comes up as orphaned commit (but it's really not orphaned)

Fix

Change both of the commit migrations

  • DeleteOrphanedCommit to only remove orphaned commits (commits where the project is missing from the index) when the commit is missing the visibility_level field
  • PopulateCommitPermissionsInMainIndex to only populate commits which have a parent project
  • update specs for both migrations to make sure we cover the new query changes

This will work because any new commits will have the project visibility fields added when indexed (gitlab-elasticsearch-indexer!130 (merged)) so we can safely add the new guards to the Elasticsearch queries.

Screenshots or screen recordings

N/A

How to set up and validate locally

I'm not sure if I can provide steps to validate locally, it's pretty complex to setup all of the Elasticsearch data and manually manipulate it to create the scenarios we would need to test it. I'm relying upon the tests to make sure this works.

We've also run the new query for DeleteOrphanedCommit against production Elasicsearch instance to verify it comes back with zero records (which would allow the migration to finish and be marked as completed.)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu

Merge request reports