Backfill Epics into elasticsearch

Madelein van Niekerk requested to merge 250699-backfill-all-epics into master

What does this MR do and why?

Context

Currently, all epic searches use Basic Search. We want to allow Advanced Search to be used when Elasticsearch is available, for faster and better searching.

To achieve this, we need the following:

Details

Adds an Elastic migration to backfill Epics.

  • If elastic namespace limiting is disabled: indexes all epics
  • If elastic namespace limiting is enabled: indexes epics belonging to groups that are indexed
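
The two cases above can be sketched as a simple filter. This is a minimal illustration in plain Ruby, not the migration's actual code: `SketchEpic` is a hypothetical struct, and `limiting_enabled` and `indexed_group_ids` are assumed stand-ins for the real Elasticsearch settings.

```ruby
require 'set'

# Hypothetical stand-in for an epic record; only the fields the filter needs.
SketchEpic = Struct.new(:id, :group_id)

# Returns the epics that should be sent for indexing.
# limiting_enabled and indexed_group_ids are assumed inputs for illustration.
def epics_to_index(epics, limiting_enabled:, indexed_group_ids: Set.new)
  # Without limiting, every epic is indexed.
  return epics unless limiting_enabled

  # With limiting, keep only epics whose group is in the indexed set.
  epics.select { |epic| indexed_group_ids.include?(epic.group_id) }
end

epics = [SketchEpic.new(1, 9970), SketchEpic.new(2, 9970), SketchEpic.new(3, 1234)]

all_ids = epics_to_index(epics, limiting_enabled: false).map(&:id)
limited_ids = epics_to_index(epics, limiting_enabled: true, indexed_group_ids: Set[9970]).map(&:id)
```

Here `all_ids` contains every epic id, while `limited_ids` contains only the epics in group 9970.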

NOTE: Before this migration is merged, the elastic_index_epics feature flag has to be fully enabled.

Estimated run time

For the total number of epic records on GitLab.com, this migration would require 11 migration runs. However, namespace limiting is enabled there, so fewer epics need to be indexed and the migration will finish in fewer than 11 runs.
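
The run count follows from the batch settings: each run processes at most BATCH_SIZE × ITERATIONS_PER_RUN epics. A rough sketch of that arithmetic is below. The method name is hypothetical, and the inputs are the small values from the log example further down (48 epics, batches of 10, 2 iterations per run), since the production values are not stated here.

```ruby
# Estimate how many migration runs a backfill needs.
# The method name and the inputs below are for illustration only.
def estimated_runs(total_records, batch_size:, iterations_per_run:)
  per_run = batch_size * iterations_per_run
  # Integer ceiling division: a partially filled last run still counts.
  (total_records + per_run - 1) / per_run
end

runs = estimated_runs(48, batch_size: 10, iterations_per_run: 2)
```

With these inputs `runs` is 3, which matches the three runs visible in the example logs.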

Database queries

When limiting is enabled (GitLab.com has limiting enabled)

Getting batch of epics: SELECT "epics"."id" FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" ASC LIMIT 1 OFFSET 1000

41.248 ms https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67679
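
The `LIMIT 1 OFFSET 1000` query above reads like a batch boundary lookup: it finds the id that sits one batch past the start, which can then serve as an exclusive upper bound for the batch. The sketch below shows that idea over an in-memory list. It assumes this reading of the query is correct, and both method names are hypothetical.

```ruby
# Find the exclusive upper bound of a batch, mirroring
# "ORDER BY id ASC LIMIT 1 OFFSET batch_size" over a sorted id list.
# Returns nil when fewer than batch_size + 1 ids remain.
def batch_stop_id(sorted_ids, start_id:, batch_size:)
  sorted_ids.select { |id| id >= start_id }[batch_size]
end

# Splits a sorted id list into consecutive batches using the stop id.
def id_batches(sorted_ids, batch_size:)
  batches = []
  return batches if sorted_ids.empty?

  start_id = sorted_ids.first
  loop do
    stop_id = batch_stop_id(sorted_ids, start_id: start_id, batch_size: batch_size)
    # The final batch has no stop id, so it takes everything from start_id on.
    batches << sorted_ids.select { |id| id >= start_id && (stop_id.nil? || id < stop_id) }
    break if stop_id.nil?

    start_id = stop_id
  end
  batches
end

batches = id_batches([1, 2, 5, 8, 13, 21, 34], batch_size: 3)
```

Because the boundary is found by offset rather than by id arithmetic, gaps in the id sequence do not produce undersized batches.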

Getting the last id: SELECT "epics".* FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" DESC LIMIT 1

10.122 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67680

For each unique group in the list of epics:

1: Find the group: SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 9970 LIMIT 1

4.694 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67683

2: Check if the group or its descendants are in ElasticsearchIndexedNamespace: SELECT 1 AS one FROM "elasticsearch_indexed_namespaces" WHERE "elasticsearch_indexed_namespaces"."namespace_id" IN (9970, 9971, 9972)

0.695 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67684

Checking if there are any epics (run once per migration): SELECT 1 AS one FROM "epics" LIMIT 1

2.492 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67681

Finding the maximum Epic id (run once per migration): SELECT MAX("epics"."id") FROM "epics"

0.622 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67682

When limiting is not enabled

Getting batch of epics: SELECT "epics"."id" FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" ASC LIMIT 1 OFFSET 1000

41.248 ms https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67679

Getting the last id: SELECT "epics".* FROM "epics" WHERE (epics.id > 0) AND "epics"."id" >= 1 ORDER BY "epics"."id" DESC LIMIT 1

10.122 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67680

Checking if there are any epics (run once per migration): SELECT 1 AS one FROM "epics" LIMIT 1

2.492 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67681

Finding the maximum Epic id (run once per migration): SELECT MAX("epics"."id") FROM "epics"

0.622 ms https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/20627/commands/67682

Logs

MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 0
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 48
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":48}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(48); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: true

For example, with BATCH_SIZE set to 10 and ITERATIONS_PER_RUN set to 2:

MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 0
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 10
[Elastic::Migration: 20230614090600] Executing iteration 2 with last epic id: 20
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":20}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(20); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: false
MigrationWorker: migration[BackfillEpics] kicking off next migration batch
MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 20
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 30
[Elastic::Migration: 20230614090600] Executing iteration 2 with last epic id: 40
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":40}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(40); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: false
MigrationWorker: migration[BackfillEpics] kicking off next migration batch
MigrationWorker: migration[BackfillEpics] executing migrate method
[Elastic::Migration: 20230614090600] Indexing epics starting from id = 40
[Elastic::Migration: 20230614090600] Executing iteration 1 with last epic id: 48
[Elastic::Migration: 20230614090600] Setting migration_state to {"max_processed_id":48}
[Elastic::Migration: 20230614090600] Migration completed? max_processed_id(48); maximum_epic_id(48)
MigrationWorker: migration[BackfillEpics] updating with completed: true
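
The progression in these logs can be reproduced with a small simulation. This is only a sketch of the run/iteration loop the logs suggest, not the migration's code: BATCH_SIZE and ITERATIONS_PER_RUN use the example values, the 48 contiguous epic ids are assumed from `maximum_epic_id(48)`, and `run_once` and the other names are hypothetical.

```ruby
# Simulates the run/iteration loop suggested by the logs.
BATCH_SIZE = 10
ITERATIONS_PER_RUN = 2

# Performs one migration run and returns the new max_processed_id.
def run_once(epic_ids, max_processed_id, log)
  log << "Indexing epics starting from id = #{max_processed_id}"
  ITERATIONS_PER_RUN.times do |i|
    # Next batch: the first BATCH_SIZE ids above the last processed id.
    batch = epic_ids.select { |id| id > max_processed_id }.first(BATCH_SIZE)
    break if batch.empty?

    max_processed_id = batch.last
    log << "Executing iteration #{i + 1} with last epic id: #{max_processed_id}"
  end
  max_processed_id
end

epic_ids = (1..48).to_a # assumed: contiguous ids up to maximum_epic_id
maximum_epic_id = epic_ids.max
state = { max_processed_id: 0 }
log = []
runs = 0

# Keep running until the stored state catches up with the maximum id.
until state[:max_processed_id] >= maximum_epic_id
  state[:max_processed_id] = run_once(epic_ids, state[:max_processed_id], log)
  runs += 1
  log << "Migration completed? max_processed_id(#{state[:max_processed_id]}); maximum_epic_id(#{maximum_epic_id})"
end

puts log
```

Persisting only `max_processed_id` between runs is what lets each run resume where the previous one stopped.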

How to set up and validate locally

  1. Disable elastic index limiting
  2. Execute the migration worker a few times: Elastic::MigrationWorker.new.perform
  3. Check that Epic.count records are enqueued: Elastic::ProcessBookkeepingService.queue_size
  4. Enable elastic index limiting and add a group containing epics
  5. Delete the migration record from Elasticsearch: curl -X "DELETE" "http://localhost:9200/gitlab-development-migrations/_doc/20230614090600"
  6. Execute the migration worker a few times: Elastic::MigrationWorker.new.perform
  7. Check that the number of epics in the limited group is equal to the number of records enqueued: Elastic::ProcessBookkeepingService.queue_size

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #250699 (closed)

Edited by Madelein van Niekerk