
fix: prevent data integrity issues in ResolveReindexing, add logs

What does this MR do and why?

This MR updates the elastic.ResolveReindexing function:

  • To ensure data integrity
    • deleteByQuery and updateByQuery now wait for completion
    • deleteByQuery refreshes the index so that the subsequent updateByQuery operates on the latest version of the index
  • For monitoring
    • the number of deleted and updated documents is logged with log_level=INFO
  • Other updates
    • routing is added to the deleteByQuery and updateByQuery requests
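
As an illustration of the three request-level changes listed above, here is a minimal sketch of how the query string for a by-query request could be assembled. `wait_for_completion`, `refresh`, and `routing` are query parameters of the Elasticsearch Delete By Query and Update By Query APIs; the helper name `byQueryPath`, the index name, and the use of the project ID as the routing value are assumptions made for the example and are not taken from this MR's diff. The MR may also issue the refresh as a separate call rather than as a request parameter.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// byQueryPath is a hypothetical helper (not from the MR). It builds the path
// and query string for an Elasticsearch _delete_by_query or _update_by_query
// request that blocks until the operation finishes.
func byQueryPath(index, action, routing string, refresh bool) string {
	params := url.Values{}
	// Block until the by-query operation has completed.
	params.Set("wait_for_completion", "true")
	// Refresh the affected shards once the request completes.
	params.Set("refresh", strconv.FormatBool(refresh))
	// Only target the shard(s) selected by the routing value.
	if routing != "" {
		params.Set("routing", routing)
	}
	return "/" + url.PathEscape(index) + "/" + action + "?" + params.Encode()
}

func main() {
	// Assumption: the index name and the routing value "75" are placeholders.
	fmt.Println(byQueryPath("gitlab_active_context_code", "_delete_by_query", "75", true))
	fmt.Println(byQueryPath("gitlab_active_context_code", "_update_by_query", "75", false))
}
```

Setting `refresh=true` on the delete request is what would let the following update request see the index without the purged documents.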

References

Related issue: Address elastic.ResolveReindexing data integrit... (#175 - closed)

How to set up and validate locally

We are testing this against gitlab-org/gitlab to validate the changes around data integrity:

Setup

  1. Obtain a copy of the gitlab repo in your local GDK, and make sure it allows force-pushes (go to Settings -> Repository -> Branch Rules)

  2. Follow these setup and validation steps, making sure you are setting it up for your local gitlab project

  3. Run the initial indexing on your local gitlab project. The key parameters are from_sha="", to_sha=<latest commit or "">, force_reindex=false

    Example command:
    make && \
    GITLAB_INDEXER_MODE=chunk \
    GITLAB_INDEXER_DEBUG_LOGGING=1 \
    ./bin/gitlab-elasticsearch-indexer \
    -adapter "elasticsearch" \
    -connection '{"url": ["http://localhost:9200"]}' \
    -options '{
      "timeout": "30m",
      "chunk_size": 1000,
      "gitaly_batch_size": 1000,
      "from_sha": "",
      "to_sha": "cffa80231d2b1b4ca0ee3f2c355fdb1b1560f140",
      "force_reindex": false,
      "project_id": 75,
      "partition_name": "gitlab_active_context_code",
      "partition_number": 0,
      "gitaly_config": {
        "address": "unix:/Users/pamartiaga/Code/gitlab-development-kit/praefect.socket",
        "storage": "default",
        "relative_path": "@hashed/f3/69/f369cb89fc627e668987007d121ed1eacdc01db9e28f8bb26f358b7d8c4f08ac.git",
        "project_path": "gitlab-duo/gitlab"
      }
    }'

Testing

  1. Create a new commit in your local gitlab repo that deletes some files and updates some others

  2. Run the indexer with from_sha="", to_sha=<the last commit you created>, force_reindex=true

    Example command:
    make && \
    GITLAB_INDEXER_MODE=chunk \
    GITLAB_INDEXER_DEBUG_LOGGING=1 \
    ./bin/gitlab-elasticsearch-indexer \
    -adapter "elasticsearch" \
    -connection '{"url": ["http://localhost:9200"]}' \
    -options '{
      "timeout": "30m",
      "chunk_size": 1000,
      "gitaly_batch_size": 1000,
      "from_sha": "",
      "to_sha": "b1e3f635afbf82550b9dc7361f85bac340fbd4dd",
      "force_reindex": true,
      "project_id": 75,
      "partition_name": "gitlab_active_context_code",
      "partition_number": 0,
      "gitaly_config": {
        "address": "unix:/Users/pamartiaga/Code/gitlab-development-kit/praefect.socket",
        "storage": "default",
        "relative_path": "@hashed/f3/69/f369cb89fc627e668987007d121ed1eacdc01db9e28f8bb26f358b7d8c4f08ac.git",
        "project_path": "gitlab-duo/gitlab"
      }
    }'

  3. Check the resolve_reindexing logs. These should be the last log lines in the output:

    {"time":"2025-09-19T10:00:51.280386+10:00","level":"DEBUG","msg":"resolve_reindexing refreshing index before delete"}
    {"time":"2025-09-19T10:00:51.438544+10:00","level":"DEBUG","msg":"resolve_reindexing purging files not in reindex"}
    
    # this shows info about the deleted documents
    {"time":"2025-09-19T10:00:51.476813+10:00","level":"INFO","msg":"resolve_reindexing deleted documents","batches":1,"total":5,"updated":0,"created":0,"deleted":5,"noops":0,"took":37,"timed_out":false}
    
    {"time":"2025-09-19T10:00:51.476853+10:00","level":"DEBUG","msg":"resolve_reindexing set all documents back to reindexing=false"}
    
    # this shows info about the updated documents
    # verify that `updated` is equal to `total`, i.e. all documents are updated
    {"time":"2025-09-19T10:01:17.820686+10:00","level":"INFO","msg":"resolve_reindexing updated documents","batches":411,"total":410729,"updated":410729,"created":0,"deleted":0,"noops":0,"took":26342,"timed_out":false}
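
The manual check on the last log line can also be scripted. The following is a minimal sketch, assuming the JSON log format shown above; the `byQueryLog` type and `checkUpdateLog` function are hypothetical names introduced for this example and are not part of the indexer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// byQueryLog mirrors the fields of the resolve_reindexing INFO log lines
// that matter for validation. Hypothetical type, not part of the indexer.
type byQueryLog struct {
	Msg      string `json:"msg"`
	Total    int    `json:"total"`
	Updated  int    `json:"updated"`
	Deleted  int    `json:"deleted"`
	TimedOut bool   `json:"timed_out"`
}

// checkUpdateLog returns an error when the "updated documents" log line
// reports a timeout or shows that not every matched document was updated.
func checkUpdateLog(line string) error {
	var entry byQueryLog
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return fmt.Errorf("parse log line: %w", err)
	}
	if entry.TimedOut {
		return fmt.Errorf("%s: request timed out", entry.Msg)
	}
	if entry.Updated != entry.Total {
		return fmt.Errorf("%s: updated %d of %d documents", entry.Msg, entry.Updated, entry.Total)
	}
	return nil
}

func main() {
	// The sample line is copied from the log output above.
	line := `{"time":"2025-09-19T10:01:17.820686+10:00","level":"INFO","msg":"resolve_reindexing updated documents","batches":411,"total":410729,"updated":410729,"created":0,"deleted":0,"noops":0,"took":26342,"timed_out":false}`
	if err := checkUpdateLog(line); err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Println("all matched documents were updated")
}
```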
Edited by Pam Artiaga
