Add more Bronze customers to Elasticsearch advanced global search rollout

Production Change - Criticality 3 C3

| Change Objective | Describe the objective of the change |
|:---|:---|
| Change Type | Operation |
| Services Impacted | Advanced Search (ES integration), Sidekiq, Redis, Gitaly, PostgreSQL, Elastic indexing cluster |
| Change Team Members | @DylanGriffith @mwasilewski-gitlab @dgruzd |
| Change Severity | C3 |
| Change Reviewer or tested in staging | #1788 (comment 308202435) |
| Dry-run output | - |
| Due Date | 2020-04-08 01:07 UTC |
| Time tracking | |

Detailed steps for the change

Pre-check

| | namespaces | projects | repository size | issues | merge requests | comments |
|---|---|---|---|---|---|---|
| Currently in index | 39 | 21880 | 459 GB | 308 K | 471 K | -1 |
| Added to index | 271 | 9437 | 341 GB | 95.8 K | 133 K | -1 |
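
The pre-check figures can be combined to get a rough sense of how large the index will be after the rollout. The following is a minimal sketch, not part of the original estimate script. It assumes that the "Added to index" row is the estimated addition from this change, and that `-1` in the comments column means the value was not estimated, so comments are left out. The dictionary keys are illustrative names only.

```python
# Pre-check figures copied from the table above (issues and merge requests in thousands).
CURRENT = {"namespaces": 39, "projects": 21880, "repository_gb": 459,
           "issues_k": 308, "merge_requests_k": 471}
ADDED = {"namespaces": 271, "projects": 9437, "repository_gb": 341,
         "issues_k": 95.8, "merge_requests_k": 133}


def projected_totals(current, added):
    """Sum each metric to estimate the index contents after the rollout."""
    return {key: round(current[key] + added[key], 1) for key in current}


def growth_percent(current, added):
    """Express the addition as a percentage of what is already indexed."""
    return {key: round(100 * added[key] / current[key], 1) for key in current}


if __name__ == "__main__":
    totals = projected_totals(CURRENT, ADDED)
    growth = growth_percent(CURRENT, ADDED)
    for key in CURRENT:
        print(f"{key}: {totals[key]} after rollout (+{growth[key]}%)")
```

Comparing the growth percentages against the previous run of the estimate is one way to judge whether the size has "increased significantly", as step 1 of the rollout asks.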

Roll out

  1. Re-run the size estimate to confirm it hasn't increased significantly since last time https://gitlab.com/gitlab-org/gitlab/-/issues/211756#script-to-estimate-size
  2. Mention in #support_gitlab-com: @support-dotcom We are expanding our roll-out of Elasticsearch to more Bronze customers on GitLab.com. These customers may notice changes in their global searches. Please let us know if you need help investigating any related tickets. Follow progress at https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1788. The most reliable way to know if a group has Elasticsearch enabled is to see the "Advanced search functionality is enabled" indicator at the top right of the search results page.
  3. Create silence on the alert for "The elastic_indexer queue, main stage, has a queue latency outside of SLO" at https://alerts.gprd.gitlab.net/#/silences/new with env="gprd" type="sidekiq" priority="elasticsearch"
  4. Check with SRE on call in #production: @sre-oncall I would like to roll out this change https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1788. Please let me know if there are any ongoing incidents or any other reason to hold off for the time being. Please note this may trigger a high CPU alert for Sidekiq workers, but since Elasticsearch has a dedicated Sidekiq fleet it should not impact any other workers. I have also created a relevant alert silence for the Sidekiq SLO since it will very likely backlog the queues, which is fine: https://alerts.gprd.gitlab.net/#/silences/e3546204-a5df-42ef-aeca-91e7c4c44ceb
  5. Note disk space and number of documents for the gitlab-production index
  6. Log in as admin and create a personal access token with API scope that expires tomorrow
  7. Invoke API to add percentage
    • curl -X PUT -H "Private-Token: $API_TOKEN" -i 'https://gitlab.com/api/v4/elasticsearch_indexed_namespaces/rollout?plan=bronze&percentage=5'
  8. Note start time: 2020-04-08 01:07 UTC and update Due Date in table
  9. Wait for namespaces to finish indexing
    1. Look at Sidekiq Queue Lengths per Queue
    2. Find correlation ID for original API request: EmYXO0oV8G5
    3. Look for done jobs with that correlation_id. When you are finished there should be 3 jobs done (1xElasticIndexerWorker, 2xElasticCommitIndexerWorker) per project in the group.
    4. #1788 (comment 319624268)
  10. Note end time: 2020-04-08 16:12:54 UTC
  11. Note time taken: 13h 5m
  12. Note increase in index size
  13. Check for [any failed projects for that correlation_id](https://log.gprd.gitlab.net/goto/b6f4877e7db12153019662a64e312791) and manually retry them if necessary
  14. Test out searching
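
The completion check in step 9 (1 ElasticIndexerWorker and 2 ElasticCommitIndexerWorker jobs done per project) can be sketched as a small script run over an export of the done-job logs. This is only an illustration: the `(project_id, worker_class)` record shape and the function name are hypothetical and would need adapting to whatever the log export actually contains. The check uses "at least" rather than "exactly" so that retried jobs do not flag a project, which is an assumption rather than something stated in the steps.

```python
from collections import Counter, defaultdict

# Expected done jobs per project, as described in step 9.
EXPECTED_JOBS = {"ElasticIndexerWorker": 1, "ElasticCommitIndexerWorker": 2}


def incomplete_projects(project_ids, done_jobs):
    """Return the project IDs that have not yet reached the expected job counts.

    done_jobs is an iterable of (project_id, worker_class) pairs. This record
    shape is hypothetical; adapt it to the real log export.
    """
    counts = defaultdict(Counter)
    for project_id, worker_class in done_jobs:
        counts[project_id][worker_class] += 1
    return [
        pid for pid in project_ids
        if any(counts[pid][worker] < needed
               for worker, needed in EXPECTED_JOBS.items())
    ]


if __name__ == "__main__":
    sample_jobs = [
        (1, "ElasticIndexerWorker"),
        (1, "ElasticCommitIndexerWorker"),
        (1, "ElasticCommitIndexerWorker"),
        (2, "ElasticIndexerWorker"),
    ]
    # Project 2 is missing its commit indexer jobs, project 3 has no jobs at all.
    print(incomplete_projects([1, 2, 3], sample_jobs))
```

An empty result would suggest indexing has finished for every project in the list; any IDs returned are candidates for the manual retry in step 13.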

| In gitlab-production index | Before | After |
|---|---|---|
| Total | 581.8 GB | 1.1 TB |
| Documents | 24.8 M | 36.8 M |
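
The increase in index size asked for in step 12 follows directly from the before and after figures. The sketch below works it out, assuming decimal units (1 TB = 1000 GB). Because the "after" size is only reported as 1.1 TB, the size delta is approximate.

```python
def growth(before, after):
    """Return the absolute increase and the increase as a percentage of `before`."""
    delta = after - before
    return delta, 100 * delta / before


if __name__ == "__main__":
    # Figures from the gitlab-production index table.
    docs_delta_m, docs_pct = growth(24.8, 36.8)          # millions of documents
    size_delta_gb, size_pct = growth(581.8, 1.1 * 1000)  # GB, assuming 1 TB = 1000 GB
    print(f"Documents: +{docs_delta_m:.1f} M ({docs_pct:.1f}%)")
    print(f"Total size: about +{size_delta_gb:.1f} GB ({size_pct:.1f}%)")
```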

Monitoring

Key metrics to observe

Other metrics to observe

Rollback steps

Changes checklist

  • Detailed steps and rollback steps have been filled prior to commencing work
  • Person on-call has been informed prior to change being rolled out
Edited by Dylan Griffith