
Add 10% of Bronze customers to Elasticsearch advanced global search rollout

Production Change - Criticality 3 (C3)

| Change Objective | Add 10% of Bronze customers to the Elasticsearch advanced global search rollout on GitLab.com |
| --- | --- |
| Change Type | Operation |
| Services Impacted | Advanced Search (ES integration), Sidekiq, Redis, Gitaly, PostgreSQL, Elastic indexing cluster |
| Change Team Members | @DylanGriffith |
| Change Severity | C3 |
| Change Reviewer or tested in staging | This has been done before on production: #1788 (closed) |
| Dry-run output | - |
| Due Date | 2020-04-09 05:22:12 UTC |

Detailed steps for the change

Pre-check

|  | namespaces | projects | repository size | issues | merge requests | comments |
| --- | --- | --- | --- | --- | --- | --- |
| Currently in index | 310 | 31214 | 806 GB | 405 K | 606 K | -1 |
| Added to index | 301 | 14010 | 715 GB | 307 K | 414 K | -1 |
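
The figures above come from the size-estimate script linked from step 1 of the roll-out below. For reference, a minimal sketch of that kind of estimate from a production Rails console is shown here; the model and association names (Plan, GitlabSubscription, ElasticsearchIndexedNamespace, Namespace#all_projects, ProjectStatistics) are assumptions based on the GitLab codebase at the time, and the canonical script in gitlab-org/gitlab#211756 remains the source of truth.

```ruby
# Rough size estimate for the next rollout tranche, run from a Rails console.
# Model names are assumptions based on the GitLab EE codebase; the canonical
# script lives in gitlab-org/gitlab#211756.

bronze_plan_id = Plan.find_by(name: 'bronze').id

# Namespaces on the Bronze plan that are not yet in the index.
bronze_namespace_ids  = GitlabSubscription.where(hosted_plan_id: bronze_plan_id).pluck(:namespace_id)
indexed_namespace_ids = ElasticsearchIndexedNamespace.pluck(:namespace_id)
candidate_ids         = bronze_namespace_ids - indexed_namespace_ids

repo_bytes = 0
issues = 0
merge_requests = 0

Namespace.where(id: candidate_ids).find_each do |namespace|
  namespace.all_projects.includes(:statistics).find_each do |project|
    repo_bytes     += project.statistics&.repository_size.to_i
    issues         += project.issues.count
    merge_requests += project.merge_requests.count
  end
end

puts "namespaces: #{candidate_ids.size}"
puts "repository size: #{repo_bytes / 1.gigabyte} GB"
puts "issues: #{issues}, merge requests: #{merge_requests}"
```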

Roll out

  1. Re-run the size estimate to confirm it hasn't increased significantly since last time https://gitlab.com/gitlab-org/gitlab/-/issues/211756#script-to-estimate-size
  2. Mention in #support_gitlab-com: @support-dotcom We are expanding our roll-out of Elasticsearch to more bronze customers on GitLab.com. These customers may notice changes in their global searches. Please let us know if you need help investigating any related tickets. Follow progress at https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1925. The most reliable way to know if a group has Elasticsearch enabled is to see the "Advanced search functionality is enabled" indicator at the top right of the search results page.
  3. Create a silence on the alert for "The elastic_indexer queue, main stage, has a queue latency outside of SLO" at https://alerts.gprd.gitlab.net/#/silences/new with env="gprd" type="sidekiq" priority="elasticsearch" (see the silence-creation sketch after this list)
  4. Check with the SRE on call in #production: @sre-oncall I would like to roll out this change https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1925. Please let me know if there are any ongoing incidents or any other reason to hold off for the time being. Please note this may trigger a high CPU alert for Sidekiq workers, but since Elasticsearch has a dedicated Sidekiq fleet it should not impact any other workers. I have also created a relevant alert silence for the Sidekiq SLO, since it will very likely backlog the queues, which is fine: https://alerts.gprd.gitlab.net/#/silences/11be0b54-5e8c-43e2-b76a-ead16dee8803. Based on a comparable rollout yesterday, the overall indexing will probably take around 26 hours, so I've made the silence last 36 hours to be safe.
  5. Note disk space and number of documents for the gitlab-production index (see the index-stats sketch after this list)
  6. Log in as an admin and create a personal access token with API scope that expires tomorrow
  7. Invoke API to add percentage
    • curl -X PUT -H "Private-Token: $API_TOKEN" -i 'https://gitlab.com/api/v4/elasticsearch_indexed_namespaces/rollout?plan=bronze&percentage=15'
  8. Due to gitlab-org/gitlab#213777 (closed) this needs to be done via the Rails console instead (expanded with comments after this list):
    • ElasticsearchIndexedNamespace.drop_limited_ids_cache!; ElasticNamespaceRolloutWorker.perform_async('bronze', 10, 'rollout'); ElasticsearchIndexedNamespace.drop_limited_ids_cache!
    • #1925 (comment 320731446)
  9. Note start time: 2020-04-09 05:22:12 UTC and update the Due Date in the table above
  10. Wait for namespaces to finish indexing
    1. Look at Sidekiq Queue Lengths per Queue
    2. Find correlation ID for original API request: XX
    3. Look for done jobs with that correlation_id (see the log-query sketch after this list). When indexing is finished there should be 3 done jobs (1x ElasticIndexerWorker, 2x ElasticCommitIndexerWorker) per project in the group.
  11. Note end time: 2020-04-10 10:56 UTC
  12. Note time taken: ~30 hr
  13. Note increase in index size (before/after table below; see also the index-stats sketch after this list)
  14. Check for [any failed projects for that correlation_id](https://log.gprd.gitlab.net/goto/b6f4877e7db12153019662a64e312791) and manually retry them if necessary (see the retry sketch after this list)
  15. Test out searching (see the search API sketch after this list)

| In gitlab-production index | Before | After |
| --- | --- | --- |
| Total | 1.1 TB | 2.1 TB |
| Documents | 36.8 M | 64.4 M |
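
The silence in step 3 is created through the Alertmanager UI. For reference, a silence-creation sketch using Alertmanager's v2 API is below; it assumes https://alerts.gprd.gitlab.net exposes /api/v2 to the caller and that the three matcher labels are exactly those listed in step 3.

```ruby
# Hypothetical sketch: create the step-3 silence via the Alertmanager v2 API
# instead of the UI. Assumes https://alerts.gprd.gitlab.net exposes /api/v2
# and that you are already authenticated; adjust host/auth as needed.
require 'net/http'
require 'json'
require 'time'

uri = URI('https://alerts.gprd.gitlab.net/api/v2/silences')

silence = {
  matchers: [
    { name: 'env',      value: 'gprd',          isRegex: false },
    { name: 'type',     value: 'sidekiq',       isRegex: false },
    { name: 'priority', value: 'elasticsearch', isRegex: false }
  ],
  startsAt: Time.now.utc.iso8601,
  endsAt:   (Time.now.utc + 36 * 3600).iso8601, # 36 hours, as in step 4
  createdBy: 'DylanGriffith',
  comment: 'Elasticsearch bronze rollout, elastic_indexer queue backlog expected'
}

response = Net::HTTP.post(uri, silence.to_json, 'Content-Type' => 'application/json')
puts response.code, response.body
```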
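
Steps 5 and 13 record the disk usage and document count that feed the before/after table above. A minimal index-stats sketch using the elasticsearch-ruby client is below; the cluster URL is a placeholder, and in practice the client already configured in the GitLab Rails application would be used instead.

```ruby
# Sketch: record disk usage and document count for the gitlab-production index
# (steps 5 and 13). The cluster URL is a placeholder; in practice use the
# client already configured in the GitLab Rails application.
require 'elasticsearch'

client = Elasticsearch::Client.new(url: ENV.fetch('ELASTIC_URL'))

# Human-readable size and doc count, like the before/after table above.
puts client.cat.indices(index: 'gitlab-production', v: true, h: 'index,docs.count,store.size')

# Exact document count via the count API.
puts client.count(index: 'gitlab-production')['count']
```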
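
Step 8's one-liner, expanded with comments. The reading of the arguments (plan, percentage, mode) is inferred from the worker name and the API call in step 7 rather than taken from documentation.

```ruby
# Step 8, expanded. Run in a production Rails console.

# Drop the cached set of indexed namespace IDs so the rollout worker sees
# the current state rather than a stale cache.
ElasticsearchIndexedNamespace.drop_limited_ids_cache!

# Enqueue the rollout: plan = 'bronze', target percentage = 10, mode = 'rollout'.
ElasticNamespaceRolloutWorker.perform_async('bronze', 10, 'rollout')

# Drop the cache again so subsequent reads pick up the newly added namespaces.
ElasticsearchIndexedNamespace.drop_limited_ids_cache!
```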
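
Step 10.3 counts completed Sidekiq jobs for the rollout's correlation ID. A log-query sketch against the logging cluster is below; the index pattern and field names (json.correlation_id, json.job_status, json.class) are assumptions about the log schema and should be verified in Kibana before trusting the counts.

```ruby
# Sketch: count done Sidekiq jobs for the rollout's correlation ID (step 10.3).
# Index pattern and field names are assumptions about the gprd logging cluster;
# verify them in Kibana before relying on the numbers.
require 'elasticsearch'

client = Elasticsearch::Client.new(url: ENV.fetch('LOGGING_ES_URL'))
correlation_id = 'XX' # from step 10.2

result = client.search(
  index: 'pubsub-sidekiq-inf-gprd-*',
  body: {
    size: 0,
    query: {
      bool: {
        filter: [
          { term: { 'json.correlation_id' => correlation_id } },
          { term: { 'json.job_status' => 'done' } }
        ]
      }
    },
    aggs: { by_worker: { terms: { field: 'json.class' } } }
  }
)

# Expect 1x ElasticIndexerWorker and 2x ElasticCommitIndexerWorker per project.
result['aggregations']['by_worker']['buckets'].each do |bucket|
  puts "#{bucket['key']}: #{bucket['doc_count']}"
end
```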
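
For step 14, a retry sketch for a handful of failed projects is below. Whether re-enqueueing ElasticCommitIndexerWorker with only the project ID is sufficient depends on the failed job's original arguments, so check those in the logs first.

```ruby
# Sketch for step 14: re-enqueue commit indexing for projects whose jobs failed.
# The failed project IDs come from the Kibana link in step 14; passing only the
# project ID is an assumption — check the failed job's arguments in the logs.
failed_project_ids = [] # fill in from the failed-jobs search

failed_project_ids.each do |project_id|
  ElasticCommitIndexerWorker.perform_async(project_id)
end
```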
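
Step 15 can be spot-checked against the Search API for one of the newly rolled-out groups. In the search API sketch below, the group ID and search term are placeholders, and the API-scope token from step 6 is reused.

```ruby
# Sketch for step 15: spot-check advanced search via the group Search API.
# GROUP_ID and the search term are placeholders; reuse the token from step 6.
require 'net/http'
require 'json'

group_id = ENV.fetch('GROUP_ID')
uri = URI("https://gitlab.com/api/v4/groups/#{group_id}/search")
uri.query = URI.encode_www_form(scope: 'blobs', search: 'def initialize')

request = Net::HTTP::Get.new(uri)
request['Private-Token'] = ENV.fetch('API_TOKEN')

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }

# Number of blob results returned (should be non-empty for an indexed group).
puts JSON.parse(response.body).size
```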

Monitoring

Key metrics to observe

Other metrics to observe

Rollback steps
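
No rollback steps were filled in for this change. For reference only: the rollout worker in step 8 takes a mode argument, so a rollback would presumably mirror that call with a 'rollback' mode, as sketched below; confirm the supported modes and any API equivalent in the current codebase before relying on this.

```ruby
# Hedged sketch only — not an executed rollback. Based on the symmetry of the
# rollout call in step 8; verify the supported modes of
# ElasticNamespaceRolloutWorker before running this.
ElasticsearchIndexedNamespace.drop_limited_ids_cache!
ElasticNamespaceRolloutWorker.perform_async('bronze', 10, 'rollback')
ElasticsearchIndexedNamespace.drop_limited_ids_cache!
```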

Changes checklist

  • Detailed steps and rollback steps have been filled prior to commencing work
  • Person on-call has been informed prior to change being rolled out