
enable elasticsearch integration on `gitlab.com` on `gitlab-org` namespace

Production Change - Criticality 2 C2

Change Objective: enable elasticsearch integration on gitlab.com on gitlab-org namespace
Change Type: terraform, gitlab config change through the admin panel, new elastic.co clusters
Services Impacted: GCP firewall, sidekiq, postgres, gitaly, search feature in gitlab.com
Change Team Members: @mwasilewski-gitlab and @nick.thomas
Change Severity: C2
Buddy check or tested in staging: the change was tested on the staging environment: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/6851
Schedule of the change: 2019-06-24 09:10:00 UTC
Duration of the change: ~1h (including a possible rollback)
Detailed steps for the change. Each step must include:

prep:

  • open firewall on console (MR): not needed, because there are no deny_all egress rules in gprd, so the MR is no longer relevant
  • create 2 elk clusters as described in the blueprint

rollout:

  • put in elastic creds
  • notify on-call in the production channel on slack that the change is being rolled out
  • enable elastic integration in the gitlab admin panel, limit it to groups, and do not add any groups yet (pay attention to this step: if you enable the integration without limiting it to specific namespaces, the entire instance will be indexed)
  • on a machine with access to elastic (console):
    • gitlab-rake gitlab:elastic:create_empty_index
    • gitlab-rake gitlab:elastic:clear_index_status
  • add gitlab-org namespace in the admin panel (this will start scheduling elastic jobs, keep an eye on sidekiq and gitaly)
  • confirm sidekiq jobs are processed without any errors
  • confirm the index is growing
    • go to Kibana on the elastic.co cluster created for monitoring and navigate to Monitoring -> Indices -> gitlab-production. A non-zero index rate and a rising document count mean the index is growing
  • ensure personal snippets are indexed
  • once the initial indexing is finished (index stopped growing) enable searching with elastic
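
The "index is growing" and "index stopped growing" checks above can be sketched as a small helper over successive document-count samples. This is a minimal sketch and not part of the change plan: the function names, the sample values, and the `quiet_samples` threshold are hypothetical, and the counts are assumed to be read by hand from the Kibana Monitoring page at a fixed interval.

```python
from typing import Sequence


def is_growing(doc_counts: Sequence[int]) -> bool:
    """Return True if the most recent sample is higher than the previous one."""
    if len(doc_counts) < 2:
        return False  # not enough samples to tell
    return doc_counts[-1] > doc_counts[-2]


def indexing_finished(doc_counts: Sequence[int], quiet_samples: int = 3) -> bool:
    """Return True if the index is non-empty and the last `quiet_samples`
    intervals showed no growth.

    `quiet_samples` is an assumed threshold; pick it together with the
    sampling interval so a short pause is not mistaken for completion.
    """
    if len(doc_counts) < quiet_samples + 1:
        return False
    tail = doc_counts[-(quiet_samples + 1):]
    if tail[-1] <= 0:
        return False  # an empty index has not been populated yet
    return all(later <= earlier for earlier, later in zip(tail, tail[1:]))
```

A flat document count is only a hint that initial indexing is done; checking that the sidekiq queues have drained before enabling search is still advisable.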

rollback:

  • see runbooks for more details on this
  • disable elastic integration in the admin panel
  • destroy all ElasticsearchIndexedNamespace (this will effectively stop creation of new sidekiq jobs)
  • recreate index
  • clear indexing status
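
The rollback order above can be kept as an ordered checklist and printed before the change starts. This is a minimal sketch under stated assumptions: the `Step` type and `render_checklist` helper are hypothetical, and mapping "recreate index" and "clear indexing status" to the two rake tasks named in the rollout section is an assumption. Steps for which this issue names no command are left as manual; the runbooks remain the authoritative reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    description: str
    command: Optional[str] = None  # None marks a manual step


# Order follows the rollback list in this issue. The rake tasks are the ones
# named in the rollout section; reusing them for rollback is an assumption.
ROLLBACK_STEPS: List[Step] = [
    Step("disable elastic integration in the admin panel"),
    Step("destroy all ElasticsearchIndexedNamespace records"),
    Step("recreate index: delete the existing index (command not named in this issue)"),
    Step("recreate index: create an empty index",
         "gitlab-rake gitlab:elastic:create_empty_index"),
    Step("clear indexing status",
         "gitlab-rake gitlab:elastic:clear_index_status"),
]


def render_checklist(steps: List[Step]) -> List[str]:
    """Return one numbered line per step, marking steps without a command as manual."""
    lines = []
    for number, step in enumerate(steps, start=1):
        suffix = f" -> {step.command}" if step.command else " (manual)"
        lines.append(f"{number}. {step.description}{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_checklist(ROLLBACK_STEPS)))
```

The sketch only prints the plan and executes nothing, so it is safe to run anywhere.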

Possible problems/notes:

  • blueprint (source/handbook/engineering/infrastructure/blueprint/201904-indexing-with-elastic-one-namespaces/index.html.md): gitlab-com/www-gitlab-com!21352 (merged)
  • firewall on console/web/sidekiq (all indexers should run on sidekiq and there are no egress deny rules for the sidekiq fleet, so in principle there should be no problem with the firewall)
  • not enough db connections -> increase db_pool
  • gitaly overloaded -> limit number of concurrent sidekiq elastic jobs
  • sidekiq besteffort nodes flooded -> add more nodes to the fleet
  • if possible completely avoid rake tasks, do not trigger indexing jobs (let gitlab take care of it)
Edited by Michal Wasilewski