# Index `gitlab-com` group with Elasticsearch on GitLab.com
## Blockers

- Figure out if we need to deal with https://gitlab.com/gitlab-org/gitlab/issues/103325#note_261461370
- Determine if we need to resize the Elasticsearch cluster
## C3

Production Change - Criticality 3

| Change Objective | Describe the objective of the change |
|---|---|
| Change Type | ConfigurationChange |
| Services Impacted | GitLab.com |
| Change Team Members | Dylan Griffith |
| Change Severity | C3 |
| Change Reviewer or tested in staging | staging |
| Dry-run output | N/A |
| Due Date | 2019-01-07 08:02 UTC |
| Time tracking | |
## Pre-check

- Run the below steps on staging
- Determine there is enough space left in the Elasticsearch cluster ✅ - We've only used 40GB and there is ~~440GB~~ 360GB left (*see comment below*)
- Compare the number of projects in `gitlab-com` to `gitlab-org` to get a rough estimate of how long indexing will take, and also compare repo size across groups.
  - Details at #1499 (comment 262021990)
  - Some evidence in #800 (comment 174903937) suggests this wasn't a large amount of data relative to `gitlab-org` when it was last looked into.
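The project-count comparison above can be turned into a back-of-the-envelope time estimate. The sketch below uses made-up placeholder numbers (none of these figures come from the actual runs referenced in this issue) and assumes indexing time scales roughly linearly with project count, which ignores differences in repository size:

```ruby
# Rough linear scaling estimate: if a reference group of N projects took
# M minutes to index, estimate the time for a target group.
# All inputs are illustrative placeholders, not measured values.
def estimated_indexing_minutes(reference_minutes:, reference_projects:, target_projects:)
  (reference_minutes.to_f / reference_projects) * target_projects
end

# e.g. if a hypothetical 3000-project group took 240 minutes,
# a ~600-project group would take about:
puts estimated_indexing_minutes(
  reference_minutes: 240,
  reference_projects: 3000,
  target_projects: 600
)
# => 48.0
```

Repo size matters at least as much as project count, so this only bounds the order of magnitude; the pre-check above compares repo sizes for that reason.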
## Detailed steps for the change

- Resize cluster
- Create a silence for the component saturation alert for Sidekiq CPU, similar to this one: https://alerts.gprd.gitlab.net/#/silences/5b12f2f5-8998-4373-a823-762f46c1706f
- Go to Admin > Settings > Integrations > Elasticsearch
- Note start time: 2019-01-07 08:02 UTC
- Add `gitlab-com` to indexed namespaces
- Wait for the Elasticsearch reindex to complete. Watch the queue for `ElasticIndexerWorker`: it should trigger for each project in the group (~600), and each will spin out 2 `ElasticCommitIndexerWorker` jobs to index the source code and the wiki. Worth noting that after the queue is approximately drained it may keep requeueing anyway as updates come in, so we only need to see it almost reach zero; it should stabilise at a low number.
  - Number of in-process jobs for `ElasticCommitIndexerWorker` has stabilised to roughly zero: https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1&from=now-6h&to=now&fullscreen&panelId=22
- Note end time: 2019-01-07 08:55 UTC
- Note the time taken: 53 minutes
- Update the table below for the historical record

| In gitlab-production index | Before | After |
|---|---|---|
| Total | 38.8GB | 55.8GB |
| Documents | 3.3M | 4.4M |
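The "wait for the reindex to complete" step amounts to watching the worker queue until it stabilises near zero rather than hitting exactly zero. A minimal sketch of that check, where `samples` is a hypothetical stand-in for successive queue-length readings (the real values come from the Sidekiq dashboard, not this function):

```ruby
# Return the index of the poll at which the queue is considered drained:
# the size has stayed at or below `threshold` for `stable_polls`
# consecutive readings. Returns nil if it never stabilises.
def drained_at(samples, threshold: 5, stable_polls: 3)
  consecutive = 0
  samples.each_with_index do |size, i|
    consecutive = size <= threshold ? consecutive + 1 : 0
    # The queue may never hit exactly zero, since updates keep
    # requeueing jobs; a few small readings in a row is good enough.
    return i if consecutive >= stable_polls
  end
  nil
end

puts drained_at([1200, 640, 80, 4, 2, 3, 1])  # => 5
```

Requiring several consecutive low readings guards against declaring victory on a momentary dip while a batch of `ElasticCommitIndexerWorker` jobs is still in flight.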
## Rollback steps

- Remove `gitlab-com` from indexed namespaces
## Changes checklist

- Detailed steps and rollback steps have been filled in prior to commencing work
- Person on-call has been informed prior to the change being rolled out
## Monitoring

- Queue for `ElasticIndexerWorker`
- Queue for `ElasticCommitIndexerWorker`
- Overall Sidekiq queues
- Search controller performance
- Check "Waiting Client Connections per Pool"
  - If we see this increase to, say, 500 and stay that way, we should be concerned and disable indexing at that point
- Check Gitaly node saturation
  - If any nodes on this graph are maxed out for a long period of time correlated with enabling this, we should disable it. We should first confirm by shutting down `ElasticCommitIndexerWorker` that it will help, and then stop if it's clearly correlated.
- Check Redis memory
  - If memory usage starts growing rapidly it might get OOM killed (which would be really bad because we would lose all other queued jobs, for example scheduled CI jobs).
  - If it gets close to SLO levels, the rate of growth should be evaluated and indexing should potentially be stopped.
  - Runbook for managing (killing) running/enqueued Sidekiq jobs: https://gitlab.com/gitlab-com/runbooks/blob/master/troubleshooting/large-sidekiq-queue.md
- Other dashboards:
  - CPU utilisation of Sidekiq (filter for `sidekiq-elasticsearch-01-sv-gprd.c.gitlab-production.internal`): https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1&from=now-6h&to=now&fullscreen&panelId=89
  - https://dashboards.gitlab.net/d/gitaly-host-detail/gitaly-host-detail?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gprd&var-fqdn=file-cny-01-stor-gprd.c.gitlab-production.internal
  - https://dashboards.gitlab.net/d/pgbouncer-main/pgbouncer-overview?orgId=1&from=now-6h&to=now
  - https://dashboards.gitlab.net/d/patroni-main/patroni-overview?orgId=1&from=now-6h&to=now
  - https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1&from=now-2d&to=now
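The Redis memory check above is about rate of growth, not absolute level: given two readings, we can project when usage would cross the limit and decide whether to stop indexing. A sketch with placeholder values (the real figures come from the dashboards, and the limit here is an assumed example, not the actual SLO):

```ruby
# Given two used-memory samples taken `interval_minutes` apart, project
# how many minutes remain until usage reaches `limit_gb`.
# Returns Float::INFINITY if usage is flat or shrinking.
def minutes_until_limit(mem_before_gb, mem_after_gb, interval_minutes, limit_gb)
  growth_per_min = (mem_after_gb - mem_before_gb) / interval_minutes.to_f
  return Float::INFINITY if growth_per_min <= 0
  (limit_gb - mem_after_gb) / growth_per_min
end

# Hypothetical: 0.5GB of growth over 10 minutes with 6GB of headroom left
puts minutes_until_limit(20.0, 20.5, 10, 26.5)  # => 120.0
```

If the projected time to the limit is short relative to the expected remaining indexing time, that is the signal to pause indexing before Redis risks being OOM killed.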