Resize sidekiq-pipeline nodes to custom-8-15

Production Change - Criticality 3 (C3)

Change Objective Give the pipeline nodes more CPU to reduce contention and improve queue performance. See https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7294#note_196018603 for discussion.
Change Type Configuration Change
Services Impacted Pipeline scheduling (sidekiq)
Change Team Members @cmiskell
Change Severity C3
Buddy check or tested in staging Not done in staging. Checker @ahanselka
Schedule of the change 2019-07-26 02:00 UTC for sidekiq-pipeline-0{4,5,6} (Stage 1); 2019-07-29 04:00 UTC (Stage 2)
Duration of the change 15 minutes
Detailed steps for the change. Each step must include:

  • pre-conditions for execution of the step
  • execution commands for the step
  • post-execution validation for the step
  • rollback of the step

Stage 1:

  1. In terraform, change n1-standard-4 to custom-8-15360 in the machine_types map of environments/gprd/variables.tf for sidekiq-pipeline
  2. Drain the target node by sending TSTP to the sidekiq processes (they finish in-flight jobs but stop picking up new ones): knife ssh 'TARGETNODE' $'for pid in $(ps -ef|awk \'/sidekiq.*queues/ {print $2}\'|sort -u); do echo "Sending TSTP signal to ${pid}..."; sudo kill -TSTP $pid; done'
  3. Shut down the instance
  4. Resize to custom-8-15360 by hand; the pipeline nodes in the sidekiq module do not have allow_stopping_for_update configured (the rest do), so terraform cannot apply this resize itself. This needs fixing later.
    • gcloud compute instances set-machine-type sidekiq-pipeline-INDEX-sv-gprd --machine-type custom-8-15360 --zone us-east1-ZONEID --project gitlab-production
    • gcloud compute instances start sidekiq-pipeline-INDEX-sv-gprd --zone us-east1-ZONEID --project gitlab-production
  5. Repeat steps 2-4 for instances [4] and [5], sequentially.
  6. Plan: tf plan -target 'module.sidekiq.google_compute_instance.sidekiq_pipeline[3]' -target 'module.sidekiq.google_compute_instance.sidekiq_pipeline[4]' -target 'module.sidekiq.google_compute_instance.sidekiq_pipeline[5]' -out /tmp/tfplan and verify no changes are outstanding
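The per-node sequence in steps 2-4 can be sketched as a shell function. This is a minimal illustration, not part of the approved plan: the `resize_node` name, the `name:<instance>` knife query, the use of `pgrep` in place of the `ps|awk` pipeline, the explicit `gcloud compute instances stop` for the shutdown step, and the `DRY_RUN` flag are all assumptions for the sketch.

```shell
#!/usr/bin/env bash
# Hedged sketch of steps 2-4 for one node. With DRY_RUN=1 the commands
# are printed rather than executed.
resize_node() {
  local index="$1" zone_id="$2"
  local instance="sidekiq-pipeline-${index}-sv-gprd"
  local project="gitlab-production"

  run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "DRY-RUN: $*"; else "$@"; fi; }

  # Step 2: TSTP drains sidekiq (finish in-flight jobs, take no new ones).
  run knife ssh "name:${instance}" \
    "for pid in \$(pgrep -f 'sidekiq.*queues' | sort -u); do sudo kill -TSTP \$pid; done"

  # Step 3: stop the instance (no allow_stopping_for_update on these nodes).
  run gcloud compute instances stop "${instance}" \
    --zone "us-east1-${zone_id}" --project "${project}"

  # Step 4: resize by hand, then start it again.
  run gcloud compute instances set-machine-type "${instance}" \
    --machine-type custom-8-15360 \
    --zone "us-east1-${zone_id}" --project "${project}"
  run gcloud compute instances start "${instance}" \
    --zone "us-east1-${zone_id}" --project "${project}"
}

# Example dry run for node 04 in zone us-east1-b:
#   DRY_RUN=1 resize_node 04 b
```

Sequential use (one node fully resized and started before the next) matches step 5 above.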

Stage 2: After #963 (closed) is completed and looks stable, repeat on instances [0], [1], [2]. If we do these now, then when they restart chef will run, sidekiq will be reconfigured, and it will start processing from the old redis queue. Fixing that is possible but fiddly, and I'd rather avoid the mess.
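Stage 2's terraform verification mirrors step 6 of Stage 1, just with indices [0], [1], [2]. A small sketch that generates the -target flags in a loop (the leading echo makes it a dry run; the tf wrapper name is taken from step 6):

```shell
# Build the terraform -target flags for Stage 2 (tf indices 0..2).
targets=()
for idx in 0 1 2; do
  targets+=(-target "module.sidekiq.google_compute_instance.sidekiq_pipeline[${idx}]")
done

# Print the resulting command; drop the leading "echo" to run the plan
# for real, then verify no changes are outstanding.
echo tf plan "${targets[@]}" -out /tmp/tfplan
```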

Edited by Craig Miskell