Incremental Rollout of Pages API-based config source after "other" fixes
Production Change - Criticality 4 (C4)
| Change Objective | Incrementally roll out the new Pages API-based config source: start with a list of predefined domains, then proceed with all other domains in batches. | 
|---|---|
| Change Type | ConfigurationChange | 
| Services Impacted | GitLab-Pages | 
| Change Team Members | @vshushlin @krasio @grzesiek @aamarsanaa | 
| Change Severity | C4 | 
| Change Reviewer or tested in staging | A similar change was applied to staging: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8941 | 
| Dry-run output | If the change is done through a script, the script must have a dry-run capability; run the change in dry-run mode and include the output here. | 
| Due Date | Date and time (UTC) for the execution of the change; if possible, add the local time zone of the engineer executing the change. | 
| Time tracking | Estimate and record the time associated with the change (including a possible rollback). | 
Precondition
- Make sure the Pages fixes are deployed to production:
  /chatops run auto_deploy status 05d583ae7560672d1ea78f9fb5fc76f95d1dbf52
- Make sure the test domains are served successfully (just open them in a browser):
  - vshushlin.gitlab.io
  - shushlin.dev
  - pages-rollout.gitlab.io
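The browser check in the second precondition can also be scripted. The following is a minimal sketch, assuming `curl` is available on the engineer's machine; treating any 2xx or 3xx response as "served successfully" is an assumption of this sketch, not a criterion stated in this issue, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm each test domain responds with a 2xx/3xx status.
# Assumptions: curl is installed; 2xx/3xx counts as "served successfully";
# the 10-second timeout is an arbitrary choice.
set -u

TEST_DOMAINS=(vshushlin.gitlab.io shushlin.dev pages-rollout.gitlab.io)

# Return 0 when the given HTTP status code is 2xx or 3xx.
is_success_status() {
  case "$1" in
    2[0-9][0-9] | 3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Request the domain root and report the status code (000 = no response).
check_domain() {
  local domain="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://${domain}/") || true
  if is_success_status "$code"; then
    echo "OK   ${domain} (${code})"
  else
    echo "FAIL ${domain} (${code})"
    return 1
  fi
}

# Check every test domain; return non-zero if any of them failed.
check_all() {
  local failed=0 domain
  for domain in "${TEST_DOMAINS[@]}"; do
    check_domain "$domain" || failed=1
  done
  return "$failed"
}

# Only hit the network when explicitly asked to.
if [[ "${1:-}" == "--run" ]]; then
  check_all
fi
```

For example, saving it as `check_pages_domains.sh` (hypothetical name) and running `bash check_pages_domains.sh --run` prints one OK/FAIL line per domain and exits non-zero if any domain fails.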
 
 
Detailed steps for the change
On the Infra side
- 2020-03-11 01:00 (UTC): Roll out to 20% of the domains
  - ssh to pages-01-stor-gprd
  - sudo vi /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml
  - Replace the file content with the content below, then save and quit:
        domains:
          enabled:
            - vshushlin.gitlab.io
            - shushlin.dev
            - pages-rollout.gitlab.io
          rollout:
            percentage: 20
  - sudo chown git:git /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml
- 2020-03-12 04:34 (UTC): Roll out to 50% of the domains. Repeat the previous steps, but change 20 to 50.
- 2020-03-13 01:00 (UTC): Roll out to 100% of the domains. Repeat the previous steps, but change 50 to 100.
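Editing the file with vi for every batch makes it easy to mistype the percentage, and the Dry-run output row of the table asks for a dry-run whenever a script is used. A minimal sketch of a scripted alternative follows. It assumes it is run on pages-01-stor-gprd by a user with sudo rights, and that `rollout` is nested under `domains` in the config file (an assumption about the YAML layout); the script name `rollout_config.sh` and its `--apply` flag are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write .gitlab-source-config.yml for one rollout batch.
# Hypothetical usage: rollout_config.sh PERCENTAGE [--apply]
# Without --apply the script only prints what it would write (dry-run).
# Assumption: "rollout" is nested under "domains", as in the steps above.
set -euo pipefail

CONFIG_PATH="/var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml"

# Accept only whole numbers from 0 to 100.
valid_percentage() {
  [[ "$1" =~ ^[0-9]+$ ]] && (( 10#$1 <= 100 ))
}

# Print the config file content for the given rollout percentage.
render_config() {
  cat <<EOF
domains:
  enabled:
    - vshushlin.gitlab.io
    - shushlin.dev
    - pages-rollout.gitlab.io
  rollout:
    percentage: $1
EOF
}

main() {
  local percentage="${1:-}" mode="${2:-}"
  if ! valid_percentage "$percentage"; then
    echo "usage: $0 PERCENTAGE [--apply]  (PERCENTAGE must be 0-100)" >&2
    return 2
  fi
  if [[ "$mode" != "--apply" ]]; then
    echo "# dry-run: would write ${CONFIG_PATH}:"
    render_config "$percentage"
    return 0
  fi
  # Overwrite the file, restore ownership as in the manual step, show the result.
  render_config "$percentage" | sudo tee "$CONFIG_PATH" > /dev/null
  sudo chown git:git "$CONFIG_PATH"
  echo "wrote ${CONFIG_PATH}:"
  sudo cat "$CONFIG_PATH"
}

# Run main only when arguments are given, so the functions can be sourced.
if [[ $# -gt 0 ]]; then
  main "$@"
fi
```

For the first batch this would be run as `rollout_config.sh 20` to review the dry-run output, then `rollout_config.sh 20 --apply`; later batches only change the number.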
Monitoring / Validation
Visit https://vshushlin.gitlab.io, https://vshushlin.gitlab.io/gitlab-meetup-pages, https://shushlin.dev/, and http://pages-rollout.gitlab.io/ several times; the requests should then show up in the logs and dashboards below:
- API endpoint logs
- Visualization of API endpoint request duration based on application logs
- Grafana dashboard for the API endpoint
- web-pages service overview
- API service overview
- (optional for this issue) Prometheus graph:
  - should increase every time you access shushlin.dev or any other domain
  - an increase in 400s or 500s indicates a bug
- (optional for this issue) Grafana dashboard: https://dashboards.gitlab.net/d/_IQB_rSmk/pages?orgId=1&refresh=1m&from=now-3h&to=now&var-worker=All
- CPU graph across the web-pages fleet: https://thanos-query.ops.gitlab.net/graph?g0.range_input=2d&g0.max_source_resolution=0s&g0.expr=instance%3Acpu_utilization%3Aratio_avg%7Bfqdn%3D~%22web-pages-.*%22%2C%20environment%3D%22gprd%22%7D&g0.tab=0
- Memory graph across the web-pages fleet: https://thanos-query.ops.gitlab.net/graph?g0.range_input=2d&g0.max_source_resolution=0s&g0.expr=instance%3Amemory_utilization%3Aratio_avg%7Bfqdn%3D~%22web-pages.*%22%2C%20environment%3D%22gprd%22%7D&g0.tab=0
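The repeated visits described in the monitoring step can also be generated with a small loop, which additionally tallies responses that would point to a bug. This is a minimal sketch, assuming `curl` is installed; the 20 rounds, the 1-second pause, and the function names are arbitrary, hypothetical choices. It does not replace checking the logs and dashboards listed above.

```shell
#!/usr/bin/env bash
# Sketch: send repeated requests to the test URLs so they show up in the
# logs and dashboards, and count 4xx/5xx/no-response results.
# Assumptions: curl is installed; round count and pause are arbitrary.
set -u

URLS=(
  "https://vshushlin.gitlab.io"
  "https://vshushlin.gitlab.io/gitlab-meetup-pages"
  "https://shushlin.dev/"
  "http://pages-rollout.gitlab.io/"
)

# Map an HTTP status code to a coarse class.
status_class() {
  case "$1" in
    2[0-9][0-9]) echo "2xx" ;;
    3[0-9][0-9]) echo "3xx" ;;
    4[0-9][0-9]) echo "4xx" ;;
    5[0-9][0-9]) echo "5xx" ;;
    *) echo "no-response" ;;
  esac
}

# Request every URL once per round; return non-zero if any request failed.
generate_traffic() {
  local rounds="${1:-20}" errors=0 i url code class
  for (( i = 1; i <= rounds; i++ )); do
    for url in "${URLS[@]}"; do
      code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
      class=$(status_class "$code")
      echo "round ${i}: ${class} (${code}) ${url}"
      case "$class" in
        4xx | 5xx | no-response) errors=$(( errors + 1 )) ;;
      esac
    done
    sleep 1
  done
  echo "requests with 4xx/5xx or no response: ${errors}"
  [[ "$errors" -eq 0 ]]
}

# Only generate traffic when explicitly asked to; optional second arg = rounds.
if [[ "${1:-}" == "--run" ]]; then
  generate_traffic "${2:-20}"
fi
```

Redirects (3xx) are deliberately not counted as errors, since a path such as /gitlab-meetup-pages may legitimately redirect to a trailing-slash URL.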
Rollback steps
- ssh to pages-01-stor-gprd
- sudo vi /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml
- Replace the file content with the content below (the test domains only, without the rollout section), then save and quit:
      domains:
        enabled:
          - vshushlin.gitlab.io
          - shushlin.dev
          - pages-rollout.gitlab.io
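The rollback edit can also be done without an interactive editor. A minimal sketch follows, assuming it is run on pages-01-stor-gprd with sudo rights; the script name `rollback_config.sh`, its flags, and the post-write check are hypothetical, and the `chown` is an extra precaution mirroring the rollout step rather than part of the rollback steps listed above.

```shell
#!/usr/bin/env bash
# Sketch: restore .gitlab-source-config.yml to the test-domains-only content.
# Hypothetical usage: rollback_config.sh --dry-run | --apply
# The chown is an extra precaution mirroring the rollout step.
set -euo pipefail

CONFIG_PATH="/var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml"

# Print the rollback config: the enabled list without any rollout section.
render_rollback_config() {
  cat <<'EOF'
domains:
  enabled:
    - vshushlin.gitlab.io
    - shushlin.dev
    - pages-rollout.gitlab.io
EOF
}

# Succeed only if the given text contains no "rollout:" key.
has_no_rollout() {
  ! grep -q '^[[:space:]]*rollout:' <<< "$1"
}

main() {
  case "${1:-}" in
    --dry-run)
      echo "# dry-run: would write ${CONFIG_PATH}:"
      render_rollback_config
      ;;
    --apply)
      render_rollback_config | sudo tee "$CONFIG_PATH" > /dev/null
      sudo chown git:git "$CONFIG_PATH"
      # Re-read the file to confirm the rollout section is really gone.
      if has_no_rollout "$(sudo cat "$CONFIG_PATH")"; then
        echo "rollback written; no rollout section present in ${CONFIG_PATH}"
      else
        echo "unexpected: ${CONFIG_PATH} still contains a rollout section" >&2
        return 1
      fi
      ;;
    *)
      echo "usage: $0 --dry-run | --apply" >&2
      return 2
      ;;
  esac
}

# Run main only when arguments are given, so the functions can be sourced.
if [[ $# -gt 0 ]]; then
  main "$@"
fi
```

Running `--dry-run` first shows exactly what will be written, which also gives the dry-run output the change template asks for.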
Changes checklist
- Detailed steps and rollback steps have been filled in prior to commencing work
- The person on call has been informed prior to the change being rolled out
Edited by Krasimir Angelov