Incremental Rollout of Pages API based config source
Production Change - Criticality 4
Change Objective | Incrementally roll out the new Pages API based config source: start with a list of predefined domains, then proceed with all other domains in batches. |
---|---|
Change Type | ConfigurationChange |
Services Impacted | GitLab-Pages |
Change Team Members | @vshushlin @krasio @grzesiek @aamarsanaa |
Change Severity | C4 |
Change Reviewer or tested in staging | Similar change was applied to staging: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8941 |
Dry-run output | If the change is done through a script, it is mandatory to have a dry-run capability in the script, run the change in dry-run mode and output the result |
Due Date | Date and time in UTC timezone for the execution of the change, if possible add the local timezone of the engineer executing the change |
Time tracking | To estimate and record times associated with changes (including a possible rollback) |
Precondition
- Ensure feature `pages_internal_api` is enabled on production
  - To enable the feature, run `/chatops run feature set pages_internal_api true` in the `#production` Slack channel
  - To confirm the feature is enabled, run `/chatops run feature get pages_internal_api` in the `#production` Slack channel
  - Cross-post the chatops Slack command to `#support_gitlab-com` and `#gitlab-pages`
- Ensure that on `web-pages-01-sv-gprd` the path `/var/opt/gitlab/gitlab-rails/shared/pages` is a mount endpoint on the `pages-01-stor-gprd` server
- Ensure that on `api-01-sv-gprd` the path `/var/opt/gitlab/gitlab-rails/shared/pages` is a mount endpoint on the `pages-01-stor-gprd` server
- Confirm that the content of `/opt/gitlab/embedded/service/gitlab-rails/.gitlab_pages_secret` is the same on the web-pages-01-sv and api-01-sv hosts
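The secret comparison in the last precondition can be done without printing the secret itself. The following is a minimal sketch, not the procedure the team actually used: it assumes SSH access to both hosts by the names passed in and passwordless `sudo` there, and the helper names `remote_digest` and `secrets_match` are hypothetical.

```shell
# Path of the shared secret, as given in the precondition above.
SECRET_PATH=/opt/gitlab/embedded/service/gitlab-rails/.gitlab_pages_secret

# Print the SHA-256 digest of the secret on one host.
# Assumption: the host is reachable over ssh and sudo needs no password.
remote_digest() {
  ssh "$1" "sudo sha256sum $SECRET_PATH" | cut -d ' ' -f 1
}

# Succeed only when both hosts report the same non-empty digest.
secrets_match() {
  a="$(remote_digest "$1")"
  b="$(remote_digest "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `secrets_match web-pages-01-sv-gprd api-01-sv-gprd && echo "secrets match"` would report success only if both digests are identical. Comparing digests rather than file contents keeps the secret out of terminal scrollback and chat logs.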
Detailed steps for the change
On Infra side
- 2020-02-18 07:30 (UTC): Rollout to predefined list of domains
  - ssh to `pages-01-stor-gprd`
  - `sudo touch /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml`
  - `vi /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml`
  - copy & paste the file content below, save & quit

        domains:
          enabled:
            - vshushlin.gitlab.io
            - shushlin.dev
            - pages-rollout.gitlab.io
          broken: non-existent-domain.gitlab.io

  - `sudo chown git:git /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml`
- 2020-02-18 11:00 (UTC): Rollout to 5% of the domains
  - ssh to `pages-01-stor-gprd`
  - `vi /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml`
  - replace the file content with the content below, save & quit

        domains:
          enabled:
            - vshushlin.gitlab.io
            - shushlin.dev
            - pages-rollout.gitlab.io
          rollout:
            percentage: 5

  - `sudo chown git:git /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml`
- 2020-02-19 09:30 (UTC): Rollout to 20% of the domains. Repeat the previous steps but change `5` to `20`
- 2020-02-20 TBD (TBD): Rollout to 50% of the domains. Repeat the previous steps but change `5` to `50`
- 2020-02-TBD TBD (TBD): Rollout to 100% of the domains. Repeat the previous steps but change `5` to `100`
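Because every percentage step edits the same file by hand, a small generator can reduce copy-and-paste mistakes. The sketch below is an illustration only: the function name `render_source_config` is hypothetical, and the nesting of `rollout` under `domains` is an assumption about how the file in the steps above is structured.

```shell
# Print the rollout config for a given percentage (integer, 0-100).
# Assumption: `rollout` is nested under `domains`, next to `enabled`.
render_source_config() {
  percentage="$1"
  # Reject empty or non-numeric input before it reaches the file.
  case "$percentage" in
    ''|*[!0-9]*) echo "percentage must be an integer" >&2; return 1 ;;
  esac
  if [ "$percentage" -gt 100 ]; then
    echo "percentage must be between 0 and 100" >&2
    return 1
  fi
  cat <<EOF
domains:
  enabled:
    - vshushlin.gitlab.io
    - shushlin.dev
    - pages-rollout.gitlab.io
  rollout:
    percentage: $percentage
EOF
}
```

Running `render_source_config 20` prints the content for the 20% step, which can be reviewed before it is pasted into `.gitlab-source-config.yml`.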
Monitoring / Validation
Visit https://vshushlin.gitlab.io, https://vshushlin.gitlab.io/gitlab-meetup-pages, https://shushlin.dev/, and http://pages-rollout.gitlab.io/ several times; the requests should show up in the logs and dashboards below.
- API endpoint logs
- Visualization of API endpoint request duration based on application logs
- Grafana dashboard for the API endpoint
- web-pages service overview
- API Service Overview
- (optional for this issue) Prometheus graph:
  - should increase every time you access shushlin.dev or any other domain
  - an increase in 400s or 500s indicates a bug
- (optional for this issue) Grafana dashboard: https://dashboards.gitlab.net/d/_IQB_rSmk/pages?orgId=1&refresh=1m&from=now-3h&to=now&var-worker=All
- CPU Graph across web-pages fleet: https://thanos-query.ops.gitlab.net/graph?g0.range_input=2d&g0.max_source_resolution=0s&g0.expr=instance%3Acpu_utilization%3Aratio_avg%7Bfqdn%3D~%22web-pages-.*%22%2C%20environment%3D%22gprd%22%7D&g0.tab=0
- Memory Graph across web-pages fleet: https://thanos-query.ops.gitlab.net/graph?g0.range_input=2d&g0.max_source_resolution=0s&g0.expr=instance%3Amemory_utilization%3Aratio_avg%7Bfqdn%3D~%22web-pages.*%22%2C%20environment%3D%22gprd%22%7D&g0.tab=0
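The repeated visits to the test domains can be scripted so that the traffic is easy to spot in the logs. This is a rough sketch under the assumption that `curl` is available on the machine running it; the helper names `fetch_status` and `visit_all` are hypothetical.

```shell
# URLs taken from the validation step above, one per line.
URLS="https://vshushlin.gitlab.io
https://vshushlin.gitlab.io/gitlab-meetup-pages
https://shushlin.dev/
http://pages-rollout.gitlab.io/"

# Print the HTTP status code for a single URL, discarding the body.
fetch_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' "$1"
}

# Request every URL `rounds` times, printing "<status> <url>" per request.
visit_all() {
  rounds="$1"
  i=0
  while [ "$i" -lt "$rounds" ]; do
    for url in $URLS; do
      printf '%s %s\n' "$(fetch_status "$url")" "$url"
    done
    i=$((i + 1))
  done
}
```

Calling `visit_all 10` issues ten requests per URL. Any status in the 400 or 500 range printed here is worth cross-checking against the dashboards listed above.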
Rollback steps
- `sudo rm /var/opt/gitlab/gitlab-rails/shared/pages/.gitlab-source-config.yml`
Changes checklist
- Detailed steps and rollback steps have been filled in prior to commencing work
- Person on-call has been informed prior to change being rolled out
Edited by Vladimir Shushlin