[Feature flag] Rollout of `dynamic_image_resizing_requester`/`dynamic_image_resizing_owner`
What
Roll out the :dynamic_image_resizing_owner feature flag and gradually roll out the dynamic_image_resizing_requester feature flag, while monitoring Workhorse (WH) health. The feature flags will not be removed in this issue, because disabling them is the simplest way to roll back the feature.
This feature is experimental and might not go live for everyone in its current form.
Rollback (if needed)
/chatops run feature delete dynamic_image_resizing_requester
/chatops run feature delete dynamic_image_resizing_owner
Runbook: https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/web/workhorse-image-scaler-alerts.md
Owners
- Team: Memory
- Most appropriate Slack channel to reach out to: #g_memory
- Best individual to reach out to: alipniagov
Expectations
What are we expecting to happen?
Once the rollout percentage reaches 100%, every .png/.jpg avatar will be dynamically resized before it is served to the user, provided the requested width is in the allow list. If the width is not in the allow list, or if anything goes wrong, the original image should be served instead.
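The expected behaviour above can be summarised as a small decision function. This is only an illustrative sketch, not the Workhorse implementation: the function name, the `resize` callback, and the example widths are all hypothetical, and the real allow list is not stated in this issue.

```python
from typing import Callable, Iterable

# Hypothetical values for illustration only; the real allow list is not
# given in this issue.
EXAMPLE_ALLOWED_WIDTHS = frozenset({16, 32, 64})

# The issue only mentions .png/.jpg avatars as candidates for resizing.
SCALABLE_EXTENSIONS = (".png", ".jpg")


def select_avatar_bytes(
    filename: str,
    original: bytes,
    requested_width: int,
    resize: Callable[[bytes, int], bytes],
    allowed_widths: Iterable[int] = EXAMPLE_ALLOWED_WIDTHS,
) -> bytes:
    """Return resized avatar bytes when eligible, otherwise the original."""
    # Unsupported formats are served untouched.
    if not filename.lower().endswith(SCALABLE_EXTENSIONS):
        return original

    # Widths outside the allow list are served untouched.
    if requested_width not in set(allowed_widths):
        return original

    try:
        return resize(original, requested_width)
    except Exception:
        # Any scaler failure falls back to the original image.
        return original
```

The point of the sketch is the fallback shape: every path that cannot produce a resized image returns the original bytes, so a scaler problem should degrade to the pre-rollout behaviour rather than to a missing avatar.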
What might happen if this goes wrong?
Avatars might look off / be missing.
What can we monitor to detect problems with this?
Dashboards for the feature
https://log.gprd.gitlab.net/goto/32296e6cccdc1f5809bd8be7e33a8b1e
https://dashboards.gitlab.net/d/UqLuWbFGz/dynamic-image-resizing?orgId=1
WH and overall Web health
- Metric: GL Health
- Location: https://dashboards.gitlab.net/d/RZmbBr7mk/gitlab-triage?orgId=1&refresh=30s
- What changes to this metric should prompt a rollback: Error rate growth, significant response time growth
- Metric: Workhorse/Full web component Health; WH CPU & Memory usage
- Location: https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1 and https://dashboards.gitlab.net/d/general-service/general-service-platform-metrics?orgId=1
- What changes to this metric should prompt a rollback: Error rate growth, significant response time growth
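The rollback criteria above are qualitative ("error rate growth, significant response time growth"). One way to make them concrete during the rollout is to compare a baseline sample against a current one. The sketch below is an assumption-laden illustration: the `HealthSample` type, the function name, and both thresholds are hypothetical and are not defined anywhere in this issue or the runbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSample:
    """A snapshot read manually from the dashboards (hypothetical shape)."""

    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p95_latency_s: float   # response time in seconds


def should_roll_back(
    baseline: HealthSample,
    current: HealthSample,
    max_error_rate_increase: float = 0.005,  # hypothetical threshold
    max_latency_ratio: float = 1.5,          # hypothetical threshold
) -> bool:
    """Return True if either rollback criterion is met."""
    error_rate_grew = (
        current.error_rate - baseline.error_rate > max_error_rate_increase
    )
    latency_grew = (
        current.p95_latency_s > baseline.p95_latency_s * max_latency_ratio
    )
    return error_rate_grew or latency_grew
```

For example, with a baseline of `HealthSample(0.001, 0.2)`, a current sample of `HealthSample(0.02, 0.2)` would trigger a rollback on error rate alone. The thresholds would need to be agreed with the team before use.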
Beta groups/projects
We run a separate test for all GL employees: gitlab-com/gl-infra/production#2692 (closed)
Roll Out Steps
- Enable on staging
- Test on staging
- Ensure that documentation has been updated
- Enable on GitLab.com for individual groups/projects listed above and verify behaviour
- Coordinate a time to enable the flag with #production and #g_delivery on Slack.
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Enable on GitLab.com by running chatops command in #production
- Cross post chatops Slack command to #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel
- Announce on the issue that the flag has been enabled
- Remove feature flag and add changelog entry
- After the flag removal is deployed, clean up the feature flag by running chatops command in #production channel