# 2022-06-03: Increase memory allocation for GitLab Pages

Production Change
## Change Summary

Increase the memory allocation for GitLab Pages. Based on current usage, pods use between 4 GB and 5 GB and are frequently close to the 6 GB limit. Usage has been growing because we keep cached archives of Pages sites in memory for longer, and memory usage increases with traffic.
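As a sketch of what such a change looks like: the actual file path, keys, and values live in the k8s-workloads repository (MR gitlab-com/gl-infra/k8s-workloads/gitlab-com!1841); the structure and numbers below are illustrative assumptions, not the merged diff.

```yaml
# Hypothetical excerpt of a gitlab-pages Helm values override.
# The real change is in gitlab-com/gl-infra/k8s-workloads/gitlab-com!1841;
# these numbers only illustrate the kind of change being made.
gitlab:
  gitlab-pages:
    resources:
      requests:
        memory: 5G   # sized near observed 4-5 GB steady-state usage
      limits:
        memory: 8G   # raised from 6G, which pods were frequently approaching
```

Raising the limit gives the page-archive cache headroom as traffic grows, reducing the risk of OOM kills without changing application behavior.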
## Change Details

- Services Impacted - Service::Pages
- Change Technician - @reprazent
- Change Reviewer - @skarbek
- Time tracking - Time, in minutes, needed to execute all change steps, including rollback
- Downtime Component - If there is a need for downtime, include downtime estimate here
## Detailed steps for the change

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

- [ ] Set label ~"change::in-progress" using `/label ~change::in-progress`
- [ ] Merge gitlab-com/gl-infra/k8s-workloads/gitlab-com!1841 (merged)
- [ ] Set label ~"change::complete" using `/label ~change::complete`
## Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes

- [ ] Revert gitlab-com/gl-infra/k8s-workloads/gitlab-com!1841 (merged)
- [ ] Set label ~"change::aborted" using `/label ~change::aborted`
## Monitoring

Key metrics to observe:

- Metric: `kube_container_memory` saturation component
## Change Reviewer checklist

Check if the following applies:

- [ ] The scheduled day and time of execution of the change is appropriate.
- [ ] The change plan is technically accurate.
- [ ] The change plan includes estimated timing values based on previous testing.
- [ ] The change plan includes a viable rollback plan.
- [ ] The specified metrics/monitoring dashboards provide sufficient visibility for the change.
## Change Technician checklist

Check if all items below are complete:

- [ ] The change plan is technically accurate.
- [ ] This Change Issue is linked to the appropriate Issue and/or Epic.
- [ ] Change has been tested in staging and results noted in a comment on this issue.
- [ ] A dry-run has been conducted and results noted in a comment on this issue.
- [ ] For C1 and C2 change issues, the SRE on-call has been informed prior to the change being rolled out. (In the #production channel, mention `@sre-oncall` and this issue and await their acknowledgement.)
- [ ] Release managers have been informed (if needed; cases include DB changes) prior to the change being rolled out. (In the #production channel, mention `@release-managers` and this issue and await their acknowledgment.)
- [ ] There are currently no active incidents that are severity 1 or severity 2.
- [ ] If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.