Enable Gitaly packfile cache for gitlab-com/www-gitlab-com

Production Change

Change Summary

Part of scalability#931.

In &372 (closed) we are developing a caching mechanism in Gitaly to reduce the CPU and RAM consumed by massively parallel CI Git fetch workloads. A first iteration of this cache has been deployed behind a feature flag. The next step is to investigate the real-world impact of the cache and to see whether its design needs further iteration.

As discussed in scalability#931 we have turned this flag on for various projects already. We now want to turn it on for gitlab-com/www-gitlab-com.

Change Details

  1. Services Impacted - ServiceGit, ServiceGitaly
  2. Change Technician - @jacobvosmaer-gitlab
  3. Change Criticality - C3
  4. Change Type - changeunscheduled, changescheduled
  5. Change Reviewer - DRI for the review of this change
  6. Due Date - Date and time (in UTC) for the execution of the change
  7. Time tracking - Time, in minutes, needed to execute all change steps, including rollback
  8. Downtime Component - If there is a need for downtime, include downtime estimate here

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 5

  • Ensure the cache is enabled in gprd. Both queries should return the same number.
  • Notify #whats-happening-at-gitlab that we are toggling a feature flag on gitlab-com/www-gitlab-com

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 20

  • /chatops run feature set gitaly_upload_pack_gitaly_hooks true --project gitlab-com/www-gitlab-com

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 20

  • git clone --bare --depth=1 https://gitlab.com/gitlab-com/www-gitlab-com.git
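The clone above can be repeated a few times to check that fetches keep succeeding once the flag is on. The following is a minimal sketch, not part of the agreed change plan: the `repeat_clone` helper, the `GIT` override variable, and the default of three runs are assumptions made for illustration. Client-side wall-clock time is only a rough signal and does not replace the server-side metrics.

```shell
#!/bin/sh
# Hypothetical helper: repeat the shallow bare clone and report status and duration.
# GIT can be overridden (assumption, for dry runs); it defaults to the real git binary.
GIT="${GIT:-git}"

repeat_clone() {
  url="$1"          # repository URL to clone
  runs="${2:-3}"    # number of repetitions (assumed default: 3)
  i=1
  while [ "$i" -le "$runs" ]; do
    # Use a fresh temporary directory so every run performs a full fetch.
    dest="$(mktemp -d)" || return 1
    start="$(date +%s)"
    if "$GIT" clone --bare --depth=1 "$url" "$dest/repo.git" >/dev/null 2>&1; then
      status=ok
    else
      status=failed
    fi
    end="$(date +%s)"
    echo "run $i: $status in $((end - start))s"
    rm -rf "$dest"
    i=$((i + 1))
  done
}
```

For this change it would be called as `repeat_clone https://gitlab.com/gitlab-com/www-gitlab-com.git 3`. Any `failed` line is a reason to consider the rollback steps.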

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 5

  • /chatops run feature delete gitaly_upload_pack_gitaly_hooks

Monitoring

Key metrics to observe

Summary of infrastructure changes

  • Does this change introduce new compute instances?
  • Does this change re-size any existing compute instances?
  • Does this change introduce any additional usage of tooling like Elasticsearch, CDNs, Cloudflare, etc.?

Summary of the above

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • There are currently no active incidents.
Edited by Jacob Vosmaer