Determine the state of patcher and confirm it's being regularly tested/documented

This issue is to understand the current state of patcher (https://ops.gitlab.net/gitlab-com/engineering/patcher). In particular, we wish to answer the following questions:

  1. Is patcher currently functional? The last pipeline on master does not paint a clear picture: https://ops.gitlab.net/gitlab-com/gl-infra/patcher/-/pipelines/1158941
  2. Is patcher being regularly tested? If so, when was the last test?
  3. Is the documentation for patcher up to date? In particular, does it state clearly which components patcher does and does not work on (is it exclusively for the rails monolith)?
  4. Is patcher even needed anymore? What scenarios does it help us with?
    • It filled a gap where we needed to fix GitLab.com immediately and couldn't roll back because of post-deployment migrations (PDM). Now that PDMs happen a lot later, after a deploy, our ability to roll back is greatly improved. Does this make patcher obsolete in this scenario?
    • We have a fix we need to get to GitLab.com as quickly as possible, and we are not able to leverage the normal auto-deploy process because it's considered too slow (and GitLab.com might be down).
  5. Because patcher lives in its own repo and is not regularly touched, it has maintenance issues. For example, when we rebuild or change our k8s or environment setup, we have to update patcher's .gitlab-ci.yml to match. Seeing as patcher simply patches a docker image and then triggers k8s-workloads/gitlab-com, could we instead make its process a more intrinsic part of our existing tooling? E.g.
    • We know how to trigger dev.gitlab.org pipelines to build images for us, could we move the image patching part into the distribution toolchain?
    • Triggering k8s-workloads/gitlab-com could be as simple as manually running a pipeline on the ops mirror of k8s-workloads/gitlab-com with the variable GITLAB_IMAGE_TAG=$patched_tag set, which should start a regular gitlab-com pipeline targeting all environments to roll out the patched image.
  6. Who has authority and the technical permissions (within our systems) to do a hotpatch? Delivery? EOCs? All SREs? What is the workflow and who makes the call to do one?
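
To make the second option under point 5 more concrete, here is a minimal sketch of starting that pipeline through the GitLab REST API (`POST /projects/:id/pipeline`) instead of the UI. It only builds the request and does not send it. The project path, the `master` ref, the tag value, and the token are placeholders assumed for illustration, not confirmed values from our setup; only the `GITLAB_IMAGE_TAG` variable name and the ops.gitlab.net host come from this issue.

```python
import json
import urllib.parse
import urllib.request


def build_pipeline_request(api_base, project_path, ref, patched_tag, token):
    """Build (but do not send) a 'create pipeline' request for the GitLab REST API."""
    # The API accepts a URL-encoded project path in place of a numeric project ID.
    project_id = urllib.parse.quote(project_path, safe="")
    url = f"{api_base}/projects/{project_id}/pipeline"
    payload = {
        "ref": ref,
        # The single variable mentioned above; its value is the patched image tag.
        "variables": [{"key": "GITLAB_IMAGE_TAG", "value": patched_tag}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "PRIVATE-TOKEN": token},
        method="POST",
    )


if __name__ == "__main__":
    req = build_pipeline_request(
        api_base="https://ops.gitlab.net/api/v4",
        project_path="example-group/k8s-workloads/gitlab-com",  # hypothetical path
        ref="master",                                           # assumed default branch
        patched_tag="patched-tag-placeholder",                  # hypothetical tag
        token="REDACTED",                                       # placeholder token
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually trigger the pipeline: urllib.request.urlopen(req)
```

Whoever holds a token allowed to create pipelines on that project can effectively perform a hotpatch this way, which ties directly into question 6.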

Any other thoughts or considerations around patcher we want to bring up and/or consider?

/cc @gitlab-org/delivery

Edited by John Skarbek