
Find orphaned keep-around refs

Problem to solve

We create keep-around refs from various sources. As features change and code evolves, we may end up creating keep-around refs that the Rails application no longer needs. When a keep-around ref is no longer needed, we say that it is orphaned, and call it an orphan.

The motivating example is the merge head diff. We create keep-around refs for all merge request diffs, including those of type merge_head. But the merge head diff is a transient diff for the current point in time. A merge request has at most one merge head diff, and it gets replaced any time there is a change to the source branch or the target branch. The corresponding commit can still be legitimately kept around, but only if it is also the commit for a merged results pipeline, or if it has an associated note.
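As a rough illustration of the rule above, the check for a replaced merge head commit could be sketched as follows. The `MergeHeadCommit` struct and its fields are hypothetical in-memory stand-ins, not GitLab models; real code would query pipelines and notes instead.

```ruby
# Hypothetical stand-in for the facts we would need about a merge head commit.
# None of these names exist in GitLab; they only illustrate the rule.
MergeHeadCommit = Struct.new(
  :sha,
  :merged_results_pipeline, # true if a merged results pipeline ran on this commit
  :has_note,                # true if a note is attached to this commit
  keyword_init: true
)

# A replaced merge head commit is still legitimately kept around only if
# something other than the transient diff refers to it.
def merge_head_keep_around_needed?(commit)
  commit.merged_results_pipeline || commit.has_note
end

stale = MergeHeadCommit.new(sha: "abc123", merged_results_pipeline: false, has_note: false)
noted = MergeHeadCommit.new(sha: "def456", merged_results_pipeline: false, has_note: true)

orphan_shas = [stale, noted].reject { |c| merge_head_keep_around_needed?(c) }.map(&:sha)
```

Here only `stale` ends up in `orphan_shas`, because nothing but the replaced merge head diff refers to it.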

Proposal

Build tooling to identify and prune orphaned keep-around refs. In this issue, we focus on the identification step:

  1. Enumerate all places where keep-around refs are created, and where the corresponding commit is persisted in the database
  2. Use this list to construct a method that, given a list of keep-around refs, emits only the orphaned ones.
  3. In a follow-up: Add tooling to reduce the risk of new keep-arounds being created that are not known to this method, as that could lead to data loss.
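One possible shape for steps 1 and 2 is a registry in which every known source of keep-around refs is listed explicitly, so that the orphan check is the set difference against all registered sources. This is a minimal sketch under that assumption: `KeepAroundSourceRegistry` and the in-memory arrays are hypothetical, and in practice each source would be a database query.

```ruby
require "set"

# Hypothetical registry: each entry names one place that creates keep-around
# refs and supplies a block returning the subset of SHAs still referenced there.
class KeepAroundSourceRegistry
  def initialize
    @sources = {}
  end

  # Register a source under a name. The block receives the candidate SHAs
  # and must return those that this source still needs.
  def register(name, &still_referenced)
    @sources[name] = still_referenced
  end

  # Returns the SHAs that no registered source claims, in input order.
  def orphans(shas)
    referenced = @sources.values.reduce(Set.new) do |acc, source|
      acc.merge(source.call(shas))
    end
    shas.reject { |sha| referenced.include?(sha) }
  end
end

# In-memory stand-ins for database lookups (illustrative data only).
pipeline_shas = %w[aaa bbb]
note_shas = %w[ccc]

registry = KeepAroundSourceRegistry.new
registry.register(:pipelines) { |shas| shas & pipeline_shas }
registry.register(:notes) { |shas| shas & note_shas }

orphaned = registry.orphans(%w[aaa ccc ddd eee])
```

Keeping the sources in one place also gives the follow-up in step 3 something to check against: a keep-around created by code that has no registry entry would be reported as an orphan, which is exactly the data-loss risk to guard against.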

This can then be used as part of an automated or manual workflow to prune orphaned keep-around refs:

  1. User or GitLab: Ensures there is an up-to-date backup of the repository
  2. User or GitLab: Provides an initial list of candidate keep-around refs
  3. This method: Selects the orphans
  4. User or GitLab: Reviews the output and prunes the orphans
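The candidate list in step 2 and the review output in step 4 could be prepared along these lines. This sketch assumes the candidates arrive as full ref names (for example from a ref listing) and relies on keep-around refs living under `refs/keep-around/<sha>`. The helper names `candidate_shas` and `prune_plan` are hypothetical. Nothing is deleted here; the result is only a list for a human or job to review.

```ruby
KEEP_AROUND_PREFIX = "refs/keep-around/"

# Extract commit SHAs from ref names, ignoring anything that is not a
# well-formed keep-around ref (40 hex chars for SHA-1, 64 for SHA-256).
def candidate_shas(ref_names)
  ref_names.filter_map do |ref|
    next unless ref.start_with?(KEEP_AROUND_PREFIX)

    sha = ref.delete_prefix(KEEP_AROUND_PREFIX)
    sha if sha.match?(/\A\h{40}(\h{24})?\z/)
  end
end

# Turn orphaned SHAs back into ref names for review. This is a dry run:
# it only reports what would be pruned.
def prune_plan(orphan_shas)
  orphan_shas.map { |sha| "#{KEEP_AROUND_PREFIX}#{sha}" }
end

refs = [
  "refs/keep-around/#{'a' * 40}",
  "refs/heads/main",
  "refs/keep-around/not-a-sha"
]

shas = candidate_shas(refs)
plan = prune_plan(shas) # pretend every candidate turned out to be an orphan
```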

Implementation details

To give a more concrete example, the final method could look something like this:

module Gitlab
  module Git
    class KeepAroundOrphanFinder
      include KeepAroundHelpers # extract from KeepAround

      def initialize(project)
        @project = project
        @repository = project.repository
      end
      
      def execute(shas)
        shas.each do |sha|
          raise ArgumentError, "#{sha} is not kept-around" unless kept_around?(sha)
        end
        
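        # WARNING: For DEMONSTRATION ONLY. These queries MAY BE WRONG.
        # Each query takes the full list of candidate SHAs and returns the
        # subset that is still referenced by that source.
        shas_with_pipelines = @project.ci_pipelines.for_sha(shas).pluck(:sha)
        shas_with_notes = [] # ... (query elided; placeholder so the sketch parses)
        shas_with_diffs = @project.merge_request_diffs.regular.by_head_commit_sha(shas).pluck(:head_commit_sha)
        # ...more ?

        # Whatever no source claims is orphaned.
        shas - shas_with_pipelines - shas_with_notes - shas_with_diffs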
      end
    end
  end
end

Background

Edited by Hordur Freyr Yngvason