
Draft: Make `BatchedGitRefUpdates` transactional

Kamil Trzciński requested to merge batched-refs-delete-can-rollback-delete into master

What does this MR do and why?

Make BatchedGitRefUpdates transactional

The currently implemented deletion mechanism of BatchedGitRefUpdates has a race condition that is fairly likely to occur:

  1. A ref is deleted (for example, when a pipeline finishes).
  2. The ref is then recreated (for example, when a build is retried).
  3. The worker deletes the ref, since we only enqueue deletes and do not track creations.
  4. The CI job fails, since the ref was deleted by the worker.
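
The sequence above can be replayed with a minimal in-memory sketch. Everything here is a hypothetical stand-in (a `Set` for the repository refs, an `Array` for the queued deletes, an example ref name); none of it is the actual BatchedGitRefUpdates code.

```ruby
require "set"

# Hypothetical stand-ins, not taken from the MR.
refs = Set.new(["refs/pipelines/1"]) # refs that currently exist in the repository
pending_deletes = []                 # queue drained by the worker; creations are not tracked

# 1. The pipeline finishes: the ref deletion is enqueued for the worker.
pending_deletes << "refs/pipelines/1"

# 2. A build is retried: the ref is created again before the worker runs.
refs.add("refs/pipelines/1")

# 3. The worker runs and applies every queued delete without knowing about step 2.
pending_deletes.each { |ref| refs.delete(ref) }
pending_deletes.clear

# 4. The retried job looks for a ref that no longer exists.
job_can_fetch_ref = refs.include?("refs/pipelines/1")
```

In this sketch `job_can_fetch_ref` ends up `false`, which corresponds to the failing CI job in step 4.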

Given that the worker runs at an interval of 1 minute by default, this problem is very likely to be hit.

This MR narrows the race window to the time between executing the delete_refs and update statements, which on most systems should be between 10ms and 100ms (for very large repos).

This is achieved by doing the following:

  1. To reduce the number of deletes tracked, a unique constraint is added on project_id and ref.
  2. For the codepaths that use async deletion, the deletion record is updated with the create_sha information from the create request.
  3. If the record is transitioned back to the pending state, it is retried until it can be marked as processed.
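
A rough sketch of steps 1 and 2, assuming one in-memory record per (project_id, ref) pair. The class name, method names, and the `state` values are illustrative assumptions; only `project_id`, `ref`, and `create_sha` appear in the description above, and the real change works on database rows rather than a Hash.

```ruby
# Hypothetical record; the :state member and its values are assumptions.
DeletionRecord = Struct.new(:project_id, :ref, :state, :create_sha, keyword_init: true)

class DeletionQueue
  def initialize
    # Keyed by [project_id, ref] to mimic the unique constraint.
    @records = {}
  end

  # Enqueueing the same delete twice reuses the existing record.
  def enqueue_delete(project_id, ref)
    record = (@records[[project_id, ref]] ||= DeletionRecord.new(project_id: project_id, ref: ref))
    record.state = :pending
    record.create_sha = nil
    record
  end

  # A later create stores its sha on the record and puts the record
  # back into the pending state so the worker looks at it again.
  def track_create(project_id, ref, sha)
    record = @records[[project_id, ref]]
    return nil unless record

    record.create_sha = sha
    record.state = :pending
    record
  end

  def size
    @records.size
  end
end

queue = DeletionQueue.new
queue.enqueue_delete(1, "refs/pipelines/1")
record = queue.enqueue_delete(1, "refs/pipelines/1") # same key, same record
record.state = :processed                            # pretend the worker handled it
queue.track_create(1, "refs/pipelines/1", "abc123")  # a retry recreates the ref
```

After the last call the single record is pending again and carries the sha of the recreated ref, which is the information a worker would need to restore the ref instead of only deleting it.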

This still leaves a very small window in which the race condition might occur and a ref can be deleted, but the chance of this happening should be minimal:

  1. Temporarily deleted: if the ref is created between load_records and delete_and_recreate_records, the reference might be temporarily deleted. However, the state will be properly restored in the next loop iteration, so the ref should be gone for roughly 10-100ms at most.
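
The temporary deletion and its recovery can be illustrated with the following sketch. The two method names mirror the phases named above, but their bodies, the `Record` struct, and the compare-before-marking-processed check are assumptions about how such a loop could behave, not the MR's implementation.

```ruby
# Hypothetical record and repository (a Hash of ref => sha).
Record = Struct.new(:ref, :create_sha, :state)

# Snapshot the pending records together with the create_sha seen at load time.
def load_records(records)
  records.select { |r| r.state == :pending }.map { |r| [r, r.create_sha] }
end

# Delete each ref, recreate it if a create_sha was seen, and mark the record
# processed only if it did not change since the snapshot was taken.
def delete_and_recreate_records(repo, snapshot)
  snapshot.each do |record, seen_sha|
    repo.delete(record.ref)
    repo[record.ref] = seen_sha if seen_sha
    record.state = :processed if record.create_sha == seen_sha
  end
end

repo = { "refs/pipelines/1" => "old-sha" }
records = [Record.new("refs/pipelines/1", nil, :pending)]

# Iteration 1: the snapshot is taken, then a create sneaks in before the delete.
snapshot = load_records(records)
repo["refs/pipelines/1"] = "new-sha"
records.first.create_sha = "new-sha"
delete_and_recreate_records(repo, snapshot)
missing_after_first_pass = !repo.key?("refs/pipelines/1")
still_pending = records.first.state == :pending

# Iteration 2: the record is still pending, so the ref is restored.
delete_and_recreate_records(repo, load_records(records))
restored_sha = repo["refs/pipelines/1"]
final_state = records.first.state
```

In this sketch the ref is missing only between the two iterations; the second pass recreates it from the tracked sha and the record ends up processed.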


Edited by Kamil Trzciński
