Spike: How to improve the long-term scalability of CancelRedundantPipelinesService
Summary - Why is this spike needed?
As @drew noted, we've already picked off all the low-hanging fruit. To fix this in a truly future-proof way, we'll probably have to rethink how we keep track of running pipelines so that querying them is less onerous.
Timebox Expectations
5 days
Expected Outcomes
- Determine a feasible path toward long-term scalability of CancelRedundantPipelinesService
- Create at least one actionable issue that moves us toward long-term scalability of CancelRedundantPipelinesService
- Technical proposal added to at least one actionable issue (ideally the one to be done first)
- Weight added to the same issue(s)
Proposal
Author: @panoskanell
Pipelines that need to be cancelled
Workflow
Pipeline is created --> Worker is called (in the sequence) --> Worker queries all the pipelines of the project --> For each batch, cancel all the IDs in it
Current Structure
One large table in which we have to find all the pipelines of a project, which makes this a very expensive process.
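For contrast with the new structure, here is a simplified in-memory model of why the current lookup is expensive: every pipeline of the project is visited, so the cost grows with the project's whole pipeline history rather than with the handful of pipelines that are actually cancellable. The `Pipeline` struct and `redundant_pipeline_ids` helper are hypothetical, and batching and SQL are deliberately left out; this is not GitLab's implementation.

```ruby
# Hypothetical, simplified pipeline row; not GitLab's Ci::Pipeline model.
Pipeline = Struct.new(:id, :project_id, :ref, :status)

# Simplified stand-in for Ci::HasStatus::ACTIVE_STATUSES; the real constant may list more statuses.
ACTIVE_STATUSES = %w[running pending].freeze

# Visits every pipeline: O(n) in the project's pipeline count,
# no matter how few of them turn out to be cancellable.
def redundant_pipeline_ids(all_pipelines, project_id, ref, latest_id)
  all_pipelines.select do |p|
    p.project_id == project_id &&
      p.ref == ref &&
      p.id != latest_id &&
      ACTIVE_STATUSES.include?(p.status)
  end.map(&:id)
end

pipelines = [
  Pipeline.new(1, 10, 'main', 'success'),
  Pipeline.new(2, 10, 'main', 'running'),
  Pipeline.new(3, 10, 'main', 'pending'),
  Pipeline.new(4, 11, 'main', 'running'), # other project: ignored
  Pipeline.new(5, 10, 'main', 'running')  # latest pipeline: kept
]

p redundant_pipeline_ids(pipelines, 10, 'main', 5) # => [2, 3]
```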
New structure (event-driven/reactive)
Only the cancellable pipelines of a project are stored here (in Redis).
The design should be event-driven.
When a pipeline's status changes and it is no longer cancellable, it is removed from Redis. This way we only need to store a minimal amount of data.
{
  'project_id_1:full_ref_1' => {
    # Used for comparisons when pipeline statuses are updated. The newest pipeline's
    # ID is written here, and the previous latest is added below if it is still
    # running or pending.
    'latest' => 'latest_pipeline_id',
    'pipeline_id_1' => {
      # Not sure if we need anything in here yet
    },
    'pipeline_id_2' => {}
  },
  'project_id_2:full_ref_2' => {}
}
When a pipeline's status is updated, update that pipeline's entry here too.
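A minimal in-memory sketch of the structure above, assuming each tracked pipeline's hash holds a `'status'` field (the proposal leaves that hash's contents open, so this is an assumption). A plain Ruby `Hash` stands in for Redis, and `ref_key` and `update_status` are hypothetical helpers.

```ruby
# Hypothetical in-memory stand-in for the proposed Redis layout:
#   "project_id:full_ref" => { 'latest' => id, pipeline_id => { 'status' => ... } }
# Keeping 'status' in the per-pipeline hash is an assumption, not part of the proposal.
def ref_key(project_id, ref)
  "#{project_id}:#{ref}"
end

# Mirrors a status update into the store, but only for a pipeline that is already
# tracked under its project/ref key; anything else is a no-op.
def update_status(store, project_id, ref, pipeline_id, status)
  entry = store.dig(ref_key(project_id, ref), pipeline_id)
  entry['status'] = status if entry.is_a?(Hash)
end

store = {
  ref_key(1, 'refs/heads/main') => {
    'latest' => 30,
    20 => { 'status' => 'pending' }
  }
}

update_status(store, 1, 'refs/heads/main', 20, 'running')
update_status(store, 1, 'refs/heads/main', 99, 'running') # not tracked: no-op
p store
```

Ignoring untracked pipelines keeps the store limited to cancellable entries; adding and removing entries is left to the event cases.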
Cases
Pipeline is created
- Mark this pipeline as the latest for the ref
- If the previous latest pipeline is running or pending, insert it as cancellable
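The "pipeline is created" case could look roughly like this, with a `Hash` standing in for Redis. `on_pipeline_created` is a hypothetical handler, and `statuses` is a hypothetical ID-to-status lookup for the previous latest pipeline.

```ruby
# Simplified stand-in for Ci::HasStatus::ACTIVE_STATUSES; the real constant may list more statuses.
ACTIVE_STATUSES = %w[running pending].freeze

# Hypothetical handler for "pipeline is created" on the ref stored under `key`.
# `statuses` is a hypothetical id => status lookup used for the previous latest pipeline.
def on_pipeline_created(store, key, pipeline_id, statuses)
  ref = (store[key] ||= {})
  previous = ref['latest']
  ref['latest'] = pipeline_id
  # The previous latest only becomes cancellable if it is still active.
  ref[previous] ||= {} if previous && ACTIVE_STATUSES.include?(statuses[previous])
end

store = {}
statuses = { 1 => 'running', 2 => 'success' }
on_pipeline_created(store, '10:refs/heads/main', 1, statuses)
on_pipeline_created(store, '10:refs/heads/main', 2, statuses) # 1 is running: tracked
on_pipeline_created(store, '10:refs/heads/main', 3, statuses) # 2 succeeded: not tracked
p store
```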
Pipeline status becomes running or pending (Ci::HasStatus::ACTIVE_STATUSES)
- Check whether this pipeline is the latest; if not, add it to Redis as cancellable (covers the case where it is active but a newer pipeline has already been created)
Pipeline status becomes anything else
- Check whether the pipeline exists in Redis; if it does, remove it from the cancellable pipelines
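Both status-change cases can be sketched as one handler. This assumes a simplified active-status list and uses a `Hash` in place of Redis; `on_status_changed` is hypothetical.

```ruby
# Simplified stand-in for Ci::HasStatus::ACTIVE_STATUSES; the real constant may list more statuses.
ACTIVE_STATUSES = %w[running pending].freeze

# Hypothetical handler for a pipeline status change on the ref stored under `key`.
def on_status_changed(store, key, pipeline_id, status)
  if ACTIVE_STATUSES.include?(status)
    ref = (store[key] ||= {})
    # Active but not the latest: track it as cancellable (idempotent).
    ref[pipeline_id] ||= {} unless ref['latest'] == pipeline_id
  else
    # No longer cancellable: drop it if tracked; no-op otherwise.
    store[key]&.delete(pipeline_id)
  end
end

store = { '10:refs/heads/main' => { 'latest' => 3 } }
on_status_changed(store, '10:refs/heads/main', 2, 'running') # older and active: tracked
on_status_changed(store, '10:refs/heads/main', 3, 'running') # latest: not tracked
on_status_changed(store, '10:refs/heads/main', 1, 'pending')
on_status_changed(store, '10:refs/heads/main', 1, 'success') # finished: removed
p store['10:refs/heads/main']
```

Both branches are safe to repeat, which matters if the same status event is delivered more than once.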
Pipeline is deleted
- If the pipeline was active or was the latest, check for it in Redis and delete it
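A possible sketch of the deletion case, with a `Hash` standing in for Redis. Because deleting an untracked entry is a no-op, this hypothetical `on_pipeline_deleted` skips the status check. It simply clears `'latest'` when the latest pipeline is deleted, since whether to promote the previous one is still an open question here.

```ruby
# Hypothetical handler for "pipeline is deleted" on the ref stored under `key`.
def on_pipeline_deleted(store, key, pipeline_id)
  ref = store[key]
  return unless ref

  # Removing an untracked id is a no-op, so no status check is needed.
  ref.delete(pipeline_id)
  # Promoting the previous latest is an open question; this sketch just clears it.
  ref.delete('latest') if ref['latest'] == pipeline_id
  # Drop the key entirely once nothing is left for this project/ref.
  store.delete(key) if ref.empty?
end

store = { '10:refs/heads/main' => { 'latest' => 3, 2 => {} } }
on_pipeline_deleted(store, '10:refs/heads/main', 2) # tracked cancellable pipeline
on_pipeline_deleted(store, '10:refs/heads/main', 3) # latest pipeline
p store
```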
IMPORTANT NOTES:
- There's no specific order in which we need to cancel the pipelines, as long as we cancel every running and pending pipeline that isn't the latest (for each ref)
- We only cancel pipelines from the current project; any pipelines belonging to another project are ignored
Pros
- Nearly O(1) time to figure out which pipelines need to be cancelled per ref, since there won't be many pipelines per ref
- Space efficiency, because the event-driven/reactive approach only stores cancellable pipelines
- The structure could also use flattened keys to reduce nesting, e.g. keys built from project_id and ref_id
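One way the flattened variant might look, assuming two top-level keys per project/ref: a plain value for the latest pipeline and a set of cancellable IDs. These would map onto the standard Redis `SET`/`GET` and `SADD`/`SREM`/`SMEMBERS` commands. The key prefixes and helpers are hypothetical, and the sketch only converts the nested layout in memory.

```ruby
require 'set'

# Hypothetical flattened layout, two top-level keys per project/ref:
#   "latest:<project_id>:<ref>"      -> pipeline id    (Redis: SET / GET)
#   "cancellable:<project_id>:<ref>" -> set of ids     (Redis: SADD / SREM / SMEMBERS)
def latest_key(project_id, ref)
  "latest:#{project_id}:#{ref}"
end

def cancellable_key(project_id, ref)
  "cancellable:#{project_id}:#{ref}"
end

# Converts the nested "project_id:ref" => { 'latest' => id, id => {} } layout.
def flatten_store(nested)
  nested.each_with_object({}) do |(key, entry), flat|
    project_id, ref = key.split(':', 2)
    flat[latest_key(project_id, ref)] = entry['latest']
    flat[cancellable_key(project_id, ref)] = Set.new(entry.keys - ['latest'])
  end
end

nested = { '10:refs/heads/main' => { 'latest' => 5, 2 => {}, 3 => {} } }
flat = flatten_store(nested)
p flat
```

Splitting on the first `:` only is safe here because Git ref names cannot contain a colon.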
Cons
- A lot of Redis operations, though each one is O(1).
- Not sure how Redis scales with larger data and operation volumes; we will need to run benchmarks.
- Cancelling the pipelines still requires fetching them by ID, which is an expensive operation, but definitely not as expensive as figuring out which pipelines need to be cancelled
Pending Cases
- Redis goes down and/or loses the cached data: we need a data sanitization mechanism if we don't use persistence, plus a fallback mechanism (which could be the existing implementation)
- Fine-tuning per commit (https://docs.gitlab.com/ee/ci/yaml/#workflowauto_cancelon_new_commit). This could still be handled by the existing case
- When the latest pipeline is deleted, do we set the previous latest pipeline as the latest one?
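For the Redis-outage pending case, the fallback could be as simple as rescuing a connection error and running the existing query instead. `StoreUnavailable`, `ids_to_cancel`, and the two callables are hypothetical. This covers only an outage: detecting lost data, where an empty store is indistinguishable from "nothing to cancel", would still need the sanitization mechanism mentioned above.

```ruby
# Hypothetical error the Redis-backed store would raise when it cannot be reached.
class StoreUnavailable < StandardError; end

# Prefers the Redis-backed lookup and falls back to the existing (expensive)
# database query when the store is unavailable. Both are passed in as callables.
def ids_to_cancel(redis_lookup:, db_fallback:)
  redis_lookup.call
rescue StoreUnavailable
  db_fallback.call
end

up = -> { [2, 3] }
down = -> { raise StoreUnavailable, 'connection refused' }
legacy = -> { [2, 3, 4] }

p ids_to_cancel(redis_lookup: up, db_fallback: legacy)   # => [2, 3]
p ids_to_cancel(redis_lookup: down, db_fallback: legacy) # => [2, 3, 4]
```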
NOTE: This proposal is not complete and is subject to change. It was added here to give other engineers a picture of the idea.