Next CI/CD queuing architecture
Description
In 2021 we delivered the next iteration of how we build the queue of builds that GitLab later runs using Runners. We referred to that iteration as "accelerated queuing tables" because it relied heavily on data denormalization to decouple queuing from a tangled database schema and from relational references between numerous database tables (which had also resulted in a very large and slow SQL query).
We've been able to improve the performance here roughly 100x: from 5 seconds to 50 ms for the slowest (p99) queries.
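To illustrate the denormalization idea described above, here is a minimal sketch in Python. It is not GitLab's implementation: the names `PendingBuild`, `Runner`, `enqueue`, and `pick_next`, and the specific fields (`protected`, `tags`), are hypothetical and chosen only for illustration. The point is that the attributes needed for matching are copied into the queue row when the build is enqueued, so picking the next build needs no joins against other tables.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class PendingBuild:
    """One row of a hypothetical denormalized queue table."""
    build_id: int
    project_id: int
    protected: bool          # copied from the build's ref at enqueue time
    tags: FrozenSet[str]     # copied from the build's tags at enqueue time


@dataclass(frozen=True)
class Runner:
    """Hypothetical runner description used for matching."""
    runner_id: int
    tags: FrozenSet[str]
    protected_only: bool = False


def enqueue(queue: List[PendingBuild], build_id: int, project_id: int,
            protected: bool, tags) -> None:
    # Denormalize: store everything the matcher needs in the queue row itself.
    queue.append(PendingBuild(build_id, project_id, protected, frozenset(tags)))


def pick_next(queue: List[PendingBuild], runner: Runner) -> Optional[PendingBuild]:
    # Scan in FIFO order using only data stored in the queue rows.
    for index, row in enumerate(queue):
        if runner.protected_only and not row.protected:
            continue
        if row.tags <= runner.tags:  # runner must offer every tag the build needs
            return queue.pop(index)
    return None
```

The trade-off in a design like this is that the copied attributes have to be kept in sync with their source of truth, in exchange for a much simpler and cheaper queue lookup.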
Next iteration
The main goal of the "accelerated table" was to give us time to plan the next iteration. We also designed the accelerated table to make a migration to a different technology easier: queuing is now largely decoupled from PostgreSQL implementation details, although some coupling remains.
According to our scaling stats in Periscope, CI/CD pipeline adoption is still growing. At this rate of growth we might eventually need to move towards a horizontally scalable queuing solution. To deliver it before we actually need it, we should start working on the architecture evolution blueprint soon.
Previous discussions
We've discussed this topic since at least 2016. A few important discussion threads / issues:
Possible blockers:
/cc @clefelhocz1