ci_builds is expensive to access: it is a very wide table, and queries against it often time out
ci_builds cannot be partitioned, as otherwise we would not be able to fetch all jobs
for accessing tags we cross-join another table, taggings
for accessing quota we cross-join project/namespace
we check access level based on project/namespace (an illustrative query shape follows below)
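For illustration only, the rough shape of what picking a build forces today. This is a simplified approximation; the exact joins, column names, and filter values below are assumptions, not the actual RegisterJobService SQL:

```sql
-- Illustrative approximation of the current picking query (NOT the real
-- RegisterJobService SQL): the very wide ci_builds table has to be joined
-- with projects/namespaces (quota, access level) and taggings (tags)
-- just to filter the small set of pending builds.
SELECT ci_builds.*
FROM ci_builds
JOIN projects   ON projects.id   = ci_builds.project_id
JOIN namespaces ON namespaces.id = projects.namespace_id
LEFT JOIN taggings ON taggings.taggable_id = ci_builds.id
                  AND taggings.taggable_type = 'CommitStatus'  -- assumed value
WHERE ci_builds.status = 'pending'
  AND ci_builds.runner_id IS NULL
ORDER BY ci_builds.id
LIMIT 1;
```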
As a way to accelerate filtering:
Introduce ci_pending_builds table
Design the table so that we would not have to load ci_builds (a very wide table) as part of the RegisterJobService filtering query
We would still load ci_builds for the purpose of accepting a build, but the filtering should be significantly faster and provide more capacity
This would allow us to partition ci_builds without breaking queueing
The table would contain as much data as possible to perform build matching: at least tags, protected, project_id, and whatever else is needed (a sketch follows after this list)
Insert a build into the table on the status transition to pending, as part of the state machine
Delete the item from the table on the status transition from pending, as part of the state machine
Change RegisterJobService to filter using ci_pending_builds instead of ci_builds
We assume that the queries would have a significantly lower cost, as the data would be much easier and cheaper to access, and Postgres should be able to hold this pending queue in memory for quick filtering
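A minimal sketch of what this could look like, assuming the columns mentioned above (tags, protected, project_id). The names, types, and the foreign key are illustrative assumptions, not a final design:

```sql
-- Hedged sketch of the acceleration table; columns and types are assumptions.
CREATE TABLE ci_pending_builds (
  id         bigserial PRIMARY KEY,
  build_id   bigint    NOT NULL UNIQUE REFERENCES ci_builds (id) ON DELETE CASCADE,
  project_id bigint    NOT NULL,
  protected  boolean   NOT NULL DEFAULT FALSE,
  tag_ids    integer[] NOT NULL DEFAULT '{}'
);

-- State machine transition to pending: enqueue the build.
INSERT INTO ci_pending_builds (build_id, project_id, protected, tag_ids)
VALUES (101, 1, FALSE, ARRAY[1, 2]::integer[]);

-- State machine transition away from pending (picked, canceled, ...): dequeue it.
DELETE FROM ci_pending_builds WHERE build_id = 101;
```

RegisterJobService would then filter against this narrow table only, and go back to ci_builds just for the single build it is about to accept.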
This acceleration structure is proposed as a follow-up on gitlab-com/gl-infra/production#3712 (closed). If designed properly this could be used for all future work on queueing as well.
This can be an easy way to improve performance today without spending a lot of effort on it: a potentially throw-away solution with (hopefully) little impact on the codebase.
Kamil Trzciński changed title from "Optimise job picking" to "Introduce additional DB table (acceleration structure) to optimise job queueing"
Kamil Trzciński changed the description
Kamil Trzciński changed title from "Introduce additional DB table (acceleration structure) to optimise job queueing" to "Introduce additional DB table (acceleration structure) to optimise job queueing (as an intermediate solution for better queueing)"
The query cost was reduced from 51823820 to 6453633, roughly an 8x improvement!
The query is faster by around 100ms, from 469ms to 364ms. However, this was measured with no load, so the value will be significantly different on a live system.
More fields (these should be pretty constant):
This table should contain: tags (unsure in what form; is there some clever trick to do this matching?)
This table should contain: protected, a flag to indicate whether the build needs to use a protected runner
This table should contain: runs_on_shared (or is_shared), or anything else that allows us to indicate that
All of that can be even more optimised by (iteratively):
The new table allows us to remove more parts of the big query, step by step, and recalculate them as part of ci_pending_builds
We could precalculate the namespace quota: we could remove entries from ci_pending_builds, or have a flag is_shared (whether a build needs to be executed by shared runners)
We could also precalculate, as part of is_shared, the project visibility and eligibility for being picked by shared runners
If a project runs out of quota/visibility, we could update all of its builds to have is_shared set to false, and reset it if the project still has quota/visibility
We could store tags as an array, to be able to quickly match (bitmask) them against the runner's list of tags (see the sketch after this list)
The table ci_builds could be partitioned, as we would only depend on ci_pending/running_builds being in memory for quick build matching
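A sketch of how the array-based tag matching and the is_shared flag could look, assuming the ci_pending_builds shape sketched earlier plus the is_shared column proposed above, and that a runner is eligible when it owns all of a build's tags. This is only one possible approach:

```sql
-- Illustrative matching query; column names and ordering are assumptions.
-- tag_ids <@ ARRAY[...] means: every tag required by the build is present
-- in the runner's tag list.
SELECT build_id
FROM ci_pending_builds
WHERE is_shared = TRUE                       -- precalculated quota/visibility flag
  AND protected = FALSE                      -- this runner cannot run protected builds
  AND tag_ids <@ ARRAY[1, 2, 3]::integer[]   -- ids of the runner's tags
ORDER BY build_id
LIMIT 1;

-- A GIN index can serve the containment operator on the array column:
CREATE INDEX index_ci_pending_builds_on_tag_ids
  ON ci_pending_builds USING gin (tag_ids);

-- If a project runs out of quota or changes visibility, flip the flag in bulk:
UPDATE ci_pending_builds SET is_shared = FALSE WHERE project_id = 1;
```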
Especially if we could somehow do it as a drop-in replacement for the current system, and just evaluate its impact with minimal effort. I would expect this to make a massive difference for GitLab just because of the much more compact data structure.
@ayufan the proposed solution sounds like a nice approach to me, as long as we don't have any information in ci_builds_pending that may change (in which case we would have to maintain the consistency of the materialized table) and we only have to update it in the cases you mention:
Insert a build into the table on the status transition to pending, as part of the state machine
Delete the item from the table on the status transition from pending, as part of the state machine
The table and indexes would get pretty bloated, but the table would stay comparatively small, and I assume that autovacuum and our re-indexing processes should be able to handle that bloat (others can correct me on this statement).
Other than that, we would have to add a couple of additional indexes, but that's a minor implementation discussion.
Kamil Trzciński changed title from "Introduce additional DB table (acceleration structure) to optimise job queueing (as an intermediate solution for better queueing)" to "Idea to consider: Introduce additional DB table (acceleration structure) to optimise job queueing (as an intermediate solution for better queueing)"
Kamil Trzciński changed the description
Grzegorz Bizon changed title from "Idea to consider: Introduce additional DB table (acceleration structure) to optimise job queueing (as an intermediate solution for better queueing)" to "Introduce additional DB table (acceleration structure) to optimise job queueing (as an intermediate solution for better queueing)"
I removed "Idea to consider" from the title, because this is something we want to do. We want to prioritize that because of the recent decision to adjust our direction related to the ci_builds work -> !52203 (comment 543781716)
Can we please describe access patterns for the ci_builds_pending table? How are we going to access this table?
This would help us to reason about a partitioning strategy, which may be beneficial to have given that this is going to be a high-churn table (causing a lot of vacuum activity, which benefits greatly from partitioning).
@abrandl we do not intend to partition this table. This table is used for queuing instance-wide, and adding partitioning to the equation might significantly increase the complexity of this solution.
@grzesiek Hence the question - can we describe how we will access this table?
I believe #322766 (comment 517051198) describes one way of accessing the table. It would be great if we had an exhaustive set of queries sketched that demonstrate how we will access this table.
This table is used for queuing instance-wide, and adding partitioning to the equation might significantly increase the complexity of this solution.
Instance-wide or not, there may be opportunities for partitioning that are completely transparent (so not increasing the complexity) - but without knowing exactly how we access the table, we cannot reason about this.
Or to put this differently - if we intend to skip partitioning the table, we would benefit from documenting why a partitioning pattern is not suitable (which entails describing the access patterns in any case).
Going forward, reasoning about access patterns to support partitioning decisions is something we should be doing for all new tables that we expect to play a significant role, are going to be large, or are expected to have high churn. I would think ci_builds_pending qualifies.
Hence the question - can we describe how we will access this table?
The idea behind this table is to use only this one table to find builds that need to be processed. I agree that we should document better why we do not intend to partition this table, but no intention to partition does not mean we will never do it :) We simply do not have data right now that suggests that partitioning makes sense in this case.
@grzesiek I realize we need to describe and document better what the ask is and what we're looking for here. I plan to do this in #327204 (closed), but there's been no time yet.
We simply do not have data right now that suggests that partitioning in this case makes sense.
This is the exact reason for my question. We should know how we plan to access this table and document this. This is the data we need to be able to decide whether or not partitioning makes sense.
While it's technically possible to partition a table after it has been introduced, it is a lengthy process to do so. However, what's worse is that - on a higher level - we should aim to find partitioning strategies that work for more than just one table. This is thinking ahead about a potential sharding design, where it's crucial to avoid having many different dimensions to slice data by. Note however that this idea is also totally valid in a non-sharded, but partitioned database design.
Hence I think we should start doing this (roughly speaking - for all major tables we introduce):
Determine the common filters that we plan to access a table by
Implement limitations so that we actually don't end up accessing the table by an unexpected dimension
Based on (1) and (2), decide whether or not partitioning this table makes sense
Again I realize we lack documentation and agreement on how and what needs to be documented. Going forward, I would like to develop a common understanding here - but this takes time.
Meanwhile, if you'd like to discuss on a call - I'm happy to do that, too.
My ask here is that we simply call out what (1) is in this case. Do we know that? If not, why? It'll be hard to verify a database design for scalability without knowing how this is going to be accessed.
I realize we need to describe and document better what the ask is and what we're looking for here. I plan to do this in #327204 (closed), but there's been no time yet.
@abrandl I want to describe this work in !52203 (merged). I also plan to work on this, since I believe this might require a ton of domain knowledge from the CI/CD area to get done. I will happily collaborate with someone from the Database team, but getting this done will require a significant refactoring of features from the Verify area.
This is the exact reason for my question. We should know how we plan to access this table and document this. This is the data we need to be able to decide whether or not partitioning makes sense.
Yes, I plan to document this in the MR mentioned above.
I'm fine with scheduling a call and talking about what our plans are here too.
@grzesiek What's described here – the table ci_pending_builds and working with it – is a classic "queue in Postgres" approach that will suffer from the following difficulties unless it's properly designed:
Dead tuples and bloat will quickly lead to performance degradation – the provided query plan will quickly become inefficient because of this. This will be caused by DELETEs ("Delete item from the table ...") and/or UPDATEs. Solution: avoid DELETEs and, if possible, UPDATEs at all costs; use partitioning and partition pruning instead of DELETE.
Contention issues if we're going to process rows in this table in multiple threads. "SELECT .. FOR UPDATE SKIP LOCKED" is the solution (see the sketch below).
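A minimal sketch of what SKIP LOCKED picking could look like on ci_pending_builds; the filters and column names are assumptions carried over from the sketches above:

```sql
-- Multiple runners can poll concurrently: each one locks a different row
-- instead of queueing behind the same lock.
BEGIN;

SELECT build_id
FROM ci_pending_builds
WHERE protected = FALSE
ORDER BY build_id
LIMIT 1
FOR UPDATE SKIP LOCKED;   -- rows locked by other pickers are skipped, not waited on

-- ... accept the build in ci_builds and DELETE its row from ci_pending_builds ...

COMMIT;
```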
I have seen many attempts to implement the "queue in database" thing, only a few of which were really successful (one of the successful ones: good old PgQ).
So I support the questions from @abrandl, so that we can design it properly (also, there is always an alternative: move outside of Postgres – use the existing Sidekiq or introduce Kafka); and I agree with @ayufan that sharding decisions are going to affect the decisions in this task.
What's described here – the table ci_pending_builds and working with it – is a classic "queue in Postgres" approach that will suffer from the following difficulties unless it's properly designed
@nikolay I agree that it will not be that much different, but the new design using ci_pending_builds will be different enough to give us headroom to move to a much better solution. We have Redis queuing on our radar, but we need to iterate in order to get there.
It also seems that using the entire ci_builds table to find pending builds blocks the sharding and partitioning initiatives a bit (or at least makes them more difficult to get done).
Given all of this, it seems that moving queuing to a separate table is still something that will improve the current state. It will also make it easier to move towards Redis queuing, because we will need to redesign our system to remove the reliance on multiple JOINs; all the information about queuing will thus be more localized, which is needed if we want to proceed with Redis queuing. This will make sharding and partitioning easier, and we will be able to proceed with a partitioned ci_jobs table and a gradual migration from ci_builds to ci_jobs.
This is not a perfect solution, but something that is an intermediate step that will unblock progress towards a better solution. WDYT @NikolayS?
The "queuing" we have today is based on the gigantic ci_builds table and updating rows in this table. Plus, the relevant part of records for the "queuing" is typically a very small portion of the table (which stores the whole history of builds, the ones pending their execution are perhaps even negligible comparing to the overall number of records).
In my understanding, this is the core design problem that we would disentangle with this approach - which makes sense to me, especially if we were to iterate and find an even better queuing solution later (think: proper message queue).
Still, on the Postgres side, even if this is just to buy headroom, we will need to look into the specific implementation knowing that queuing in MVCC is hard to get right.
This will make sharding and partitioning easier, and we will be able to proceed with a partitioned ci_jobs table and a gradual migration from ci_builds to ci_jobs.
@grzesiek Has this (partitioned ci_jobs) been detailed somewhere already?
@grzesiek Has this (partitioned ci_jobs) been detailed somewhere already?
Not documented yet, but this is fairly simple. A few assumptions:
One row per build should be enough to get all information about queuing
In that row we would need to store everything: available CI minutes, whether a build can be picked by a shared runner, etc.
This would reduce the reliance on all the JOINs and CTEs, because all the data would be precalculated in Ruby
We hope this will simply be a SELECT with a few WHERE clauses, perhaps with a FOR UPDATE lock; we are not sure yet what kind of locking we will use here
There is also an opportunity to batch DELETEs and INSERTs into this table, depending on circumstances and implementation details (a sketch follows below).
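For illustration, batching could look roughly like this; the values and columns are placeholders, reusing the assumed schema from the sketches above:

```sql
-- Enqueue several builds that transitioned to pending in one statement.
INSERT INTO ci_pending_builds (build_id, project_id, protected, tag_ids)
VALUES
  (201, 1, FALSE, ARRAY[1]::integer[]),
  (202, 1, FALSE, ARRAY[]::integer[]),
  (203, 2, TRUE,  ARRAY[2, 3]::integer[]);

-- Dequeue a batch of builds that were picked up or canceled.
DELETE FROM ci_pending_builds
WHERE build_id = ANY (ARRAY[101, 102, 103]::bigint[]);
```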
It is a much more flexible solution than using ci_builds to queue pending builds. Implementation details might change, because ultimately how we do that depends on iteration and the feedback we get from it. Does it make sense @abrandl?
@grzesiek From a high-level the approach makes sense to me. I worry about the implementation details, but it's perhaps too early for that.
Not documented yet, but this is fairly simple. A few assumptions:
@grzesiek It sounds like we'd be denormalizing all the information into a single, presumably quite wide table, and using this table to implement the queuing mechanic? I would like to look at this in more detail once it's available - the combination of denormalization and a high-churn table (queuing) sounds like it can become a problem.
I anticipate this is too early, but I would suggest we lay out the desired design with:
What will the table look like exactly?
How will we insert, select, delete, update data (data access)?
It would be good to get some statistics on how much data we expect on .com, e.g. the number of records and the width of the table.
Please let me know if/how I can best help here. Would you mind having me on board with !52203 (diffs), representing the database group in development?
@abrandl I can provide some details from the top of my head:
Number of records: up to 10k at peak
Number of columns: 5-8
Data access: Ruby abstraction using a separate service
One concern I have is how to make this transactional with the status transition in ci_builds. Making an INSERT into ci_pending_builds in a transaction with an UPDATE ... SET in ci_builds might undermine the benefits of using this table. Perhaps we can do that asynchronously, but I would expect to find a solution for this through mindful iteration.
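To make the concern concrete, the synchronous variant would roughly look like the sketch below (the status value and columns are assumptions): both writes share one transaction, so the wide ci_builds row is still locked and updated on every transition to pending.

```sql
-- Synchronous coupling of the status transition and the queue row.
BEGIN;

UPDATE ci_builds
SET status = 'pending'
WHERE id = 101;

INSERT INTO ci_pending_builds (build_id, project_id, protected, tag_ids)
VALUES (101, 1, FALSE, ARRAY[1, 2]::integer[]);

COMMIT;
```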
We would need to know the schema - it makes a huge difference if there is a huge jsonb column, for example.
Thinking - once we have the schema defined - we could take a snapshot of production and transform this into however the table's contents would look at this particular point in time. This would allow us to reason more about the size/wideness of the table.
Data access: Ruby abstraction using a separate service
Sure - we will abstract it in Ruby. In this context, we're concerned with how that access translates into actual data access on the database side. It'll be insightful to compile a list of insert/select/delete/update queries and what their parameters look like.
One concern I have is how to make this transactional with the status transition in ci_builds.
Perhaps we can do that asynchronously, but I would expect to find a solution for this through mindful iteration.
Knowing the expected database queries and schema will allow us to reason about this better and also allow us to more confidently delay a decision if we can.