Introduce an additional DB table (acceleration structure) to optimise job queueing (as an intermediate solution for better queueing)
Currently we run a very expensive query on top of ci_builds (see gitlab-com/gl-infra/production#3712 (closed)), in which we:
- look for matching projects
- look for pending builds
- match tags
- match other filters
- look at quota
This is a problem:
- `ci_builds` is expensive to access: this is very wide table that often times out
- `ci_builds` cannot be partitioned as otherwise we would not be able to fetch all jobs
- for accessing `tags` we cross-join another table `taggings`
- for accessing quota we cross-join `project/namespace`
- we check access level based on `project/namespace`
As a way to accelerate filtering:
- Introduce a `ci_pending_builds` table
- Design the table so that the filtering query in `RegisterJobService` does not have to load `ci_builds` (a very wide table)
- We would still load `ci_builds` for the purpose of accepting a build, but the filtering should be significantly faster and provide more capacity
- This would allow us to partition `ci_builds` without breaking queueing
- The table would contain as much data as possible to perform build matching: at least `tags`, `protected`, `project_id`, and whatever else is needed
- Insert the build into the table on status transition to `pending`, as part of the state machine
- Delete the item from the table on status transition from `pending`, as part of the state machine
- Change `RegisterJobService` to filter using `ci_pending_builds` instead of `ci_builds`
- We assume that queries would have a significantly lower cost, as the data would be much easier and cheaper to access, and Postgres would be able to hold this pending queue in memory for quick filtering
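A minimal sketch of the idea, written in Python against an in-memory SQLite database purely for illustration (the real change would live in the Rails state machine and `RegisterJobService`, on Postgres). The column layout, the comma-separated `tags` encoding, the simplified matching rules, and the helper names `create_schema`, `transition` and `pick_build` are all assumptions made for this example, not a proposed schema.

```python
import sqlite3


def create_schema(conn):
    """Create a stand-in for the wide ci_builds table and the narrow queue table."""
    conn.executescript("""
        CREATE TABLE ci_builds (
            id INTEGER PRIMARY KEY,
            project_id INTEGER NOT NULL,
            status TEXT NOT NULL,
            payload TEXT              -- stands in for the many wide columns
        );
        -- Hypothetical layout: only what is needed for matching.
        CREATE TABLE ci_pending_builds (
            build_id INTEGER PRIMARY KEY REFERENCES ci_builds(id),
            project_id INTEGER NOT NULL,
            protected INTEGER NOT NULL DEFAULT 0,
            tags TEXT NOT NULL DEFAULT ''   -- comma-separated; an assumption
        );
        CREATE INDEX idx_pending_project ON ci_pending_builds (project_id);
    """)


def transition(conn, build_id, new_status, protected=False, tags=()):
    """Change a build's status and keep ci_pending_builds in sync, atomically."""
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT project_id, status FROM ci_builds WHERE id = ?", (build_id,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown build {build_id}")
        project_id, old_status = row
        conn.execute(
            "UPDATE ci_builds SET status = ? WHERE id = ?", (new_status, build_id)
        )
        if new_status == "pending" and old_status != "pending":
            # Transition to pending: add the build to the queue table.
            conn.execute(
                "INSERT INTO ci_pending_builds (build_id, project_id, protected, tags)"
                " VALUES (?, ?, ?, ?)",
                (build_id, project_id, int(protected), ",".join(sorted(tags))),
            )
        elif old_status == "pending" and new_status != "pending":
            # Transition from pending: remove the build from the queue table.
            conn.execute(
                "DELETE FROM ci_pending_builds WHERE build_id = ?", (build_id,)
            )


def pick_build(conn, project_ids, runner_tags, protected_only=False):
    """Return the oldest matching build id, reading only ci_pending_builds."""
    if not project_ids:
        return None
    placeholders = ",".join("?" for _ in project_ids)
    sql = (
        "SELECT build_id, tags FROM ci_pending_builds"
        f" WHERE project_id IN ({placeholders})"
    )
    if protected_only:
        sql += " AND protected = 1"
    sql += " ORDER BY build_id"
    available = set(runner_tags)
    for build_id, tags in conn.execute(sql, list(project_ids)):
        required = set(tags.split(",")) if tags else set()
        # Simplified rule: every tag the build requires must be on the runner.
        if required <= available:
            return build_id
    return None


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO ci_builds (id, project_id, status) VALUES (1, 10, 'created')")
conn.execute("INSERT INTO ci_builds (id, project_id, status) VALUES (2, 10, 'created')")
transition(conn, 1, "pending", tags=("docker", "linux"))
transition(conn, 2, "pending", tags=("linux",))

print(pick_build(conn, [10], ["linux"]))   # 2 (build 1 also requires "docker")
transition(conn, 2, "running")             # accepted: leaves the queue table
print(pick_build(conn, [10], ["linux"]))   # None
```

The filtering path in `pick_build` never reads `ci_builds`; only accepting the build (the `transition` call) touches the wide table, which is the property the list above relies on.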
This acceleration structure is proposed as a follow-up on gitlab-com/gl-infra/production#3712 (closed). If designed properly, it could be used for all future work on queueing as well.
This can be a way to improve performance today without spending a lot of effort on it: a potentially throw-away solution with (hopefully) little impact on the codebase.
Edited by Grzegorz Bizon