With the new monthly tracking of CI minutes we are storing minutes consumption side-by-side with the legacy tracking in namespace_statistics and namespaces table.
We need to ensure that we can enforce CI minutes limit when the Runner picks the jobs from the queue, using the new monthly tracking.
Proposal
Update the Ci::RegisterJobService to enforce CI usage based on the monthly usage quotas from the new table ci_namespace_monthly_usage, so that jobs are assigned to shared runners based on the new data models.
We need to use:
ci_namespace_monthly_usages.amount_used instead of namespace_statistics.shared_runners_seconds
We need to introduce the change with a feature flag.
Keep in mind that there is work involved in introducing a ci_pending_builds table to improve performance of the Runner queue. Wether we will still have the BIG QUERY or joining ci_pending_builds with CI minutes available or wether we pre-calculate the CI minutes availability for a namespace, the goal of this issue is to use ci_namespace_monthly_usages instead of the legacy namespace_statistics for reading the CI minutes usage.
@fabiopitino Is this another candidate::14.x given that this is not blocked by any other issue? although it's lower in the epic (since you've prioritized accordingly) so it looks like we can get to it later then.
@cheryl.li Due to the new proposal of #277447 (closed) I've removed the block for this issue as well, so it can be worked any time. We may need to reprioritize the issues in the epic based on blocks. A few of them (including this one) were lower in the epic because of the blocks. Since some of them are unblocked we can move them up.
There is a significant overlap between this issue and &5909 (closed). I wonder if we should focus on shipping the accelerated table first to make it easier to JOIN minutes through this table /cc @fabiopitino@cheryl.li@jreporter
@jreporter if I understood @grzesiek correctly, he is advocating that we focus on implementing &5909 (closed) first so that the DB queries added in this issue is based on the new DB table being added in &5909 (closed). Does that reflect what you referenced, @grzesiek? So I think that epic could actually block this effort.
Yes, I think it would make sense to use the new table, because once we work on the new epic, we would need to re-implement what is described in this issue again. This would be a duplication of work.
@grzesiek In &5909 (closed) are we planning to have ci_pending_builds joining with the new ci_namespace_monthly_usages in order to filter out namespaces without CI minutes at some point during the iteration? I see that the end goal is to precalculate this CI minutes availability and add set ci_pending_builds.runs_on_shared accordingly.
The purpose of this issue is to replace the use of namespace_statistics with ci_namespace_monthly_usages when enforcing CI minutes in the Runner queue. I think it's orthogonal to &5909 (closed) but it's good to keep in mind where we are heading. Whether we will still have the BIG query, or `ci_pending_builds with the JOIN, or need to pre-calculate the CI minutes availability I think the scope remains the same.
I'll change the title and description to be more generic.
@fabiopitino the idea behind ci_pending_builds is to denormalize some data, so we don't want to JOIN with ci_namespace_monthly_usages, instead we want to read the data from there before we insert a new row to ci_pending_builds. It means that it is a complete separate mechanism to replacing the table name in the JOIN from ci_builds and we can definitely ship both, but it means that once we start working on ci_pending_builds we will need to replace the newly shipped mechanism that joins ci_namespace_monthly_usages instead of namespace_statistics because the JOIN will no longer be required. Does it make sense?
if I understood @grzesiek correctly, he is advocating that we focus on implementing &5909 (closed) first so that the DB queries added in this issue is based on the new DB table being added in &5909 (closed). Does that reflect what you referenced, @grzesiek? So I think that epic could actually block this effort.
@grzesiek I agree with the plan and it does make sense but it depends on when we plan to work on ci_pending_builds and fully denormalize the data in it.
This issue is necessary in order to complete the migration of CI minutes tracking to the new model. The RegisterJobService is small part of the overall plan. If we do &5909 (closed) starting immediately then in this issue we need to read the "CI quota exceeded?" data from the new model rather than the old one and store the result in ci_pending_builds.runs_on_shared. If we don't have ci_pending_builds ready yet and we need to move forward with the monthly tracking then we would need to change the BIG QUERY in the meantime. Even if this means doing redundant work that will change soon, since the epic contains more changes to be made in order to complete the migration.
I agree @fabiopitino. I think that there is a plan to start ci_pending_builds immediately, and denormalizing CI minutes seems like a good first iteration. See the first iteration issue: #329764 (closed)
It says that first thing we could move is the fair scheduling data, but perhaps CI minutes would be a better choice. I think that introducing ci_pending_builds with build_id, project_id, and minutes (or however we call this column) might be as simple as updating the big query. What do you think about that?
@grzesiek That would be great. To get the remaining minutes of even a flag quota_exceeded? we should use Ci::Minutes::Quota class. Once we are ready to replace the old model with the new monthly tracking model, that would be done inside Ci::Minutes::Quota. So the interface would not change and whatever code we use to populate ci_pending_builds won't change either since Ci::Minutes::Quota is already an abstraction over whatever model we used today.
@fabiopitino@grzesiek Just wanted to confirm that we can proceed with this issue in %14.0? I understand that even if there's redundancy with some of the issues from &5909 (closed), this issue is still needed to support the &4840 (closed) as part of our CI Minutes Rearchitecture.
@fabiopitino - I will put this at the top of the backlog in 14.0 and call it a VerifyP2 - so it will be a ~VerifyP1 in 14.1. We are saturated right now in %14.0
Fabio Pitinochanged title from Update the {-RegisterJobService-} to enforce CI usage based on Customers' CI Minutes monthly allowance to Update the Runner queue logic to enforce CI usage based on CI Minutes availability
changed title from Update the {-RegisterJobService-} to enforce CI usage based on Customers' CI Minutes monthly allowance to Update the Runner queue logic to enforce CI usage based on CI Minutes availability
Fabio Pitinochanged the descriptionCompare with previous version