Deprecate CI minutes reset jobs in favour of tracking historical usage data
Problem
Today we accumulate the CI minutes used by a project/namespace in a single DB entry that we have to reset at midnight on the 1st of every month. This reset is performed on every namespace and project on GitLab.com, which is a heavy procedure to run given our ever-increasing number of namespaces and projects.
Objective
How can we reset the monthly usage without running a forceful reset across all GitLab.com namespaces and projects?
Given that we are going to track monthly CI minutes usage, could we leverage this monthly accumulation of minutes to show and enforce the current month usage?
Proposed solution
Ideally we should move away from using namespace_statistics.shared_runners_seconds to store the current usage, because this requires a reset every month. If we move to monthly usage tracking, we could eventually retire namespace_statistics.shared_runners_seconds and use a more dynamic approach:
- To accumulate the minutes used, we can upsert (insert or update) a record for the current month. If a namespace has used no minutes for a while, it may have no record for the latest month, which means no minutes were used.
- To display the quota used, we would look for the record associated with the current month. If it exists we show the data, otherwise we show 0 usage.
- To enforce minutes used, we would need to modify how Ci::RegisterJobService excludes jobs from projects with exhausted quota, so that it looks at the data from the current month.
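The three points above can be sketched in plain Ruby. This is only an illustration under stated assumptions: MonthlyUsageStore and its methods are hypothetical names, and an in-memory hash stands in for the proposed ci_namespace_monthly_usage table, so it says nothing about how the real models or Ci::RegisterJobService would be implemented.

```ruby
require "date"

# Hypothetical in-memory stand-in for a ci_namespace_monthly_usage table.
# Keys are [namespace_id, first day of the month]; values are seconds used.
class MonthlyUsageStore
  def initialize
    @rows = Hash.new(0)
  end

  # Upsert: creates the month's record on first use, otherwise adds to it.
  def track(namespace_id, seconds, on: Date.today)
    @rows[[namespace_id, beginning_of_month(on)]] += seconds
  end

  # A missing record for the month means no usage, so report 0.
  def seconds_used(namespace_id, on: Date.today)
    @rows.fetch([namespace_id, beginning_of_month(on)], 0)
  end

  # Enforcement check: only the given month's usage is compared to the quota.
  def quota_exhausted?(namespace_id, quota_minutes, on: Date.today)
    seconds_used(namespace_id, on: on) >= quota_minutes * 60
  end

  private

  def beginning_of_month(date)
    Date.new(date.year, date.month, 1)
  end
end

store = MonthlyUsageStore.new
store.track(1, 90, on: Date.new(2021, 1, 31))
store.seconds_used(1, on: Date.new(2021, 1, 31)) # usage recorded for January
store.seconds_used(1, on: Date.new(2021, 2, 1))  # February has no record yet
```

Because usage is keyed by month, the first lookup in a new month finds no record and no reset job is involved.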
Additional minutes should also be tracked monthly. When we create the record for the new month, we would need to recalculate the available additional minutes based on the previous month's usage.
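One possible shape for that recalculation is sketched below. It assumes that the monthly quota is consumed first and that only usage above the quota is deducted from purchased minutes; the MonthlyUsage struct and next_month_record helper are hypothetical names used for illustration.

```ruby
# Hypothetical value object for one month's record of a namespace.
MonthlyUsage = Struct.new(:minutes_used, :additional_minutes, keyword_init: true)

# Builds the record for the new month from the previous one.
# Assumption: the monthly quota is used up first, and any overage is
# deducted from the purchased additional minutes, never going below zero.
def next_month_record(previous, monthly_quota)
  overage = [previous.minutes_used - monthly_quota, 0].max
  remaining = [previous.additional_minutes - overage, 0].max
  MonthlyUsage.new(minutes_used: 0, additional_minutes: remaining)
end

january = MonthlyUsage.new(minutes_used: 2300, additional_minutes: 1000)
february = next_month_record(january, 2000)
february.additional_minutes # 300 minutes of overage deducted from 1000
```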
This may allow us to remove the need to reset minutes altogether. It could also make quota tracking more accurate: for example, minutes from February would no longer be added to minutes from January because the reset failed or was too slow.
Possible iteration plan
1. Introduce ci_project_monthly_usage and ci_namespace_monthly_usage tables/models and track monthly minutes in there when a build finishes. This can be done in the background for the first month, given that the usage would be partial. On the 1st of the month after this is deployed, statistics should match those in project_statistics.shared_runners_seconds and namespace_statistics.shared_runners_seconds. At this point we will have two ways, side by side, to track CI minutes usage, but only the old way is being enforced.
2. When a customer purchases additional minutes, we should add the purchased ones to the current month's record in ci_namespace_monthly_usage (side by side with the existing namespaces.extra_shared_runners_minutes_limit). When a new monthly record is created automatically (on the 1st of the month), we should recalculate the remaining purchased minutes based on the previous month's usage.
3. We could switch the CI usage visualization to use the data from the current month rather than namespace_statistics.shared_runners_seconds. At this point we would still use data from namespace_statistics.shared_runners_seconds to enforce the CI usage when jobs get assigned to a Shared Runner.
   - Need to coordinate with section::growth or section::fulfillment
4. Change EE::Ci::RegisterJobService to enforce the CI usage based on the monthly usage.
5. Remove feature flags related to these changes.
6. Remove ClearSharedRunnersMinutesWorker and related workers/services.
7. Remove legacy data related to CI minutes (in namespaces, projects, namespace_statistics and project_statistics).
Notes:
- After (1), we should be able to get frontend to work on how to display this monthly usage: #246844 (closed)
Further analysis needed:
- last_ci_minutes_notification_at and last_ci_minutes_usage_notification_level in the namespaces table are used to send notifications to the customer when the remaining CI usage quota goes below a certain threshold. These values are also reset monthly today by ClearSharedRunnersMinutesWorker, so we would also need to automatically rotate them by storing them in ci_namespace_monthly_usage as last_notification_at and last_notification_level. Need to coordinate this with section::growth or section::fulfillment.
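As a rough illustration of why storing the notification state on the monthly record rotates it automatically, consider the sketch below. The threshold values, the MonthlyNotificationState struct and the notification_due helper are all assumptions made for this example, not the existing notification logic.

```ruby
# Hypothetical thresholds: percentage of quota remaining that triggers a notification.
NOTIFICATION_LEVELS = [30, 5, 0].freeze

# Hypothetical per-month state, mirroring the proposed columns.
MonthlyNotificationState = Struct.new(:last_notification_level, :last_notification_at, keyword_init: true)

# Returns the level to notify at, or nil when no threshold has been crossed
# or the customer was already notified at this level (or a lower one) this month.
def notification_due(state, percent_remaining)
  level = NOTIFICATION_LEVELS.select { |l| percent_remaining <= l }.min
  return nil if level.nil?
  return nil if state.last_notification_level && state.last_notification_level <= level

  level
end

january = MonthlyNotificationState.new(last_notification_level: 30)
notification_due(january, 20) # nil, already notified at the 30% level

# A fresh record for the new month carries no notification state,
# so thresholds can fire again without any reset job.
february = MonthlyNotificationState.new
notification_due(february, 20) # 30
```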