Store root-namespace storage statistics on database
Problem to solve
Today we check storage statistics using a `GROUP BY` operator on `ProjectStatistics`, and it's one of the longest-running transactions in production (https://gitlab.com/gitlab-org/gitlab-ce/issues/62488).
We use this information as part of a public API for the storage counter at group level, and once we start enforcing storage limits we will need to rely on this query more often.
Also, our billing schema is based on root-namespace aggregation, and this query does not aggregate to the root namespace.
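To make the gap concrete, here is a minimal plain-Ruby sketch of the difference between grouping project statistics per namespace and rolling them up to the root namespace. It runs on in-memory data only; the structs, the two size columns and the sample values are assumptions made for illustration and are not the real schema or query.

```ruby
# Hypothetical in-memory stand-ins; not GitLab's models or tables.
Namespace = Struct.new(:id, :parent_id)
ProjectStat = Struct.new(:namespace_id, :repository_size, :lfs_objects_size)

NAMESPACES = [
  Namespace.new(1, nil), # root group
  Namespace.new(2, 1),   # subgroup of 1
  Namespace.new(3, 2),   # subgroup of 2
  Namespace.new(4, nil)  # another root group
].to_h { |ns| [ns.id, ns] }

STATS = [
  ProjectStat.new(1, 100, 10),
  ProjectStat.new(2, 200, 20),
  ProjectStat.new(3, 300, 30),
  ProjectStat.new(4, 50, 5)
].freeze

# Follow parent links until a namespace without a parent is reached.
def root_id(namespace_id)
  ns = NAMESPACES.fetch(namespace_id)
  ns = NAMESPACES.fetch(ns.parent_id) while ns.parent_id
  ns.id
end

# Group the rows by whatever key the block returns, then sum each size column.
def sum_sizes(stats)
  stats.group_by { |s| yield s.namespace_id }.transform_values do |rows|
    {
      repository_size: rows.sum(&:repository_size),
      lfs_objects_size: rows.sum(&:lfs_objects_size)
    }
  end
end

# Roughly what a GROUP BY on the namespace id gives: one bucket per namespace.
PER_NAMESPACE = sum_sizes(STATS) { |id| id }

# What billing needs: one bucket per root namespace.
PER_ROOT = sum_sizes(STATS) { |id| root_id(id) }
```

With this sample data `PER_NAMESPACE` has four buckets, while `PER_ROOT` has two, and the bucket for root `1` also contains the sizes of its nested subgroups.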
Technical bits
- On gitlab.com we have namespaces with ~15k projects; for those, this query takes 1.2 seconds to run.
- If we try to analyze it with Chatops, it times out: https://ops.gitlab.net/gitlab-com/chatops/-/jobs/528372
- On EE we have an `EE::NamespaceStatistics` table that keeps the root-namespace aggregation, but it's only used for tracking pipeline minutes.
Proposal
- Create a new model with the same attributes as `ProjectStatistics.*_size`. The purpose of this model will be to hold the information in an aggregated form.
- Update the statistics in this model in an async way, to avoid large database transactions. (See backend section for the technical details)
- Rework !28277 (merged) to make use of this new query - https://gitlab.com/gitlab-org/gitlab-ce/issues/62796
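As a rough illustration of the proposed model, the sketch below keeps one aggregated value per `*_size` attribute and recalculates them from a list of project rows. It is plain Ruby rather than an ActiveRecord model, and the class name, the column list in `SIZE_COLUMNS` and the `storage_size` helper are assumptions for the example only.

```ruby
# Assumed subset of ProjectStatistics *_size columns, for illustration only.
SIZE_COLUMNS = %i[
  repository_size
  lfs_objects_size
  build_artifacts_size
  packages_size
].freeze

# Hypothetical stand-in for one project's statistics row.
ProjectSizes = Struct.new(*SIZE_COLUMNS, keyword_init: true)

# Hypothetical aggregate holder: one value per size column for a root namespace.
class RootNamespaceStorageSketch
  attr_reader :namespace_id, *SIZE_COLUMNS

  def initialize(namespace_id)
    @namespace_id = namespace_id
    SIZE_COLUMNS.each { |column| instance_variable_set("@#{column}", 0) }
  end

  # Overwrite every aggregated column with the sum over the given project rows.
  def recalculate!(project_rows)
    SIZE_COLUMNS.each do |column|
      instance_variable_set("@#{column}", project_rows.sum { |row| row[column] })
    end
    self
  end

  # Total storage across all aggregated columns.
  def storage_size
    SIZE_COLUMNS.sum { |column| public_send(column) }
  end
end
```

Because the aggregate is recalculated as a whole rather than incremented, a reader of the stored row never has to run the per-project aggregation itself. Note that the sums assume no column holds `nil`.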
Development log
Decisions
- There is some prework that needs to be done before starting work on this issue.
- Since it was reported (here and on https://gitlab.com/gitlab-org/gitlab-ce/issues/62488) that the pattern we currently use for updating `project_statistics` doesn't scale properly for GitLab.com, we've decided to go with a different approach for updating the namespace statistics: a CTE refresh strategy based on the namespace routes. (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_178132519)
  - On backend implications we've outlined all the technical details
- While working through the CTE approach, we noticed that it might not be easy to implement and would not be compatible with MySQL (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_181094357, https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_180759005)
- There's another possible approach of adding a new column on the `namespaces` table that tracks the root namespace and calculating the statistics based on this column (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_178311781)
- We agreed that https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29837?commit_id=110478466ab85ac7a7ff69cd6dee300169b05128#note_182994031 is fast enough for an async processing job in Sidekiq, and it will allow us to avoid running a migration on `namespaces`. The meeting was recorded.
  - Regular query was implemented and merged on https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996
- After https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996, we discovered an edge case that needs to be solved https://gitlab.com/gitlab-org/gitlab-ce/issues/62214#note_187584895
- Bug was detected before https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996 reached production: gitlab-org/gitlab-ce#64079
- We decided to measure the group storage statistics on staging and production. Details here https://gitlab.com/gitlab-org/gitlab-ce/issues/64092
- Performance was measured on staging and production. No problems or errors were found. All details are in the issue.
Backend implications
Prework
- %12.0 ~backstage remove nils from `project_statistics.packages_size` https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28400 https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/13163 (@nolith)
- %12.0 gitlab-ee#11675 affects root-namespace aggregation on `NamespaceStatistics` and should be fixed before doing this. (@nolith)
Technical details (%12.1)
- Create `root_namespace_storage_statistics` with all the `ProjectStatistics.*_size` attributes
- Create a second table (`namespace_aggregation_schedules`) with two columns, `id` and `namespace_id`.
- Whenever the statistics of a project change, we insert a row into `namespace_aggregation_schedules`
  - We don't insert a new row if there's already one related to the namespace.
- Insertion is done through a callback and with a Sidekiq job. We can't do it in the same transaction, because `ProjectStatistics` is already involved in a large one (https://gitlab.com/gitlab-org/gitlab-ce/issues/62488)
- After inserting the row, we schedule a new worker `X` hours into the future.
- This job will:
  - Update the root namespace storage statistics by querying all the namespaces through a service.
  - Delete the related `namespace_aggregation_schedules` row after the update
- We also need to create another Sidekiq job that will traverse any remaining rows on `namespace_aggregation_schedules` and schedule jobs for every pending row.
- Hide all these changes behind a feature flag (FF)
  - we will read the caching interval from Redis, defaulting to once every 3 hours
  - we will experiment with tweaking the interval, aiming for a smaller value
  - when we remove the feature flag, the interval must be hardcoded or converted to an application setting (to be decided)
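The flow described in the list above can be sketched in plain Ruby as follows. This is an in-memory simulation under stated assumptions, not the actual workers: the class and method names are hypothetical, a `Set` stands in for the `namespace_aggregation_schedules` table, an array stands in for the delayed Sidekiq queue, and each project is reduced to a single integer size.

```ruby
require 'set'

# Hypothetical simulation of the scheduling flow; no Sidekiq or database involved.
class AggregationFlowSketch
  attr_reader :schedules, :queued_jobs, :totals

  # sizes_by_root: { root_namespace_id => [project sizes] } (illustrative input)
  def initialize(sizes_by_root)
    @sizes_by_root = sizes_by_root
    @schedules = Set.new # stands in for namespace_aggregation_schedules
    @queued_jobs = []    # stands in for delayed Sidekiq jobs
    @totals = {}         # stands in for root_namespace_storage_statistics
  end

  # Called after a project's statistics change. Inserts a schedule row unless
  # one is already pending for this root namespace, then queues a delayed job.
  def statistics_changed(root_id, delay_hours: 3)
    return false unless @schedules.add?(root_id)

    @queued_jobs << { root_id: root_id, delay_hours: delay_hours }
    true
  end

  # The delayed job: refresh the aggregate, then delete the schedule row.
  def run_job(root_id)
    return false unless @schedules.include?(root_id)

    @totals[root_id] = @sizes_by_root.fetch(root_id, []).sum
    @schedules.delete(root_id)
    true
  end

  # The second job: queue a job for every schedule row that is still pending.
  def reschedule_pending(delay_hours: 3)
    @schedules.each do |root_id|
      @queued_jobs << { root_id: root_id, delay_hours: delay_hours }
    end
    @schedules.size
  end
end
```

Calling `statistics_changed` several times for the same root namespace within the delay window leaves a single pending row and a single queued job, which is what keeps the number of aggregation runs bounded regardless of how often project statistics change.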
Merge Requests
- Steps 1 & 2 are implemented on https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29570
- Steps 3 to 8 are implemented on https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996
- Never release the Redis lease gitlab-org/gitlab-ce#64079 - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/30305
- Schedule a `Namespace::AggregationSchedule` worker when some columns are refreshed on `ProjectStatistics.refresh!` - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/30329
- Hardcode the lease time depending on the analysis https://gitlab.com/gitlab-org/gitlab-ce/issues/64092 - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31341
- Remove the feature flag - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31392
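The lease-related merge requests above point at a lease that is held for a fixed time and left to expire instead of being released when the job finishes. A minimal sketch of that idea, assuming an in-memory hash in place of Redis and a hypothetical class name, could look like this; it is an interpretation of the merge request titles, not the code that was merged.

```ruby
# Hypothetical in-memory lease; the real one lives in Redis.
class ExpiringLeaseSketch
  # timeout_seconds: how long a lease is held once obtained.
  # clock: injectable time source, so the behaviour can be checked without sleeping.
  def initialize(timeout_seconds, clock: -> { Time.now.to_f })
    @timeout = timeout_seconds
    @clock = clock
    @expires_at = {}
  end

  # Returns true if the caller obtained the lease for this key, and false
  # while a previous lease is still running. There is deliberately no release
  # method: the lease only goes away by expiring.
  def try_obtain(key)
    now = @clock.call
    return false if @expires_at.key?(key) && @expires_at[key] > now

    @expires_at[key] = now + @timeout
    true
  end
end
```

With a timeout of three hours, at most one aggregation per key can start in any three-hour window, even if the job itself completes in seconds.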
Edited by Mayra Cabrera