Store root-namespace storage statistics on database
Problem to solve
Today we compute storage statistics with a GROUP BY query on ProjectStatistics, and it's one of the longest-running transactions in production (https://gitlab.com/gitlab-org/gitlab-ce/issues/62488).
We use this information in a public API that exposes the storage counter at group level, and once we start enforcing storage limits we will need to rely on this query more often.
Also, our billing schema is based on root-namespace aggregation, and this query does not aggregate to the root namespace.
Technical bits
- On gitlab.com we have namespaces with ~15k projects; for those, this query takes 1.2 seconds to run.
- If we try to analyze it with Chatops, it times out: https://ops.gitlab.net/gitlab-com/chatops/-/jobs/528372
- On EE we have an EE::NamespaceStatistics table that keeps the root-namespace aggregation, but it's only used for tracking pipeline minutes.
Proposal
- Create a new model with the same attributes as ProjectStatistics.*_size. The purpose of this model will be to hold the information in an aggregated form.
- Update the statistics in this model in an async way, to avoid large database transactions. (See backend section for the technical details)
- Rework !28277 (merged) to make use of this new query - https://gitlab.com/gitlab-org/gitlab-ce/issues/62796
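
As a rough illustration of what "aggregated form" means here, the sketch below sums per-project `*_size` values up to the root namespace. It is not GitLab code: plain hashes stand in for the namespaces and project_statistics tables, the helper names `root_namespace_id` and `aggregate_by_root` are hypothetical, and `repository_size` is assumed as an example column next to `packages_size`.

```ruby
# Sketch only: plain hashes stand in for database tables.

# parents maps namespace id => parent id (nil for a root namespace).
# Assumes the parent links contain no cycles.
def root_namespace_id(parents, id)
  id = parents.fetch(id) while parents.fetch(id)
  id
end

# rows is an array of hashes with :namespace_id plus the size columns.
# Returns a hash of root namespace id => { column => summed size }.
def aggregate_by_root(parents, rows, columns)
  totals = Hash.new { |hash, root| hash[root] = columns.map { |c| [c, 0] }.to_h }
  rows.each do |row|
    total = totals[root_namespace_id(parents, row.fetch(:namespace_id))]
    columns.each { |column| total[column] += row.fetch(column) }
  end
  totals
end

# Namespace 3 is nested under 2, which is nested under root 1.
parents = { 1 => nil, 2 => 1, 3 => 2, 4 => nil }
rows = [
  { namespace_id: 1, repository_size: 10, packages_size: 1 },
  { namespace_id: 3, repository_size: 20, packages_size: 2 },
  { namespace_id: 4, repository_size: 5, packages_size: 0 }
]
totals = aggregate_by_root(parents, rows, [:repository_size, :packages_size])
```

In this toy data, the project in nested namespace 3 is counted under root 1, which is the roll-up that the current GROUP BY query does not provide.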
Development log
Decisions
- Some prework needs to be done before starting work on this issue.
- Since it was reported (here and on https://gitlab.com/gitlab-org/gitlab-ce/issues/62488) that the pattern we currently use for updating project_statistics doesn't scale properly for GitLab.com, we've decided to go with a different approach for updating the namespace statistics: a CTE refresh strategy based on the namespace routes (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_178132519).
  - The technical details are outlined under Backend implications.
- While working through the CTE approach, we noticed that it might not be easy to implement and would not be compatible with MySQL (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_181094357, https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_180759005)
- Another possible approach is to add a new column to the namespaces table that tracks the root namespace, and to calculate the statistics based on this column (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996#note_178311781)
- We agreed that https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29837?commit_id=110478466ab85ac7a7ff69cd6dee300169b05128#note_182994031 is fast enough for an async processing job in Sidekiq, and that it allows us to avoid running a migration on namespaces. The meeting was recorded.
- A regular query was implemented and merged in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996
- After https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996, we discovered an edge case that needs to be solved https://gitlab.com/gitlab-org/gitlab-ce/issues/62214#note_187584895
- Bug was detected before https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996 reached production: gitlab-org/gitlab-ce#64079
- We decided to measure the group storage statistics on staging and production. Details here https://gitlab.com/gitlab-org/gitlab-ce/issues/64092
- Performance was measured on staging and production. No problems or errors were found. All details are in the issue.
Backend implications
Prework
- %12.0 ~backstage remove nils from project_statistics.packages_size https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28400 https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/13163 (@nolith)
- %12.0 gitlab-ee#11675 affects root-namespace aggregation on NamespaceStatistics and should be fixed before doing this. (@nolith)
Technical details (%12.1)
- Create root_namespace_storage_statistics with all the ProjectStatistics.*_size attributes
- Create a second table (namespace_aggregation_schedules) with two columns, id and namespace_id.
- Whenever the statistics of a project change, we insert a row into namespace_aggregation_schedules
  - We don't insert a new row if there's already one related to the namespace.
  - Insertion is done through a callback and with a Sidekiq job. We can't do it in the same transaction, as ProjectStatistics is already involved in a large one (https://gitlab.com/gitlab-org/gitlab-ce/issues/62488)
- After inserting the row, we schedule a new worker X hours into the future.
- This job will:
  - Update the root namespace storage statistics by querying all the namespaces through a service.
  - Delete the related namespace_aggregation_schedules row after the update
- We also need to create another Sidekiq job that will traverse any remaining rows on namespace_aggregation_schedules and schedule jobs for every pending row.
- Hide all these changes behind a feature flag (FF)
  - We will read the caching interval from Redis, defaulting to once every 3 hours
  - We will experiment with tweaking the interval, aiming for a smaller value
  - When we remove the feature flag, the interval must be hardcoded or converted to an application setting (to be decided)
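
The flow above can be sketched in plain Ruby to make the moving parts easier to see. This is a simplified illustration under stated assumptions, not the merged implementation: the class `AggregationScheduler` and its methods are hypothetical, a `Set` stands in for the namespace_aggregation_schedules table, an array stands in for the Sidekiq queue, and times are plain integers in seconds.

```ruby
require "set"

# Sketch only: in-memory stand-ins for the schedules table and the job queue.
# All names here are hypothetical.
class AggregationScheduler
  attr_reader :pending, :queue, :statistics

  def initialize(delay_seconds:, aggregator:)
    @delay_seconds = delay_seconds
    @aggregator = aggregator # callable: namespace id -> aggregated size
    @pending = Set.new       # at most one entry per namespace, like a unique row
    @queue = []              # [run_at, namespace_id] pairs
    @statistics = {}
  end

  # Called when a project's statistics change. Returns false when a row
  # already exists for the namespace, so no duplicate job is enqueued.
  def schedule(namespace_id, now)
    return false unless @pending.add?(namespace_id)

    @queue << [now + @delay_seconds, namespace_id]
    true
  end

  # Runs every job that is due: refresh the statistics, then delete the row.
  # Returns the number of jobs that ran.
  def run_due(now)
    due, @queue = @queue.partition { |run_at, _| run_at <= now }
    due.each do |_, namespace_id|
      @statistics[namespace_id] = @aggregator.call(namespace_id)
      @pending.delete(namespace_id)
    end
    due.size
  end

  # Safety net: enqueue a job for any pending row that has none queued.
  # Returns the number of jobs added.
  def sweep(now)
    queued = @queue.map { |_, namespace_id| namespace_id }
    missing = @pending.reject { |id| queued.include?(id) }
    missing.each { |namespace_id| @queue << [now, namespace_id] }
    missing.size
  end
end

# The aggregator lambda is a placeholder for the real statistics query.
scheduler = AggregationScheduler.new(delay_seconds: 3 * 3600, aggregator: ->(id) { id * 100 })
scheduler.schedule(1, 0)    # first change: row inserted, job enqueued
scheduler.schedule(1, 60)   # second change: row exists, nothing enqueued
scheduler.run_due(3 * 3600) # job runs, statistics refreshed, row deleted
```

Skipping the insert when a row already exists is what keeps a burst of project updates from producing a burst of aggregation jobs. The `sweep` step corresponds to the extra Sidekiq job that picks up rows left behind.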
Merge Requests
- Steps 1 & 2 are implemented in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29570
- Steps 3 to 8 are implemented in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996
- Never release the Redis lease gitlab-org/gitlab-ce#64079 - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/30305
- Schedule a Namespace::AggregationSchedule worker when some columns are refreshed on ProjectStatistics.refresh! - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/30329
- Hardcode the lease time depending on the analysis https://gitlab.com/gitlab-org/gitlab-ce/issues/64092 - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31341
- Remove the feature flag - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31392
Edited by Mayra Cabrera