Efficient counters
We have various places where we want to count things:
- For example, we keep track of the number of repositories and wikis in site_statistics. However, the implementation here is not efficient because all requests that change the count are serialized on this one record, sometimes resulting in timeout errors.
- The usage ping requires a lot of counting.
- We're adding more counting every now and then, e.g. in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/22007.
In short, concurrent updates to counters in statistics tables (e.g. project_statistics) often fail with query timeouts due to resource contention. We need to make updates more performant and non-blocking while still allowing the values to be read correctly.
The proposal here is to provide an efficient implementation of exact counters. By exact, we mean consistent with the PostgreSQL database (MVCC compatible).
Proposal (implemented in !35878 (merged))
Let's use the ProjectStatistics model as an example. We want to efficiently update the build_artifacts_size counter without running into query timeouts.
In this MR we introduce a new module, CounterAttribute, that adds counter attribute functionality. It provides a method, increment_counter, that increments the counter in Redis and schedules a worker to flush the accumulated increments to the database after some time.
This way:
- writes to the primary columns are only performed by the background worker
- reads can be performed against the primary columns (with some delay in accuracy) or including pending increments
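
The buffering idea described above can be sketched as a small toy model. This is not the code from the MR (which is a Ruby module): it is a language-agnostic illustration in Python, where plain integers stand in for the Redis key and the database column, and a boolean flag stands in for the delayed background worker. Every name except increment_counter (BufferedCounter, flush, read, db_writes, and so on) is hypothetical and chosen only for this example.

```python
import threading


class BufferedCounter:
    """Toy model of a buffered counter (hypothetical, for illustration only).

    `pending` stands in for the Redis key, `db_value` for the database
    column, and `flush_scheduled` for a delayed background job.
    """

    def __init__(self, initial=0):
        self.db_value = initial        # stand-in for the database column
        self.pending = 0               # stand-in for the Redis counter
        self.flush_scheduled = False   # stand-in for a scheduled worker
        self.db_writes = 0             # how many times the "database" was written
        self._lock = threading.Lock()  # stand-in for Redis atomic operations

    def increment_counter(self, amount):
        """Record an increment without touching the database."""
        with self._lock:
            self.pending += amount
            self.flush_scheduled = True

    def flush(self):
        """What the background worker would do: apply all pending
        increments to the database in a single write."""
        with self._lock:
            amount, self.pending = self.pending, 0
            self.flush_scheduled = False
            if amount != 0:
                self.db_value += amount
                self.db_writes += 1

    def read(self, include_pending=False):
        """Read the stored value, optionally adding unflushed increments."""
        with self._lock:
            extra = self.pending if include_pending else 0
            return self.db_value + extra


counter = BufferedCounter(initial=100)
for _ in range(3):
    counter.increment_counter(10)

print(counter.read())                      # 100 (database not yet updated)
print(counter.read(include_pending=True))  # 130 (includes pending increments)
counter.flush()
print(counter.read(), counter.db_writes)   # 130 1 (three increments, one write)
```

In this sketch, three increments result in a single write to the stored value, which is the property that reduces contention on the statistics row. A real implementation would also need to handle worker scheduling, retries, and failures between reading the buffer and writing to the database, none of which is modelled here.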
Blocker
This issue is currently blocking:
- All groups from adding useful usage ping statistics
- Issues listed under Blocks in the Linked issues section of this issue