
Draft: PoC: Utilize Redis HLL to perform unique counts over foreign keys in the database

What does this MR do?

This MR attempts to address two areas:

  1. https://gitlab.com/gitlab-org/telemetry/-/issues/421 requested a feature that would allow calculating unions over different metrics. Because some metrics are tracked only via Redis HLL counters, calculating unions with database-calculated metrics requires first building an HLL representation of the database metrics and then calculating their union with the Redis-only metrics. Additionally, calculating unions at the database level would put extra load on the database, because intermediate results calculated for a single xMAU cannot be reused when calculating union values over multiple xMAUs. With Redis HLL, once the HLL for a given xMAU is built, it can be reused many times without redoing the same operations.

  2. At #230438 (comment 421212723), the lack of control over batch size when using BatchCount to distinct count non-unique attributes was uncovered. The scale of the discrepancies is fully explained in that thread; the most extreme case found so far is copied here for context:

Below are a few examples from the production database (fetched using chatops) for batches bounded by 1000 values of a non-primary-key column. The planner estimated 1529766 and 674373 rows, while the actual row counts were 1575298 and 730505:

Index Only Scan using index_ci_builds_on_user_id on ci_builds  (cost=0.57..43720.92 rows=1529766 width=4) (actual time=0.025..2088.610 rows=1575298 loops=1)
  Index Cond: ((user_id >= 1) AND (user_id <= 1000))
  Heap Fetches: 14160
  Buffers: shared hit=55503 read=10610
  I/O Timings: read=1451.661
Planning Time: 10.830 ms
Execution Time: 2172.705 ms
Index Only Scan using index_ci_builds_on_user_id on ci_builds  (cost=0.57..19284.80 rows=674373 width=4) (actual time=1.567..946.955 rows=730505 loops=1)
  Index Cond: ((user_id >= 100000) AND (user_id <= 101000))
  Heap Fetches: 5645
  Buffers: shared hit=27165 read=4132
  I/O Timings: read=720.263
Planning Time: 0.269 ms
Execution Time: 986.730 ms
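
The two plans show that a batch bounded by 1000 attribute values says little about how many rows it touches. A minimal sketch of the same effect, using invented in-memory data rather than the production figures (`USER_IDS` and `rows_in_value_range` are hypothetical names used only for illustration):

```ruby
# Hypothetical user_id column of a ci_builds-like table; all figures invented.
USER_IDS = (
  [1] * 5_000 +                           # one very active user
  (2..1_000).flat_map { |id| [id, id] } + # two rows for every other user in range 1
  (1_001..2_000).to_a                     # one row per user in range 2
).freeze

# Rows touched by a batch that is bounded by attribute values, not by row count.
def rows_in_value_range(user_ids, from, to)
  user_ids.count { |user_id| user_id.between?(from, to) }
end

puts rows_in_value_range(USER_IDS, 1, 1_000)     # 6998
puts rows_in_value_range(USER_IDS, 1_001, 2_000) # 1000
```

Both batches span 1000 values, yet the first one reads roughly seven times as many rows as the second.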

The main problem is that we cannot create equally sized batches and, at the same time, capture all occurrences of a given non-unique attribute value within a single batch. To overcome this, the distinct count should be split into two steps. First, the counted attribute values are extracted from the database in fixed-size batches and added to a Redis HLL structure; then the unique count is calculated in Redis.
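
The two-step process can be sketched as follows. This is only an illustration under stated assumptions, not the code in this MR: `FakeHll` is a hypothetical stand-in that uses an exact `Set` instead of a real Redis HLL key (with a real client, `add` would correspond to `PFADD` and `count` to `PFCOUNT`, which returns an approximate value), and `ROWS` is invented data.

```ruby
require 'set'

# Hypothetical stand-in for a Redis HLL key. It is exact, whereas a real
# HLL is a probabilistic estimate.
class FakeHll
  def initialize
    @values = Set.new
  end

  # Would be PFADD against Redis.
  def add(values)
    @values.merge(values)
  end

  # Would be PFCOUNT against Redis.
  def count
    @values.size
  end
end

# Invented rows: [primary key, user_id]; each user_id repeats many times.
ROWS = (1..10_000).map { |id| [id, id % 250] }.freeze

# Step 1: walk the table in fixed-size primary key batches and add the
# counted attribute to the HLL. Every batch reads at most batch_size rows.
def fill_hll(rows, hll, batch_size:)
  rows.each_slice(batch_size) do |batch|
    hll.add(batch.map { |_id, user_id| user_id })
  end
  hll
end

# Step 2: read the distinct count from the HLL.
def distinct_user_count(rows, batch_size: 1_000)
  fill_hll(rows, FakeHll.new, batch_size: batch_size).count
end

# For contrast: summing per-batch distinct counts counts a value once for
# every batch it appears in.
def naive_sum_of_batch_distincts(rows, batch_size: 1_000)
  rows.each_slice(batch_size).sum { |batch| batch.map(&:last).uniq.size }
end

puts distinct_user_count(ROWS)          # 250
puts naive_sum_of_batch_distincts(ROWS) # 2500
```

Because the HLL absorbs duplicates across batches, the batch size can be chosen purely for database load without affecting the result.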

When the Redis HLL batch distinct count will be used

graph TB
  
  by_stage --> by_stage_flow
  subgraph "usage_activity_by_stage"
  by_stage_flow[Is the given metric used by xMAU?]
  by_stage_flow -- no --> NotxMauMetric[Is the metric calculated over a unique attribute?]
  NotxMauMetric -- yes --> calculatedNotxMauMetric[Calculate final value with defined counter: database, Redis etc.]
  NotxMauMetric -- no --> BuildRedisHLL[Build Redis HLL with BatchRedisHllDistinctCount]
  by_stage_flow -- yes --> xMauMetric[Is metric defined via Redis HLL?]
  xMauMetric -- no --> BuildRedisHLL[Build Redis HLL with BatchRedisHllDistinctCount]
  xMauMetric -- yes --> xMuaRedy[Count Redis HLL with pfcount]
  BuildRedisHLL --> xMuaRedy
  end

  subgraph "UsageData"
  Node1[Calculate system_usage_data] --> Node2[Calculate components_usage_data]
  Node2[Calculate components_usage_data] --> Node3[Calculate ...]
  Node3 --> by_stage[Calculate usage_activity_by_stage]
  by_stage --> calculateUnions[Calculate xMAUs with Redis HLL]
  calculateUnions --> FinalThing[Send UsagePing to version app]
end
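
The union step at the end of the flow is where reuse pays off: once each metric has its own HLL, any combination of them can be counted without going back to the database. A minimal sketch, assuming exact `Set` objects as stand-ins for Redis HLL keys (in Redis, `PFCOUNT` with several keys returns the approximate cardinality of their union); the metric names and values are invented:

```ruby
require 'set'

# Hypothetical per-metric HLLs, each built once. Names and members are invented.
HLLS = {
  'create_stage_users' => Set.new([1, 2, 3, 4]),
  'verify_stage_users' => Set.new([3, 4, 5]),
  'plan_stage_users'   => Set.new([5, 6])
}.freeze

# Distinct count over the union of the given metrics, similar in spirit
# to PFCOUNT key1 key2 ... against real HLL keys.
def union_count(hlls, *keys)
  keys.map { |key| hlls.fetch(key) }
      .reduce(Set.new) { |union, set| union | set }
      .size
end

puts union_count(HLLS, 'create_stage_users', 'verify_stage_users')                     # 5
puts union_count(HLLS, 'create_stage_users', 'verify_stage_users', 'plan_stage_users') # 6
```

The same three structures serve every union that is requested, which is the reuse that a database-level union cannot offer.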

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Mikołaj Wawrzyniak
