Draft: Preliminary work on filtering based on additional properties

Jonas Larsen requested to merge j_lar/introduce_filtering-spike into master

What does this MR do and why?

Note that I do not intend to merge this MR.

The MR is created for the purpose of getting feedback on the metric definition format and new Redis key format needed to support filters. This MR contains no code changes.

The background for why we need filters is described in #435338.

In this MR I:

  1. Defined a new event with two additional properties. It looks like this:
    ---
    description: Package pushed to the registry
    internal_events: true
    action: push_package_to_registry
    identifiers:
    - project
    - namespace
    - user
    additional_properties:
      label:
        description: The name of the package type
      property:
        description: The auth type. Either 'guest', 'user' or 'deploy_token'
    product_section: ci
    product_stage: package
    product_group: package_registry
    milestone: '17.0'
    introduced_by_url: TODO
    distributions:
    - ce
    - ee
    tiers:
    - free
    - premium
    - ultimate
  2. Modified the metric schema, for Internal Events metrics, to allow the filter definition described below.
  3. Migrated ~30 metrics (both Redis and RedisHLL) in the micro framework built around the package repository. I migrated all metrics for deploy_token, a couple for user, and all of the more general total count metrics. All metrics are defined on the new push_package_to_registry event.
  4. Added entries to usage_data_counters/hll_redis_key_overrides.yml and usage_data_counters/total_counter_redis_key_overrides.yml to show how this would look when we want to reuse existing Redis keys.

The files usage_data_counters/hll_redis_key_overrides.yml and usage_data_counters/total_counter_redis_key_overrides.yml provide quite a few examples of what I imagine the Redis keys could look like for these filtered metrics.

Metric definition

The different options we considered were originally discussed in this thread.

Here are a few examples of how different types of metrics could use a filter:

All time total count with a filter defined on one additional property:

[snip]
time_frame: all
data_source: internal_events
events:
  - name: push_package_to_registry
    filter:
      label: terraform_module
[snip]

Logic interpretation: label == "terraform_module"

Unique count of users with a filter defined on multiple properties:

[snip]
time_frame: 28d
data_source: internal_events
events:
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: terraform_module
      property: deploy_token
[snip]

Logic interpretation: label == "terraform_module" && property == "deploy_token"

Unique count of users across multiple events (the same event in this case). A filter is defined for each event:

[snip]
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: conan
      property: deploy_token
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: generic
      property: deploy_token
  - name: push_package_to_registry
    unique: user.id
    filter:
      label: helm
      property: deploy_token
[snip]

Logic interpretation: (label == "conan" && property == "deploy_token") || (label == "generic" && property == "deploy_token") || (label == "helm" && property == "deploy_token")
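
The three interpretations above can be sketched as a small predicate: properties within one filter are combined with AND, and the event entries of a metric are combined with OR. This is only an illustrative Python sketch; the function names `matches_filter` and `metric_matches`, and the dictionary shape used for events and rules, are assumptions and not part of this MR.

```python
def matches_filter(event_properties, filter_definition):
    """True if every property named in the filter equals the event's value (AND)."""
    return all(
        event_properties.get(name) == expected
        for name, expected in filter_definition.items()
    )


def metric_matches(event_name, event_properties, rules):
    """True if any event entry of the metric matches the event (OR across entries)."""
    return any(
        rule["name"] == event_name
        and matches_filter(event_properties, rule.get("filter", {}))
        for rule in rules
    )


# Hypothetical in-memory form of the third example above.
rules = [
    {"name": "push_package_to_registry",
     "filter": {"label": "conan", "property": "deploy_token"}},
    {"name": "push_package_to_registry",
     "filter": {"label": "generic", "property": "deploy_token"}},
    {"name": "push_package_to_registry",
     "filter": {"label": "helm", "property": "deploy_token"}},
]

# A helm package pushed with a deploy token satisfies the third entry.
helm_push = {"label": "helm", "property": "deploy_token"}
counted = metric_matches("push_package_to_registry", helm_push, rules)
```

In this sketch an entry without a `filter` matches every occurrence of its event, which corresponds to an unfiltered metric.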

Redis key format

I went the verbose way and encoded the entire filter in the Redis key, so that the key is fairly easy to read.

The first two examples above would be stored under the following Redis keys, if nothing is done to override them:

  1. {event_counters}_push_package_to_registry-[label:terraform_module]
  2. {hll_counters}_push_package_to_registry-[label:terraform_module,property:deploy_token]-user

The property names in the filter part of the key are sorted lexicographically to prevent ambiguity.
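
As a rough illustration of the key format described above, the following Python sketch builds a key from an event name and a filter, sorting the property names first. The function name `filtered_redis_key` and its parameters are hypothetical, the prefixes are assumed to be written as `{event_counters}` and `{hll_counters}`, and any time-period suffix is left out because the MR does not show one.

```python
def filtered_redis_key(prefix, event_name, filter_definition, unique_suffix=None):
    """Build a verbose Redis key that encodes the whole filter.

    Hypothetical helper: property names are sorted so that the same
    filter always produces the same key, whatever order it was written in.
    """
    key = f"{{{prefix}}}_{event_name}"
    if filter_definition:
        parts = ",".join(
            f"{name}:{value}" for name, value in sorted(filter_definition.items())
        )
        key += f"-[{parts}]"
    if unique_suffix:
        key += f"-{unique_suffix}"
    return key


total_key = filtered_redis_key(
    "event_counters",
    "push_package_to_registry",
    {"label": "terraform_module"},
)

# The filter is deliberately given in reverse order; sorting normalizes it.
unique_key = filtered_redis_key(
    "hll_counters",
    "push_package_to_registry",
    {"property": "deploy_token", "label": "terraform_module"},
    unique_suffix="user",
)
```

Without the sorting step, `label` then `property` and `property` then `label` would produce two different keys for the same filter, and the counts would be split between them.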

Related to #435338

Edited by Jonas Larsen
