
Service Ping metric removal policy v2

Problem

Service ping metrics are added by individual product teams and collected as part of product usage. Every month these service pings are collected and aggregated to help teams understand how customers are using GitLab.

However, there is a collection of metrics that repeatedly fail to be collected for various reasons, month after month. These failing metrics create extra work for internal GitLab teams and also mean that the teams that want to use them cannot do so.

Proposal

Implement a process to identify repeatedly failing Service Ping metrics, quarantine them after a grace period, and eventually delete them.

  • Note: This can be a manual process to start.

Note: We will exclude the identification and removal of unused metrics from this issue. That will be covered in future issues.

Identification of failing metrics - If a metric has failed in two monthly service ping generations in a row, it should be marked as a "failing metric" for the purposes of this issue.

  • Once a metric has been identified in this way, create an issue to notify the team that owns it.
    • The issue should describe which metric is being referenced, why it has been flagged (i.e. this process), any information about why it is failing, the date on which it will be quarantined if no action is taken, and a request to ping ~"group::product intelligence" if any help is needed.
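The identification rule above can be sketched as a small check. This is only an illustration, not an existing GitLab implementation: the function name `is_failing_metric`, the constant `FAILURE_THRESHOLD`, and the representation of a metric's history as a list of booleans are all assumptions. It also assumes that "in a row" refers to the two most recent monthly generations.

```python
from typing import Sequence

# Two consecutive failed monthly generations, per the proposal.
FAILURE_THRESHOLD = 2


def is_failing_metric(
    monthly_results: Sequence[bool], threshold: int = FAILURE_THRESHOLD
) -> bool:
    """Return True if the most recent `threshold` generations all failed.

    `monthly_results` is ordered oldest to newest (hypothetical format);
    True means the metric was collected successfully that month.
    """
    if len(monthly_results) < threshold:
        # Not enough history yet to call the metric failing.
        return False
    # Failing only if none of the last `threshold` generations succeeded.
    return not any(monthly_results[-threshold:])
```

For example, a history of `[True, False, False]` would be flagged, while `[False, True, False]` would not, because the failures are not consecutive.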

Quarantine of metrics - Once a metric has been marked as a failing metric and has passed its "grace period," the metric should be disabled from further reporting until it is re-enabled.

  • The grace period should be one month.

Deletion of metrics - Once a metric has been quarantined for a given period of time, it should be deleted from the product if it has not been fixed.

  • This period should be six months.
  • Note: Consider doing deletion in a follow-on issue, since deletion is not easily reversible (it is not a two-way door decision).
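The quarantine and deletion timeline can be sketched as a date calculation. This is a minimal sketch under stated assumptions: `MetricState`, `add_months`, and `metric_state` are hypothetical names, the clock is assumed to start on the date the metric was marked as failing, and the metric is assumed not to have been fixed in the meantime. Only the one-month grace period and the six-month quarantine period come from the proposal.

```python
import calendar
from datetime import date
from enum import Enum

GRACE_PERIOD_MONTHS = 1       # from the proposal
QUARANTINE_PERIOD_MONTHS = 6  # from the proposal


class MetricState(Enum):
    FAILING = "failing"          # flagged, still within the grace period
    QUARANTINED = "quarantined"  # disabled from reporting
    DELETABLE = "deletable"      # quarantine period elapsed without a fix


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def metric_state(marked_failing_on: date, today: date) -> MetricState:
    """Return the state of an unfixed metric on `today`."""
    quarantine_on = add_months(marked_failing_on, GRACE_PERIOD_MONTHS)
    delete_on = add_months(quarantine_on, QUARANTINE_PERIOD_MONTHS)
    if today >= delete_on:
        return MetricState.DELETABLE
    if today >= quarantine_on:
        return MetricState.QUARANTINED
    return MetricState.FAILING
```

Because the process can be manual to start, a helper like this would mainly be useful for computing the quarantine date to put in the notification issue.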

Older issue description

  1. At 12 months of inactivity (not used in Sisense or other reporting) -OR- if a metric is broken
    1. Create an "Inactive/Broken Metric Review" issue and assign it to the product owner with a due date of 30 days (to allow for OOO, etc.)
    2. Bot should ping the issue owner each week until the due date if there is no activity on the issue
    3. Product owner:
      1. Approves the removal of the metric (how the approval is done is TBD)
        1. Issue is closed
        2. Removal process proceeds
      2. Denies the removal of the metric (how the denial is done is TBD)
        1. Reason for preservation is documented in the issue
        2. Removal process stops
        3. The "inactivity date" clock resets
      3. Doesn't respond
        1. Bot documents lack of response by product owner in issue
        2. Issue is closed
        3. Removal process proceeds
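The three product owner outcomes in the older description can be summarized as a small lookup. This is a hypothetical sketch only: `OwnerResponse`, `ReviewOutcome`, and `resolve_review` are illustrative names, and how approval or denial is actually recorded is left TBD in the description above.

```python
from dataclasses import dataclass
from enum import Enum


class OwnerResponse(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    NO_RESPONSE = "no_response"


@dataclass(frozen=True)
class ReviewOutcome:
    removal_proceeds: bool
    inactivity_clock_reset: bool
    note: str  # what gets documented in the review issue


def resolve_review(response: OwnerResponse) -> ReviewOutcome:
    """Map the product owner's response to the next step of the older process."""
    if response is OwnerResponse.DENIED:
        # Removal stops and the "inactivity date" clock resets.
        return ReviewOutcome(False, True, "Reason for preservation documented")
    if response is OwnerResponse.NO_RESPONSE:
        # The bot records the lack of response; removal proceeds.
        return ReviewOutcome(True, False, "Lack of response documented by bot")
    # Approved: the issue is closed and removal proceeds.
    return ReviewOutcome(True, False, "Removal approved by product owner")
```

Note that in this flow both approval and silence lead to removal; only an explicit denial preserves the metric.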
Edited by Sam Kerr