Usage data counter interface
Harden Usage Ping - Consolidate all counters into four main counters with fail safes - Add comment for usage_data.rb on the usage

[Parent Issue](https://gitlab.com/gitlab-org/telemetry/-/issues/335)

### Problem

In its current state, the entire usage ping payload breaks if there is an uncaught error. For example, `avg_cycle_analytics` was raising an uncaught error: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26381. This was fixed in 12.9.

### Result

Teams can add metrics to usage ping without breaking the entire payload.

### Proposal V1

Propose the MVC required to ensure usage ping does not break due to uncaught errors.

Per @jeromezng's comment [here](https://gitlab.com/gitlab-org/telemetry/-/issues/335#note_307573839):

> Isolation will ensure robustness while parallelization ensures speed. Currently robustness is more important than speed (GitLab.com usage ping takes about 11 hours to run and with query optimizations this is reduced to \~6 hours).
>
> MVC for robustness:
>
> * Have a defined list of usage pings in usage_data.rb or a yaml file
> * Cron job cycles through this entire list sequentially
> * Each job calls a get_counter(attributes, etc) method, which can fail individually.
> * Each get_counter method saves counter result to database
> * Each get_counter method sends an atomic payload to Versions OR we wait for all get_counter methods to finish then query the database to build and send a single large payload to Versions.
> * We can optionally parallelize by breaking this into three jobs: usage_activity_by_stage, usage_activity_by_stage_monthly, other counters
>
> Other ideas we discussed:
>
> * The idea about having a "Defined list of usage pings" in a separate database isn't a great option as it introduces state which needs to be configured. I'd rather have counters be stateless and defined in source control.
> * The idea to be able to set_counters via chatops is something we can explore in the future, but also requires configuration with varying states.

### Proposal V2

**12.10:** Consolidate all counters into four main counters with fail safes. Create an example of the `add_usage_data` method.

* Consolidate all counters into four main counters. We've already added \~90% to these: Batch Count and Distinct Count.
* The four main counters will be: Batch Count, Distinct Count, Redis Count, Alternative Count.
* Convert non-batch counters to batch counters: https://gitlab.com/gitlab-org/gitlab/-/issues/208923
* Redis Counters: this includes anything that uses Redis. This may however be a "Russian doll" where usage ping calculations are temporarily stored in Redis / multiple usage pings. We will need to spend time tracing this.
* Alternative Counter: this includes anything miscellaneous such as https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb#L26
* The four main counters will each have their own `rescue` fail safe, similar to what is currently done in Batch Count and Distinct Count.
* We will then have an `add_usage_data` method which is used to append data to the JSON payload.
* The `add_usage_data` method will wrap all four main counters with rescue fail safes.
* For 12.10, `add_usage_data` will have four examples, one for each of the main counters.
* Work with stage teams to reimplement their counters using the four main counter methods:
  * Jira Usage https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb#L198
  * cycle_analytics

**13.0:** Expand `add_usage_data` to all 400 counters

* Expand `add_usage_data` from four examples to all 400 counters.
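
To make the Proposal V2 shape concrete, here is a minimal Ruby sketch of four counters that each carry their own `rescue` fail safe, plus an `add_usage_data` wrapper that appends results to the payload. Everything in it is an assumption for illustration: the method signatures, the `with_fail_safe` helper, the `-1` fallback value, and the in-memory data standing in for database and Redis queries are hypothetical and are not the actual code in `usage_data.rb`.

```ruby
require 'json'

# Assumed sentinel recorded when a counter fails (hypothetical value).
FALLBACK = -1

# Run a counter and return FALLBACK instead of letting the error escape.
def with_fail_safe
  yield
rescue StandardError
  FALLBACK
end

# Hypothetical stand-ins for the four main counters. Real counters would
# query the database or Redis; these use in-memory collections.
def batch_count(rows)
  # Count in slices of two to mimic batching.
  with_fail_safe { rows.each_slice(2).sum(&:size) }
end

def distinct_count(rows, key)
  with_fail_safe { rows.map { |row| row.fetch(key) }.uniq.size }
end

def redis_count(store, key)
  with_fail_safe { Integer(store.fetch(key)) }
end

def alternative_count
  with_fail_safe { yield }
end

# Append one metric to the payload. The outer fail safe means a failing
# metric records FALLBACK rather than breaking the whole payload.
def add_usage_data(payload, name)
  payload[name] = with_fail_safe { yield }
  payload
end

issues = [{ author: 1 }, { author: 2 }, { author: 1 }]
redis  = { 'web_ide_views' => '7' }

payload = {}
add_usage_data(payload, :issues)              { batch_count(issues) }
add_usage_data(payload, :issue_authors)       { distinct_count(issues, :author) }
add_usage_data(payload, :web_ide_views)       { redis_count(redis, 'web_ide_views') }
add_usage_data(payload, :avg_cycle_analytics) { alternative_count { raise 'uncaught error' } }

puts JSON.generate(payload)
# prints {"issues":3,"issue_authors":2,"web_ide_views":7,"avg_cycle_analytics":-1}
```

In this sketch the failing `avg_cycle_analytics` metric is recorded as the fallback value while the other three metrics are still reported, which is the behaviour the Result section asks for.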