
Product Analytics via Usage Ping MVC - Parent Issue

Overview

This issue brings together our work on Self-Managed Event Tracking (https://gitlab.com/gitlab-org/telemetry/-/issues/373) and Product Analytics (gitlab-org/gitlab#211568, closed).

Once the Product Analytics MVC MR gitlab-org/gitlab!27730 (closed) is merged, we will have a product_analytics_events table which will hold events from external applications and from a GitLab instance.

We then need to look at ways to aggregate the GitLab instance events data so it can be sent back to us through our existing Usage Ping feature.

The purpose of this Usage Ping data is to help us build a better GitLab. We collect data about how GitLab is used to better understand which parts of GitLab need improvement and which features to build next.

MVC

The goal of this MVC is to aggregate Snowplow data in the product_analytics_events table so it can be sent back to us via Usage Ping. The key is to aggregate the data in a way that is useful for reporting in Sisense.

Some ideas we've explored include:

  • product_analytics_per_day daily aggregation table
  • product_analytics_counters_per_user_per_day daily aggregation table
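To make the two ideas above concrete, here is a minimal sketch of what the daily roll-ups could compute. It is not the proposed implementation: the Event fields (collector_tstamp, user_id, se_category, se_action) borrow names from Snowplow's event model and are assumed here, since the actual columns of product_analytics_events are not listed in this issue, and both function names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timezone
from typing import Iterable, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    # Assumed subset of columns; the real table schema may differ.
    collector_tstamp: datetime  # expected to be timezone-aware
    user_id: Optional[str]
    se_category: str
    se_action: str


def _utc_day(event: Event) -> date:
    # Bucket by UTC calendar day so every instance aggregates the same way.
    return event.collector_tstamp.astimezone(timezone.utc).date()


def aggregate_per_day(events: Iterable[Event]) -> Counter:
    """Count events per (day, category, action)."""
    counts: Counter = Counter()
    for e in events:
        counts[(_utc_day(e), e.se_category, e.se_action)] += 1
    return counts


def aggregate_per_user_per_day(events: Iterable[Event]) -> Counter:
    """Count events per (day, user, category, action); skips anonymous events."""
    counts: Counter = Counter()
    for e in events:
        if e.user_id is None:
            continue
        counts[(_utc_day(e), e.user_id, e.se_category, e.se_action)] += 1
    return counts


if __name__ == "__main__":
    ts = datetime(2020, 5, 1, 23, 30, tzinfo=timezone.utc)
    sample = [
        Event(ts, "u1", "epics", "promote"),
        Event(ts, "u1", "epics", "promote"),
        Event(ts, None, "epics", "promote"),
    ]
    for key, n in aggregate_per_day(sample).items():
        print(key, n)
```

The per-day counts are small enough to ship in a Usage Ping payload, while the per-user variant keeps the detail needed for unique-user style metrics at the cost of more rows.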

Long term concerns (out of scope for MVC)

Scale of data

GitLab.com currently sees up to 18.2M events per day, with peaks of 1.25M events per hour (about 20,833 events per minute). These split into roughly 17M good events and 1.2M bad events per day, where a good event is one structured according to Snowplow's defined schema. Link to Snowplow Summary Dashboard.
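For sizing discussions it can help to turn the daily total, hourly peak, and bad-event count quoted above into per-second rates and ratios. The sketch below only does that arithmetic; the function name and the dictionary keys are made up for illustration.

```python
EVENTS_PER_DAY = 18_200_000       # daily total quoted above
PEAK_EVENTS_PER_HOUR = 1_250_000  # hourly peak quoted above
BAD_EVENTS_PER_DAY = 1_200_000    # bad events per day quoted above


def traffic_summary(events_per_day: float,
                    peak_per_hour: float,
                    bad_per_day: float) -> dict:
    """Derive rates a collector would need to sustain from the daily figures."""
    avg_per_hour = events_per_day / 24
    return {
        "avg_events_per_second": events_per_day / 86_400,
        "peak_events_per_second": peak_per_hour / 3_600,
        "peak_to_average_ratio": peak_per_hour / avg_per_hour,
        "bad_event_share": bad_per_day / events_per_day,
    }


if __name__ == "__main__":
    summary = traffic_summary(EVENTS_PER_DAY, PEAK_EVENTS_PER_HOUR, BAD_EVENTS_PER_DAY)
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

On these inputs the peak hour works out to roughly 347 events per second, about 1.65 times the daily average, and bad events are about 6.6% of the total.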

Data Retention

On the retention policy for GitLab.com Snowplow events: our Snowflake data warehouse has unlimited retention. Link to Snowplow dbt. Here is what is currently in Snowflake, dating back to around Aug 2018:

  • Events (Good Events): 753 GB, 3,281,848,447 rows
  • Bad Events: 256 GB, 186,793,156 rows
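As a rough aid for the retention discussion, the totals above can be turned into an average size per row. This is a back-of-the-envelope sketch only: it assumes decimal gigabytes (10^9 bytes), and Snowflake's stored size is not necessarily comparable to what the same rows would occupy in a product_analytics_events table on a GitLab instance. The helper names are hypothetical.

```python
BYTES_PER_GB = 1e9  # assumption: decimal gigabytes


def avg_bytes_per_row(total_gb: float, rows: int) -> float:
    """Average stored bytes per row for a table of the given size."""
    return total_gb * BYTES_PER_GB / rows


def projected_gb_per_day(events_per_day: int, bytes_per_row: float) -> float:
    """Storage growth per day if every event costs bytes_per_row."""
    return events_per_day * bytes_per_row / BYTES_PER_GB


good_row_bytes = avg_bytes_per_row(753, 3_281_848_447)
bad_row_bytes = avg_bytes_per_row(256, 186_793_156)

if __name__ == "__main__":
    print(f"good events: ~{good_row_bytes:.0f} bytes/row")
    print(f"bad events:  ~{bad_row_bytes:.0f} bytes/row")
    # 17M good events per day is the GitLab.com figure quoted earlier in this issue.
    print(f"~{projected_gb_per_day(17_000_000, good_row_bytes):.1f} GB/day of good events")
```

Under these assumptions a good event averages a little under 230 bytes while a bad event averages well over 1 KB, so bad events take a disproportionate share of storage relative to their row count.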

Go Collector

Next Steps for MVC

Next Steps for Long Term Concerns

  • Think about scalability of collector (17M events per day on GitLab.com)
  • Define retention periods of product_analytics_events table
Edited by 🤖 GitLab Bot 🤖