Telemetry Implementation Guide

Parent issue: https://gitlab.com/gitlab-org/telemetry/issues/307

Overview

After working on telemetry for the past month, it's become apparent that we need to zoom out and think about the bigger picture of the telemetry system. The current state of telemetry is "complicated" due to several shorter-term stopgap solutions. At the time, these stopgap solutions solved the task at hand; however, they also resulted in several edge cases and limitations which we're now trying to untangle.

To be clear, I'm not proposing a rebuild or a large-scope project. I believe we already have the majority of the essential pieces in place. We simply need to define how each piece should fit together.

Here are the things we need to solve for:

  • Define the roles of each tracking component:
    • Snowplow JavaScript
    • Snowplow Ruby
    • system logs
    • usage ping data
    • Postgres events table
    • other Postgres tables
  • Reconcile differences between:
    • .com telemetry
    • self-hosted telemetry (usage ping)
  • Create a diagram of the current telemetry system design: Telemetry Overview (WIP)

Related issues

Overview
  • The telemetry proposal
  • The telemetry guide
Usage ping
  • Usage ping timing out for larger instances
  • Time period in usage_data.rb
  • Add new usage ping counter for events
  • https://about.gitlab.com/handbook/product/feature-instrumentation/#process-to-add-additional-instrumentation-to-the-usage-ping
Snowplow
  • Setup Tracking QA/Testing Environment using Snowplow Mini
Logs
  • Exploring using logs for telemetry purposes
Backend action tracking
  • Exploring backend rails action tracking

Next steps

  • Write and publish telemetry implementation guide at https://docs.gitlab.com/telemetry
  • Update Usage Ping definitions: https://gitlab.com/gitlab-org/telemetry/issues/267
  • Usage Ping implementation process for Stage telemetry implementations: https://gitlab.com/gitlab-org/telemetry/issues/308#note_293427230
  • Snowplow implementation process using structured events
Edited Mar 06, 2020 by Jerome Z Ng