[META] Snowplow Analytics for GitLab.com
Description
We've made the decision to pursue the open-source Snowplow Analytics for tracking events on GitLab.com. These events will be pushed to our data warehouse, and visualized in Looker.
This meta issue is the single source of truth (SSOT) for the tasks needed. Our goal is to have a working pipeline by July 7th that tracks a small handful of events on GitLab.com and lets us visualize them in Looker.
Setup
Snowplow's guide to their pipeline is here, visualized by the following diagram:
Since we have an existing data warehouse (PostgreSQL on Cloud SQL...?), our primary concern is setting up the pipeline through subsystem 4 (Storage) and getting tracked event data stored somewhere: ideally Cloud SQL, possibly S3.
1: Tracking
2: Collection
- Set up a Snowplow collector for tracking events from GitLab.com
- Self-host snowplow.js for Snowplow analytics tracking
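
To make the tracking and collection steps concrete, here is a rough sketch of the kind of request a tracker sends to a collector for one structured event. It is not our implementation: on GitLab.com the self-hosted snowplow.js would build this request for us. The collector host, the `gitlab` app ID, the tracker version string, and the example category/action/label are all placeholders invented for illustration. The query parameter names (`e`, `se_ca`, `se_ac`, `se_la`, `aid`, `p`, `tv`, `eid`, `dtm`) follow the Snowplow tracker protocol as we understand it and should be checked against the Snowplow docs.

```python
import time
import uuid
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical collector endpoint; the real host is not decided in this issue.
COLLECTOR_URL = "https://snowplow-collector.example.com/i"


def build_structured_event_url(category, action, label=None,
                               app_id="gitlab", collector_url=COLLECTOR_URL):
    """Build the GET URL for one structured event (assumed tracker protocol)."""
    params = {
        "e": "se",                            # event type: structured event
        "se_ca": category,                    # event category
        "se_ac": action,                      # event action
        "aid": app_id,                        # application ID (placeholder)
        "p": "web",                           # platform
        "tv": "py-sketch-0.1",                # tracker version (made up)
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp in ms
    }
    if label is not None:
        params["se_la"] = label               # optional event label
    return f"{collector_url}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_structured_event_url("issues", "create", label="new_issue_button")
    # Show the parameters the collector would receive.
    print(parse_qs(urlparse(url).query))
```

Inspecting a URL like this is also a cheap way to confirm the collector is receiving what we expect before any enrichment runs.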
3: Enriching
4: Storage
5: Modeling
6: Analytics
Related security review: https://gitlab.com/gitlab-com/security/issues/114
Once the cleaned and enriched data is in S3, we can ETL it into our data warehouse on a regular basis. We can then visualize it in Looker. Great success!
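
As a rough sketch of that ETL step, assuming the enriched events arrive as tab-separated lines: the snippet below parses a few columns and loads them into a table. The five-column layout and the `snowplow_events` table name are hypothetical (real enriched events carry many more fields), and an in-memory SQLite database stands in for the actual warehouse so the example runs anywhere. Fetching the files from S3 is left out.

```python
import csv
import io
import sqlite3

# Hypothetical subset of columns; real enriched events have many more fields.
COLUMNS = ["app_id", "collector_tstamp", "event", "se_category", "se_action"]


def parse_enriched_tsv(text):
    """Turn tab-separated enriched-event lines into a list of dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def load_events(conn, events):
    """Insert parsed events into a staging table; returns the row count."""
    column_defs = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS snowplow_events ({column_defs})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    rows = [tuple(event.get(name) for name in COLUMNS) for event in events]
    conn.executemany(f"INSERT INTO snowplow_events VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Made-up sample line, not real data.
    sample = "gitlab\t2018-07-01 12:00:00\tstruct\tissues\tcreate\n"
    conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    print(load_events(conn, parse_enriched_tsv(sample)))
```

Running something like this on a schedule would cover the "regular basis" part. Against the real warehouse, the connection and the insert statement would need to be swapped for whatever PostgreSQL client we settle on.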