Spike: Self-Managed Event Tracking
Moved to https://gitlab.com/gitlab-org/telemetry/-/issues/383
This issue aims to find a solution that enables event tracking for UI and CRUD/API events on self-managed instances.
Related investigation:
- Spike: Hybrid Logs and Postgres Events w/ Usage Ping
- Spike: Log parser and using intermediate tables for usage ping
- Product Instrumentation - MVC - SnowPlow visualization specifically reusing the Snowplow Go Collector
Requests for this functionality:
Proposal
Current Architecture
Future architecture:
- The vision for self-managed event tracking is to implement a Snowplow collector that is fully contained within a self-managed instance
- Snowplow JS and Snowplow Ruby events will be sent to the Snowplow Collector. The collector will write these events to Postgres (either the existing database or a separate analytics database), and Usage Ping will aggregate the event data so it can be sent back to the Versions application.
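The flow described above (trackers send events to a collector, the collector stores them, Usage Ping aggregates them) could be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Postgres, and the table name `snowplow_events`, the `Event` fields, and the functions `collect` and `usage_ping_payload` are hypothetical names, not part of GitLab or Snowplow.

```python
import sqlite3
from collections import namedtuple

# Hypothetical event shape; real Snowplow events carry many more fields.
Event = namedtuple("Event", ["category", "action", "source"])


def create_store():
    """Create the event store. SQLite is a stand-in for Postgres here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snowplow_events ("
        " id INTEGER PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " source TEXT NOT NULL)"  # e.g. 'js' or 'ruby' tracker
    )
    return conn


def collect(conn, event):
    """Play the collector's role: persist one incoming event."""
    conn.execute(
        "INSERT INTO snowplow_events (category, action, source) VALUES (?, ?, ?)",
        (event.category, event.action, event.source),
    )


def usage_ping_payload(conn):
    """Aggregate stored events into counts, as a Usage Ping job might."""
    rows = conn.execute(
        "SELECT category, action, COUNT(*) FROM snowplow_events"
        " GROUP BY category, action ORDER BY category, action"
    ).fetchall()
    return {f"{category}.{action}": count for category, action, count in rows}


if __name__ == "__main__":
    store = create_store()
    collect(store, Event("issues", "create", "ruby"))
    collect(store, Event("issues", "create", "js"))
    collect(store, Event("merge_requests", "view", "js"))
    print(usage_ping_payload(store))
```

Only the aggregated counts would leave the instance in this sketch; the raw events stay in the local database.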
Next Steps:
- Look into Snowplow Go Collector
- Think about scalability (15M events per day on GitLab.com)
- Think about storing event data and if it should be mixed with transactional data https://gitlab.com/gitlab-org/telemetry/-/issues/333#note_317382869
- To ease scalability challenges, possibly separate GitLab.com and self-managed Snowplow events
- Define retention periods
- Define intermediate tables https://gitlab.com/gitlab-org/telemetry/-/issues/360#note_326332422
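
One way the retention and intermediate-table steps might fit together is sketched below: raw events are rolled up into a daily counts table, and raw rows older than the retention window are then deleted. Everything here is an assumption for illustration, since neither the retention period nor the intermediate tables have been defined yet: the 30-day `RETENTION_DAYS` value, the table names `raw_events` and `daily_event_counts`, and the use of SQLite in place of Postgres.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical value; the actual retention period is still to be defined.
RETENTION_DAYS = 30


def create_tables(conn):
    """Create a raw event table and a hypothetical intermediate rollup table."""
    conn.execute(
        "CREATE TABLE raw_events (occurred_on TEXT NOT NULL, action TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE daily_event_counts ("
        " occurred_on TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " total INTEGER NOT NULL,"
        " PRIMARY KEY (occurred_on, action))"
    )


def roll_up(conn, day):
    """Summarize one day's raw events into the intermediate table."""
    conn.execute(
        "INSERT OR REPLACE INTO daily_event_counts (occurred_on, action, total)"
        " SELECT occurred_on, action, COUNT(*) FROM raw_events"
        " WHERE occurred_on = ? GROUP BY occurred_on, action",
        (day.isoformat(),),
    )


def purge_expired(conn, today):
    """Delete raw events older than the retention window; return rows removed."""
    cutoff = (today - timedelta(days=RETENTION_DAYS)).isoformat()
    # ISO 8601 date strings sort chronologically, so string comparison works.
    cursor = conn.execute("DELETE FROM raw_events WHERE occurred_on < ?", (cutoff,))
    return cursor.rowcount
```

With this split, aggregation queries would read the small rollup table rather than the raw events, which may matter at the 15M events per day cited for GitLab.com (roughly 174 events per second on average).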
Edited by Jerome Z Ng
