
Set up a Snowplow collector for tracking events on GitLab.com

Description

We'd like to use Snowplow for tracking pageviews and events on GitLab.com. In Snowplow, trackers fire events, which are received and logged by collectors. Trackers send data to collectors by making a GET request for a tracking pixel.

This issue tracks the creation of a collector, which will merrily log these requests. We can't do tracking without a collector.
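
As a rough illustration of the GET request described above, the sketch below builds the kind of pixel URL a tracker might request for a page view. The collector host and the application ID are hypothetical placeholders, and the query parameter names (`e`, `url`, `page`, `aid`, `p`) and the `/i` path follow the Snowplow tracker protocol as we understand it; check them against the protocol documentation before relying on them.

```python
from urllib.parse import urlencode

# Hypothetical collector host; replace with the real endpoint once one exists.
COLLECTOR_HOST = "collector.example.com"


def pixel_url(page_url, page_title, app_id="gitlab", collector=COLLECTOR_HOST):
    """Build the GET URL a tracker would request to record a page view."""
    params = {
        "e": "pv",           # event type: page view
        "url": page_url,     # URL of the page being viewed
        "page": page_title,  # title of the page being viewed
        "aid": app_id,       # application ID (assumed value)
        "p": "web",          # platform
    }
    # Everything the collector needs to log travels in the query string.
    return f"https://{collector}/i?{urlencode(params)}"


url = pixel_url("https://gitlab.com/explore", "Explore")
print(url)
```

In practice the Snowplow JavaScript tracker would assemble and send this request itself; the point here is only that the collector has nothing to do but log the incoming query string.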

Proposal

Snowplow collectors use a tracking pixel and log GET requests for the pixel. To stand up a collector, we need to do two things:

  • Decide which collector to use. The 3 collector options are described here, with the CloudFront Collector being the most commonly used. The Scala Stream Collector also appears to be stable and is recommended for future use.

    • The CloudFront Collector writes its logs to S3.
    • The Scala Stream Collector pushes its logs to a Kinesis stream.
  • Set up the collector, as described in the installation guide.

    • We should probably set up at least 2 collector groups: one pixel/logs for GitLab.com production and one for staging/testing.
    • On AWS, compute-optimized instances are preferred.
    • We should have at least 3-4 instances to increase log/shard parallelization
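
To make the "serve a pixel, log the GET request" idea concrete, here is a minimal standard-library sketch of what a collector does. It is not how the CloudFront or Scala Stream collectors are implemented, and it is not meant for production; the function names, the JSON log format, and the port are all assumptions made for illustration.

```python
import base64
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# 1x1 transparent GIF returned for every pixel request.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def log_record(path, client_ip, now=None):
    """Turn one request path into a JSON log line (hypothetical format)."""
    query = parse_qs(urlsplit(path).query)
    return json.dumps(
        {
            "ts": now if now is not None else time.time(),
            "ip": client_ip,
            # Keep the first value of each query parameter.
            "params": {key: values[0] for key, values in query.items()},
        },
        sort_keys=True,
    )


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log the request first, then answer with the pixel.
        print(log_record(self.path, self.client_address[0]), flush=True)
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

    def log_message(self, *args):
        pass  # silence the default access log; log_record is the log


def serve(port=8080):
    """Run the toy collector until interrupted (port is an assumption)."""
    HTTPServer(("", port), PixelHandler).serve_forever()
```

Calling `serve()` starts the toy collector locally. A real deployment would replace the `print` call with the collector's own sink: S3 logs for the CloudFront Collector, or a Kinesis stream for the Scala Stream Collector.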

Links / references

Edited by Tamas Szuromi