Add user identifier to Snowplow tracking events

Problem to solve

We record some GitLab.com analytics data using Snowplow, but we currently can't attribute events to individual users because we don't record a unique user identifier. To understand user behavior, we need to be able to follow a user's events across sessions and pages.

Proposal

We can either:

  • Start sending user IDs in our event data, which will solve the problem immediately.
  • Hash user IDs before sending them to the Snowplow endpoint. I'd imagine this would mean setting up a separate endpoint to hash the data before sending it to our data warehouse (DW).
    • This would be wasted effort once we stand up the collection endpoint internally.
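
To make the hashing option concrete, here is a minimal sketch of pseudonymizing a user ID with a keyed hash (HMAC-SHA256) before it goes into an event payload. It is an illustration only, not how GitLab or Snowplow implement this: `SECRET_KEY`, `pseudonymize_user_id`, `build_event`, and the payload shape are all hypothetical names made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret used only for this sketch; a real key would come
# from configuration, never from source code.
SECRET_KEY = b"example-secret-not-for-production"


def pseudonymize_user_id(user_id: int, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (hex string) of a numeric user ID."""
    message = str(user_id).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def build_event(event_type: str, user_id: int) -> dict:
    """Build a hypothetical event payload that carries only the hashed ID."""
    return {"event": event_type, "user_id": pseudonymize_user_id(user_id)}
```

The same user ID always maps to the same hash, so events can still be joined per user in the DW. A keyed hash is used rather than a plain SHA-256 because sequential integer IDs are trivial to enumerate and reverse without a secret.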

What does success look like, and how can we measure that?

  • We can associate any event (pageview, pageping, click, etc.) with a user in our DW.
    • We don't need to know anything personally identifiable about the user, only that we can associate a pattern of behavior with an entity. Whether they're jeremy_ or User XYZ isn't relevant.
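
As a rough picture of what that association could look like once events carry an identifier, the sketch below groups events by an opaque user key and orders each user's events by time. The field names (`user_id`, `event`, `timestamp`) and the function `events_by_user` are assumptions for illustration, not the actual Snowplow or DW schema.

```python
from collections import defaultdict


def events_by_user(events):
    """Group event dicts by their opaque 'user_id', ordered by 'timestamp'."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["user_id"]].append(event)
    # Sort each user's events chronologically so a behavior pattern is readable.
    for user_events in grouped.values():
        user_events.sort(key=lambda e: e["timestamp"])
    return dict(grouped)
```

Nothing here depends on who the user is: the key could be a raw ID or a hash, as long as it is stable across sessions and pages.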

Links / references

Edited Sep 03, 2018 by Jeremy Watson (ex-GitLab)