Add user identifier to Snowplow tracking events
Problem to solve
We record some GitLab.com analytics data using Snowplow, but we currently have a hard time attributing events to individual users because we don't record a unique identifier for them. If we want to understand user behavior, we need to be able to track a user's events across sessions and pages.
Proposal
We can either:
- Start sending user ids in our event data, which will solve the problem immediately.
- Hash user ids before sending them to the Snowplow endpoint. I'd imagine this would mean setting up a separate endpoint to hash the data before sending it to our data warehouse (DW).
  - That separate hashing endpoint would be wasted effort once we stand up the collection endpoint internally.
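To make the hashing option concrete, here is a minimal Python sketch of how a user id could be turned into a stable pseudonymous identifier before it is attached to an event. Everything in it is an assumption for illustration: the function names, the `user_id` field, and the key are hypothetical, and it is not Snowplow's API or an existing GitLab implementation. It uses a keyed hash (HMAC-SHA256) rather than a plain hash, because numeric user ids are easy to enumerate and a plain hash of them could be reversed by hashing every candidate id.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real key would come from
# configuration or a secrets store, never from source code.
PSEUDONYM_KEY = b"example-secret-key"


def pseudonymize_user_id(user_id: int, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable, keyed SHA-256 hex digest for a numeric user id."""
    message = str(user_id).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def with_user_identifier(event: dict, user_id: int) -> dict:
    """Return a copy of the event with a pseudonymous user identifier added.

    The "user_id" field name is an assumption, not a confirmed schema.
    """
    enriched = dict(event)  # shallow copy so the caller's event is not mutated
    enriched["user_id"] = pseudonymize_user_id(user_id)
    return enriched
```

The same user id always maps to the same digest, so events can still be joined per user in the DW, while the raw id never leaves the application.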
What does success look like, and how can we measure that?
- We have the ability to associate any event (page view, page ping, click, etc.) with a user in our DW.
- It's not required to know anything personally identifiable about the user, only that we're able to associate a pattern of behavior with an entity. Whether they're jeremy_ or User XYZ isn't relevant.
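As a rough illustration of the success criterion, the sketch below groups events by an opaque user identifier to recover each user's sequence of actions across sessions. The event shape and field names (`user_id`, `session`, `event_type`) are hypothetical and do not reflect the actual Snowplow or DW schema.

```python
from collections import defaultdict


def events_by_user(events):
    """Group event types by the opaque "user_id" field, preserving input order."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["user_id"]].append(event["event_type"])
    return dict(grouped)


# Hypothetical events; the identifiers are opaque and carry no personal data.
sample_events = [
    {"user_id": "a1f3", "session": 1, "event_type": "page_view"},
    {"user_id": "9c2e", "session": 7, "event_type": "page_view"},
    {"user_id": "a1f3", "session": 1, "event_type": "click"},
    {"user_id": "a1f3", "session": 2, "event_type": "page_ping"},
]

behavior = events_by_user(sample_events)
```

Here `behavior["a1f3"]` spans two sessions, which is the kind of cross-session association the proposal is after, and nothing about the identifier reveals who the user is.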
Links / references
Edited by Jeremy Watson (ex-GitLab)