Pseudo anonymize all record ids in experiment tracking

Jeremy Jackson requested to merge jejacks0n/pseudo-anonymize-ids into master

What does this MR do?

This MR pseudo-anonymizes any record passed to tracking via experimentation, using a one-way seeded hashing strategy. It is the same strategy used to anonymize the experiment context key, which has been approved and in use for a while.
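
For readers unfamiliar with the idea, here is a minimal sketch of a one-way seeded hash, not the actual gitlab-experiment implementation. The method name `pseudo_anonymize`, the `EXAMPLE_SEED` constant, and the `|` separator are assumptions made for illustration only.

```ruby
require 'digest'

# Hypothetical seed for illustration; a real deployment would load a secret
# value from configuration rather than hard-coding it.
EXAMPLE_SEED = 'example-seed-not-a-real-secret'

# Returns a stable hex digest for a record type and id. The same inputs
# always produce the same digest, but the digest cannot be turned back
# into the original id.
def pseudo_anonymize(record_type, record_id, seed: EXAMPLE_SEED)
  Digest::SHA256.hexdigest([seed, record_type, record_id].join('|'))
end

user_key = pseudo_anonymize('User', 42)
```

Including the record type in the digest keeps a user and a project that share a numeric id from colliding, and changing the seed produces an unrelated set of keys.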

Our Snowplow documentation states that namespace, project and user are valid arguments for Gitlab::Tracking.event. However, these arguments are passed to Gitlab::Tracking::StandardContext, which ignores them. This MR bypasses that missing logic and instead anonymizes these values before passing them downstream.
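
As a rough sketch of what "anonymize before passing downstream" could look like, the snippet below replaces the namespace, project and user values in an argument hash with digests. `ExampleRecord`, `ANONYMIZED_KEYS`, and `anonymize_tracking_args` are hypothetical names and do not reflect the code in this MR or the Gitlab::Tracking API.

```ruby
require 'digest'

# Hypothetical stand-in for a tracked record; real records would be models.
ExampleRecord = Struct.new(:type, :id)

# Keys whose values are replaced with a digest (assumed for illustration).
ANONYMIZED_KEYS = %i[namespace project user].freeze

# Returns a copy of the arguments with record values swapped for digests.
# Other keys, and nil values, are passed through unchanged.
def anonymize_tracking_args(args, seed:)
  args.to_h do |key, value|
    if ANONYMIZED_KEYS.include?(key) && value
      [key, Digest::SHA256.hexdigest([seed, value.type, value.id].join('|'))]
    else
      [key, value]
    end
  end
end

args = anonymize_tracking_args(
  { label: 'example_experiment', user: ExampleRecord.new('User', 42), project: nil },
  seed: 'example-seed'
)
```

After this step the downstream call only ever sees digests for those three keys, so the raw record ids never reach the event payload.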

A reasonable amount of effort and education has gone into gitlab-experiment to minimize the need to link things to the wider dataset, but this continues to be a struggle when reporting on experiments and generating deep data about them. This MR proposes how we, as engineers doing our best to make product happy while respecting GDPR rules, our privacy policy, and commitments made to the community, might approach this issue. For performance reasons, and unlike what currently happens, the approach does not involve writing these things to the database.

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Jeremy Jackson