Adds vector transformation for snowplow bad events

Surabhi Suman requested to merge ss/vector-bad-events into main

What does this MR do and why?

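This adds a Vector transformation that removes PII (the IP address, the user ID, and the enriched payload) from events written to snowplow_bad_events.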

Relates to: https://gitlab.com/gitlab-org/analytics-section/product-analytics/analytics-stack/-/issues/176

Screenshots or screen recordings

Sample payload before and after the Vector transformation

Before:

{
  "data": {
    "failure": {
      "messages": [{}],
      "timestamp": ""
    },
    "payload": {
      "raw": {
        "contentType": "application/json",
        "encoding": "UTF-8",
        "loaderName": "ssc-2.8.2-kafka",
        "ipAddress": "",
        "userId": "",
        "parameters": [{}]
      },
      "enriched": {}
    }
  }
}

After:

{
  "data": {
    "failure": {
      "messages": [{}],
      "timestamp": ""
    },
    "payload": {
      "raw": {
        "contentType": "application/json",
        "encoding": "UTF-8",
        "loaderName": "ssc-2.8.2-kafka",
        "parameters": [{}]
      }
    }
  }
}
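
The effect of the transformation on the sample payload can be sketched in Python. This is only an illustration of the before/after shapes shown here, not the Vector configuration from this MR (which is not reproduced in the description). The name strip_pii and the PII_PATHS list are hypothetical, and the field paths are assumptions read off the sample payload.

```python
import copy

# Assumed field paths, taken from the before/after sample payload.
PII_PATHS = [
    ("data", "payload", "raw", "ipAddress"),
    ("data", "payload", "raw", "userId"),
    ("data", "payload", "enriched"),
]


def strip_pii(event: dict) -> dict:
    """Return a copy of `event` without the PII paths; missing paths are ignored."""
    cleaned = copy.deepcopy(event)  # leave the caller's event untouched
    for path in PII_PATHS:
        node = cleaned
        # Walk down to the parent of the field to delete.
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # parent is absent, nothing to remove
        else:
            node.pop(path[-1], None)
    return cleaned
```

Working on a deep copy keeps the original event intact, which makes it easy to compare the two shapes side by side when checking a payload by hand.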

How to set up and validate locally

  1. Run docker compose up.
  2. Generate a bad event in Snowplow by sending an incorrect payload.
  3. Check the ClickHouse table default.snowplow_bad_events. Its rows should not contain the enriched payload, the user ID, or the IP address.
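
The check in the last step could be scripted along these lines. This is a rough sketch that assumes a row from default.snowplow_bad_events has already been fetched and decoded into a Python dict shaped like the sample payload; the actual column layout of the table is not shown in this MR, and the name find_pii_fields and the list of paths are hypothetical.

```python
# Assumed dotted paths of the fields the transformation is expected to drop.
FORBIDDEN_PATHS = [
    "data.payload.raw.ipAddress",
    "data.payload.raw.userId",
    "data.payload.enriched",
]


def find_pii_fields(row: dict) -> list:
    """Return the forbidden dotted paths that are still present in `row`."""
    found = []
    for dotted in FORBIDDEN_PATHS:
        node = row
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                break  # path does not exist in this row
            node = node[key]
        else:
            found.append(dotted)  # every key on the path was present
    return found
```

An empty list means the decoded row carries none of the three fields; any returned path points at data that still needs to be removed.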

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Surabhi Suman
