Draft: Adds vector transformation for snowplow_bad_events

Surabhi Suman requested to merge vector-transform-events into main

What does this MR do and why?

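This MR adds a Vector transformation that removes PII from snowplow_bad_events: the IP address, the user ID, and the enriched payload are dropped from each bad event.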

Sample payload before/after vector transformation

Before:

    "data": {
        "failure": {
            "messages": [{}],
            "timestamp": ""
        },
        "payload": {
            "raw": {
                "contentType": "application/json",
                "encoding": "UTF-8",
                "loaderName": "ssc-2.8.2-kafka",
                "ipAddress": "",
                "userId": "",
                "parameters": [{}]
            },
            "enriched": {}
        }
    }

After:

    "data": {
        "failure": {
            "messages": [{}],
            "timestamp": ""
        },
        "payload": {
            "raw": {
                "contentType": "application/json",
                "encoding": "UTF-8",
                "loaderName": "ssc-2.8.2-kafka",
                "parameters": [{}]
            }
        }
    }
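The effect of the transformation on the sample payload can be sketched in Python. This is only an illustration of the before/after difference, not the Vector configuration from this MR: the helper name `scrub_bad_event` is hypothetical, and the field paths are assumed from the sample payload.

```python
import copy

# (path to parent object, key to drop). Paths are assumed from the sample
# payload in this description, not taken from the actual Vector config.
PII_FIELDS = [
    (("data", "payload", "raw"), "ipAddress"),
    (("data", "payload", "raw"), "userId"),
    (("data", "payload"), "enriched"),
]


def scrub_bad_event(event: dict) -> dict:
    """Return a copy of the event with the PII fields removed (hypothetical helper)."""
    scrubbed = copy.deepcopy(event)
    for parent_path, key in PII_FIELDS:
        node = scrubbed
        # Walk down to the parent object; stop if any part of the path is missing.
        for part in parent_path:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(key, None)
    return scrubbed
```

The function works on a deep copy, so the input event is left unchanged, and events that lack any of the listed fields pass through untouched.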

How to set up and validate locally

  1. Run helm upgrade -f custom.values.yaml [RELEASE_NAME] [CHART] for your cluster.

  2. Generate a bad event in Snowplow by sending an invalid payload to the collector. Replace #{collector host} and #{appId} with your application's collector host and appId.

    POST /com.snowplowanalytics.snowplow/tp2 HTTP/1.1
    Host: #{collector host}
    accept: */*
    accept-language: en-GB,en-US;q=0.9,en;q=0.8
    content-type: application/json; charset=UTF-8
    origin: null
    sec-ch-ua: "Google Chrome";v="123", "Not:A-Brand";v="8", "Chromium";v="123"
    sec-ch-ua-mobile: ?0
    sec-ch-ua-platform: "macOS"
    sec-fetch-dest: empty
    sec-fetch-mode: cors
    sec-fetch-site: cross-site
    sp-anonymous: *
    user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36
    x-gitlab-appid: #{appId}

    {
        "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
        "data": [
            {
                "e": "pv",
                "page": "Test Page",
                "eid": "any-string",
                "tv": "js-3.12.0",
                "tna": "gitlab",
                "aid": "#{appId}",
                "p": "web",
                "cs": "UTF-8",
                "lang": "en-GB",
                "res": "2560x1440"
            }
        ]
    }
  3. Check the ClickHouse table default.snowplow_bad_events. The schema should not include the enriched payload, user ID, or IP address.
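Steps 2 and 3 can also be scripted. The following is a minimal sketch using only the Python standard library. `build_bad_event_request` assembles the POST from step 2 with a subset of the headers shown there, and `find_pii_keys` scans a row from default.snowplow_bad_events for fields that should no longer be present. Both function names are hypothetical, the `https` scheme is an assumption, and the row is assumed to have already been fetched and parsed into a dict.

```python
import json
import urllib.request

# Keys that should be absent after the transformation (assumed from step 3).
PII_KEYS = {"ipAddress", "userId", "enriched"}


def build_bad_event_request(collector_host: str, app_id: str) -> urllib.request.Request:
    """Build the step 2 POST request without sending it (hypothetical helper)."""
    body = {
        "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
        "data": [
            {
                "e": "pv",
                "page": "Test Page",
                "eid": "any-string",
                "tv": "js-3.12.0",
                "tna": "gitlab",
                "aid": app_id,
                "p": "web",
                "cs": "UTF-8",
                "lang": "en-GB",
                "res": "2560x1440",
            }
        ],
    }
    return urllib.request.Request(
        # The https scheme is an assumption; adjust it for your collector.
        url=f"https://{collector_host}/com.snowplowanalytics.snowplow/tp2",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "*/*",
            "content-type": "application/json; charset=UTF-8",
            "sp-anonymous": "*",
            "x-gitlab-appid": app_id,
        },
        method="POST",
    )


def find_pii_keys(value, path=""):
    """Return the dotted paths of any PII keys found in a nested payload."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if key in PII_KEYS:
                found.append(child_path)
            found.extend(find_pii_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_pii_keys(child, f"{path}[{index}]"))
    return found
```

To send the event, pass the result of `build_bad_event_request` to `urllib.request.urlopen`. An empty list from `find_pii_keys` means none of the three fields were found in the row.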

