Draft: Adds vector transformation for snowplow_bad_events

Surabhi Suman requested to merge vector-transform-events into main

What does this MR do and why?

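This MR adds a Vector transformation that removes PII (IP address, user ID, and the enriched payload) from events before they are stored in snowplow_bad_events.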

Relates to: https://gitlab.com/gitlab-org/analytics-section/product-analytics/analytics-stack/-/issues/176

Screenshots or screen recordings

Sample payload before/after vector transformation

Before:

    {
      "data": {
        "failure": {
          "messages": [{}],
          "timestamp": ""
        },
        "payload": {
          "raw": {
            "contentType": "application/json",
            "encoding": "UTF-8",
            "loaderName": "ssc-2.8.2-kafka",
            "ipAddress": "",
            "userId": "",
            "parameters": [{}]
          },
          "enriched": {}
        }
      }
    }

After:

    {
      "data": {
        "failure": {
          "messages": [{}],
          "timestamp": ""
        },
        "payload": {
          "raw": {
            "contentType": "application/json",
            "encoding": "UTF-8",
            "loaderName": "ssc-2.8.2-kafka",
            "parameters": [{}]
          }
        }
      }
    }
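
The before/after difference can be illustrated with a short Python sketch. This is not the Vector transform added in this MR (which is not shown here); it only mirrors the effect on the sample payload, assuming the PII lives in data.payload.raw.ipAddress, data.payload.raw.userId, and data.payload.enriched. The function name strip_pii is hypothetical.

```python
import copy

# Field names are taken from the sample payload above. The function and
# constant names are hypothetical; the real transform runs in Vector.
PII_RAW_FIELDS = ("ipAddress", "userId")


def strip_pii(event: dict) -> dict:
    """Return a copy of a bad event with the PII fields removed."""
    cleaned = copy.deepcopy(event)  # leave the caller's event untouched

    data = cleaned.get("data")
    if not isinstance(data, dict):
        return cleaned
    payload = data.get("payload")
    if not isinstance(payload, dict):
        return cleaned

    # Drop the whole enriched payload.
    payload.pop("enriched", None)

    # Drop the IP address and user ID from the raw payload.
    raw = payload.get("raw")
    if isinstance(raw, dict):
        for field in PII_RAW_FIELDS:
            raw.pop(field, None)

    return cleaned
```

Events that lack any of these keys pass through unchanged, so the sketch is safe to apply to payloads of a different shape.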

How to set up and validate locally

  1. Run helm upgrade -f custom.values.yaml [RELEASE_NAME] [CHART] for your cluster.

  2. Generate a bad event in Snowplow by sending an invalid payload to the collector. Replace #{collector host} and #{appId} with your application's collector host and app ID.

    POST /com.snowplowanalytics.snowplow/tp2 HTTP/1.1
    Host: #{collector host}
    accept: */*
    accept-language: en-GB,en-US;q=0.9,en;q=0.8
    content-type: application/json; charset=UTF-8
    origin: null
    sec-ch-ua: "Google Chrome";v="123", "Not:A-Brand";v="8", "Chromium";v="123"
    sec-ch-ua-mobile: ?0
    sec-ch-ua-platform: "macOS"
    sec-fetch-dest: empty
    sec-fetch-mode: cors
    sec-fetch-site: cross-site
    sp-anonymous: *
    user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36
    x-gitlab-appid: #{appId}
    
    {
        "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
        "data": [
            {
                "e": "pv",
                "page": "Test Page",
                "eid": "any-string",
                "tv": "js-3.12.0",
                "tna": "gitlab",
                "aid": "#{appId}",
                "p": "web",
                "cs": "UTF-8",
                "lang": "en-GB",
                "res": "2560x1440"
            }
        ]
    }
  3. Check the ClickHouse table default.snowplow_bad_events. The stored events should not contain the enriched payload, user ID, or IP address.
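
For scripting step 2, the request can be built with Python's standard library. This is a minimal sketch under stated assumptions: the collector is reachable over HTTPS (the scheme is not given above), only the headers that look specific to this setup are kept (content-type, sp-anonymous, x-gitlab-appid), and the function name build_bad_event_request is hypothetical.

```python
import json
import urllib.request


def build_bad_event_request(collector_host: str, app_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from step 2."""
    body = {
        "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
        "data": [
            {
                "e": "pv",
                "page": "Test Page",
                "eid": "any-string",
                "tv": "js-3.12.0",
                "tna": "gitlab",
                "aid": app_id,
                "p": "web",
                "cs": "UTF-8",
                "lang": "en-GB",
                "res": "2560x1440",
            }
        ],
    }
    return urllib.request.Request(
        # Assumption: HTTPS; adjust the scheme for your collector.
        url=f"https://{collector_host}/com.snowplowanalytics.snowplow/tp2",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "SP-Anonymous": "*",
            "X-Gitlab-Appid": app_id,
        },
        method="POST",
    )


# Usage (performs a network call, so it is left commented out):
# req = build_bad_event_request("collector.example.com", "my-app-id")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The host and app ID in the usage comment are placeholders. Building the request separately from sending it makes it easy to inspect the payload before it reaches the collector.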

MR acceptance checklist

  • The correct type labels have been applied to this MR.
  • This MR has been made as small as possible, to improve review efficiency and code quality.
  • This MR has been self-reviewed per the code review guidelines.
  • The changes have undergone manual testing and are functioning as intended.
  • This MR has updated the Chart.yaml version number following SemVer versioning practices.
  • This MR documents any breaking changes in the MR description, and the upgrade path has been documented in the first commit as well as in MR description.

Edited by Surabhi Suman
