
Add statsd and stdout logging for snowplow components

Sebastian Rehm requested to merge 13-snowplow-metrics-logging into main

Description

This adds more metrics output for the Snowplow Collector and Enricher to make potential misconfigurations easier to debug. There seems to be no option to simply log each incoming request or event, but both the Collector and the Enricher support periodic logging. This gives us information about which events came in and which topic they were forwarded to.

The most up-to-date way to get data about incoming requests seems to be their StatsD integration, which also works with any other StatsD-compatible service (Prometheus, ELK, ...). I set up StatsD to simply forward incoming metrics to stdout for local debugging purposes.
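To make the StatsD side less opaque, here is a minimal Python sketch of how a single plain StatsD metric line (`name:value|type`) can be read. It is not part of this MR and not how the StatsD container is implemented; `Metric` and `parse_statsd_line` are hypothetical names, and any tags are simply left inside the metric name.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    name: str    # metric name, including any tags embedded in it
    value: float
    kind: str    # "c" = counter, "g" = gauge, "ms" = timer


def parse_statsd_line(line: str) -> Metric:
    """Parse one plain StatsD line of the form name:value|type."""
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2 or not fields[1]:
        raise ValueError(f"not a StatsD metric line: {line!r}")
    # Optional trailing fields (e.g. a sample rate) are ignored in this sketch.
    return Metric(name, float(fields[0]), fields[1])


if __name__ == "__main__":
    # Illustrative input only; the values mirror the example output in this MR.
    for raw in ("snowplow.enrich.raw:27|c", "snowplow.enrich.latency:707|g"):
        print(parse_statsd_line(raw))
```

A line that does not follow the `name:value|type` shape raises `ValueError`, which roughly corresponds to what StatsD counts under `statsd.bad_lines_seen`.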

For the Enricher, both StatsD and basic periodic stdout logging are available. Examples of both are provided below.

Example Outputs

Basic Enricher stdout logging

devkit-snowplow_enrich-1         | [pool-1-thread-3] INFO enrich.metrics - snowplow.enrich.raw = 27
devkit-snowplow_enrich-1         | [pool-1-thread-3] INFO enrich.metrics - snowplow.enrich.good = 27
devkit-snowplow_enrich-1         | [pool-1-thread-3] INFO enrich.metrics - snowplow.enrich.bad = 0
devkit-snowplow_enrich-1         | [pool-1-thread-3] INFO enrich.metrics - snowplow.enrich.invalid_enriched = 0
devkit-snowplow_enrich-1         | [pool-1-thread-3] INFO enrich.metrics - snowplow.enrich.latency = 707

Logging to StatsD and from there to stdout

devkit-statsd-1                  | Flushing stats at  Wed May 10 2023 09:39:20 GMT+0000 (Coordinated Universal Time)
devkit-statsd-1                  | {
devkit-statsd-1                  |   counters: {
devkit-statsd-1                  |     'statsd.bad_lines_seen': 0,
devkit-statsd-1                  |     'statsd.packets_received': 46,
devkit-statsd-1                  |     'statsd.metrics_received': 104,
devkit-statsd-1                  |     'snowplow.collector.requests_bytes;method=POST': 50813,
devkit-statsd-1                  |     'snowplow.collector.responses_bytes;status=2xx;method=POST': 66,
devkit-statsd-1                  |     'snowplow.collector.responses_duration;status=2xx;method=POST': 46,
devkit-statsd-1                  |     'snowplow.collector.requests_active;method=POST': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_count;status=2xx;method=POST': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_count;method=POST': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_bytes;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_bytes;status=2xx;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_duration;status=2xx;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_active;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_count;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_count;status=2xx;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.enrich.raw;app=enrich': 27,
devkit-statsd-1                  |     'snowplow.enrich.good;app=enrich': 27,
devkit-statsd-1                  |     'snowplow.enrich.bad;app=enrich': 0,
devkit-statsd-1                  |     'snowplow.enrich.invalid_enriched;app=enrich': 0
devkit-statsd-1                  |   },
devkit-statsd-1                  |   timers: {},
devkit-statsd-1                  |   gauges: {
devkit-statsd-1                  |     'statsd.timestamp_lag': 0,
devkit-statsd-1                  |     'snowplow.enrich.latency;app=enrich': 707
devkit-statsd-1                  |   },
devkit-statsd-1                  |   timer_data: {},
devkit-statsd-1                  |   counter_rates: {
devkit-statsd-1                  |     'statsd.bad_lines_seen': 0,
devkit-statsd-1                  |     'statsd.packets_received': 4.6,
devkit-statsd-1                  |     'statsd.metrics_received': 10.4,
devkit-statsd-1                  |     'snowplow.collector.requests_bytes;method=POST': 5081.3,
devkit-statsd-1                  |     'snowplow.collector.responses_bytes;status=2xx;method=POST': 6.6,
devkit-statsd-1                  |     'snowplow.collector.responses_duration;status=2xx;method=POST': 4.6,
devkit-statsd-1                  |     'snowplow.collector.requests_active;method=POST': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_count;status=2xx;method=POST': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_count;method=POST': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_bytes;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_bytes;status=2xx;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_duration;status=2xx;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_active;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.requests_count;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.collector.responses_count;status=2xx;method=OPTIONS': 0,
devkit-statsd-1                  |     'snowplow.enrich.raw;app=enrich': 2.7,
devkit-statsd-1                  |     'snowplow.enrich.good;app=enrich': 2.7,
devkit-statsd-1                  |     'snowplow.enrich.bad;app=enrich': 0,
devkit-statsd-1                  |     'snowplow.enrich.invalid_enriched;app=enrich': 0
devkit-statsd-1                  |   },
devkit-statsd-1                  |   sets: {},
devkit-statsd-1                  |   pctThreshold: [ 90 ]
devkit-statsd-1                  | }
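A note on reading the flush above: the `counter_rates` values are the `counters` values divided by the flush interval in seconds. The numbers shown are consistent with a 10-second interval, which matches StatsD's default `flushInterval` of 10000 ms; that the devkit runs with this default is an assumption inferred from the output. A short Python sketch of the arithmetic, using a few values copied from the flush:

```python
# Assumption: 10 s flush interval, inferred from the output above
# (46 packets -> 4.6 per second).
FLUSH_INTERVAL_SECONDS = 10

# A few counters copied from the example flush.
counters = {
    "statsd.packets_received": 46,
    "statsd.metrics_received": 104,
    "snowplow.collector.requests_bytes;method=POST": 50813,
    "snowplow.enrich.raw;app=enrich": 27,
}


def counter_rate(count: float, interval_seconds: float = FLUSH_INTERVAL_SECONDS) -> float:
    """Per-second rate for a counter accumulated over one flush interval."""
    return count / interval_seconds


rates = {name: counter_rate(value) for name, value in counters.items()}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate}")
```

This is relevant to the reporting-frequency question: a different flush interval changes how often a report appears and the window each counter covers, while the per-second rates stay comparable.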

How to test

Run the devkit locally. Metrics should be logged even when no events come in (all values are 0 in that case).

Open Questions

  1. Is the Enricher's stdout logging enough for local debugging, or would we prefer the more detailed StatsD logging?
  2. How often should the logger report in either case (stdout vs. StatsD)?

Closes #13 (closed)
