Low good events record volume in snowplow

Problem

Snowplow events are not being fired from gitlab.com

We are getting "Failed to load resource: net::ERR_CERT_DATE_INVALID" for the collector https://snowplow.trx.gitlab.net

As this is a collector issue, it affects not only gitlab.com but also https://docs.gitlab.com/, https://customers.gitlab.com/ and other services that rely on the collector.

Detection

We received an alert in the #g_analytics_instrumentation_alerts Slack channel.

Impact

All Snowplow events from .com, CustomersDot and other applications that rely on snowplow.trx.gitlab.net for tracking are impacted.

Additional information

This was caused by the SSL certificate expiring on the Snowplow collector side. The Infra team has updated the certificate. There was a gap of ~6 hours during which data was lost.

[Screenshot, 2024-07-04 at 12:51:32 PM]
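
Since this failure mode is a certificate passing its expiry date, a small probe that reports the days remaining on the collector certificate could warn before events start failing. The following is only a minimal sketch under assumptions, not the Infra team's actual monitoring: the function names, the 14-day warning threshold, and the dates in the comments are all hypothetical. It uses only the Python standard library.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return days left before a certificate expires.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan 10 00:00:00 2030 GMT" (an illustrative date, not the real one).
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_certificate(host: str, warn_days: float = 14.0) -> str:
    """Classify a host's certificate as OK, WARNING, or CRITICAL.

    warn_days is an assumed threshold; pick one that fits the renewal process.
    """
    try:
        remaining = days_until_expiry(fetch_not_after(host))
    except ssl.SSLCertVerificationError as exc:
        # An already expired certificate fails the handshake, which is the
        # server-side view of the browser's net::ERR_CERT_DATE_INVALID.
        return f"CRITICAL: {host} failed verification: {exc.verify_message}"
    if remaining < warn_days:
        return f"WARNING: {host} certificate expires in {remaining:.1f} days"
    return f"OK: {host} certificate valid for {remaining:.1f} more days"
```

Calling `check_certificate("snowplow.trx.gitlab.net")` from a scheduled job and posting any non-OK result to the alerts channel is one way this could be wired up; that wiring is left out here because it depends on tooling not described in this issue.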

Checklist

  • Assigned severity tags based on this guidance
  • Assigned to the PM and EM of the Analytics Instrumentation group
  • Posted link to incident in g_analyze_analytics_instrumentation and tagged both PM and EM of the group

<---- TO BE FILLED BY ASSIGNEE / RESOLUTION DRI---->

Summary

The SSL certificate expired. This has happened before: #416991 (closed)

Root Cause

The issue was caused by the expiration of the Snowplow collector's SSL certificate. Infra incident: gitlab-com/gl-infra/production#18237 (closed)

Resolution

Reached out to the SRE on call, @devin, who updated the certificate to resolve the issue.

Edited by Ankit Panchal