Low good events record volume in snowplow
Problem
Snowplow events are not being fired from gitlab.com
Requests to the collector at https://snowplow.trx.gitlab.net fail with "Failed to load resource: net::ERR_CERT_DATE_INVALID".
Because this is a collector issue, it affects not only gitlab.com but also https://docs.gitlab.com/, https://customers.gitlab.com/, and other services that rely on the collector.
Detection
We received an alert in the #g_analytics_instrumentation_alerts Slack channel.
Impact
All Snowplow events from .com, CustomersDot, and other applications that rely on snowplow.trx.gitlab.net for tracking are impacted.
Additional information
This was caused by the expiration of the SSL certificate on the Snowplow collector. The Infra team has updated the certificate. Data was lost during a gap of ~6 hours.
Checklist
- Assigned severity tags based on this guidance
- Assigned to PM and EM of group analytics instrumentation
- Posted link to incident in g_analyze_analytics_instrumentation and tagged both PM and EM of the group
<---- TO BE FILLED BY ASSIGNEE / RESOLUTION DRI---->
Summary
The Snowplow collector's SSL certificate expired. This has happened before: see #416991 (closed).
Root Cause
The issue was caused by the expiration of the Snowplow collector's SSL certificate. Infra incident: gitlab-com/gl-infra/production#18237 (closed)
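Since certificate expiration has caused this outage more than once, a periodic expiry check could flag the problem before events are lost. The following is a minimal sketch only, not the Infra team's actual monitoring: the function names and the 30-day `WARN_DAYS` threshold are hypothetical, and it assumes the collector is reachable on port 443.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 30  # hypothetical alert threshold, not taken from the incident


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter string."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the peer cert's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(host: str) -> str:
    """Summarize the certificate state of `host` as a one-line status."""
    try:
        not_after = fetch_not_after(host)
    except ssl.SSLCertVerificationError as exc:
        # An already expired certificate fails the handshake and lands here.
        return f"CRITICAL: {host} certificate failed verification: {exc}"
    remaining = days_until_expiry(not_after, datetime.now(timezone.utc))
    if remaining < WARN_DAYS:
        return f"WARNING: {host} certificate expires in {remaining:.1f} days"
    return f"OK: {host} certificate valid for {remaining:.1f} more days"


if __name__ == "__main__":
    print(check("snowplow.trx.gitlab.net"))
```

Run on a schedule, a check like this would report a warning well ahead of the expiry date, and a critical status if the certificate has already expired.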
Resolution
Reached out to the SRE on call, @devin, who updated the certificate to resolve the issue.
