Push failure signatures to ClickHouse

Problem Statement

Failure categories and CI failure signatures need to be stored in ClickHouse to enable large-scale analytics, dashboards, and correlation analysis for detecting master-broken incidents. We currently push failure categories to Snowflake via internal events. That flow should be complemented by a more direct ClickHouse pipeline.

Goal

Add a ClickHouse data pipeline for failure categories and signatures alongside the existing internal-events push to Snowflake. This will let us inspect the signatures we generate on the latest data. Snowflake exports could serve the same purpose, but only with a 1-2 day delay.

Acceptance Criteria

  • Design a ClickHouse schema for failure categories and signatures
  • Implement a data pipeline that pushes signature data to ClickHouse
  • Implement dual writes: to ClickHouse, and to Snowflake via internal events
  • Validate data quality and accessibility in ClickHouse
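The dual-write criterion can be illustrated with a minimal Python sketch. It is not the triage-ops implementation: the row fields, `FailureSignatureRow`, `dual_write`, and the sink callables are hypothetical stand-ins for the real ClickHouse client and internal-events tracker, and the schema is still to be designed. The sketch shows one possible design choice: each destination is written independently, so a ClickHouse failure does not interrupt the existing Snowflake flow.

```python
import logging
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


# Hypothetical row shape; the real ClickHouse schema is part of this issue's work.
@dataclass(frozen=True)
class FailureSignatureRow:
    job_id: int
    project_path: str
    failure_category: str
    signature: str
    created_at: datetime


# A sink takes a batch of plain dicts and raises on failure (hypothetical interface).
Sink = Callable[[List[dict]], None]


def dual_write(rows: List[FailureSignatureRow], sinks: Dict[str, Sink]) -> Dict[str, bool]:
    """Send the same batch to every sink; a failing sink does not block the others."""
    results: Dict[str, bool] = {}
    for name, sink in sinks.items():
        # Build a fresh payload per sink so one sink cannot mutate another's input.
        payload = [asdict(row) for row in rows]
        try:
            sink(payload)
            results[name] = True
        except Exception as exc:
            # Log and continue: e.g. a ClickHouse outage must not stop the Snowflake push.
            logger.warning("%s write failed: %s", name, exc)
            results[name] = False
    return results


if __name__ == "__main__":
    snowflake_events: List[dict] = []

    def unavailable_clickhouse(_batch: List[dict]) -> None:
        raise ConnectionError("clickhouse unavailable")

    row = FailureSignatureRow(
        job_id=123,
        project_path="gitlab-org/gitlab",
        failure_category="test_failure",
        signature="rspec:spec/foo_spec.rb:42",
        created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
    print(dual_write([row], {
        "snowflake_internal_events": snowflake_events.extend,
        "clickhouse": unavailable_clickhouse,
    }))
```

Returning a per-sink status rather than raising keeps the existing Snowflake path unaffected. A production version would likely add retries or a queue for the ClickHouse side so that failed batches are not silently dropped.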

Benefits

  • Near real-time analysis: See signatures on the latest incoming data instead of waiting 1-2 days for Snowflake exports
  • Better performance: Query ClickHouse directly instead of relying on internal events processing
  • Dual data availability: Keep existing Snowflake flow while adding ClickHouse capabilities

Related Work

Builds on gitlab-org/quality/triage-ops!3654 (merged)

Edited by David Dieulivol