Draft: Spike Research: ClickHouse as the Datastore for VSA API

Problem to solve

  1. For GL VSM to be the SSOT for DevOps Analytics, users need to aggregate multiple data records into one VSA.
  2. The VSA API needs to ingest raw event data from almost any DevOps tool and normalize it into one stream.
  3. PostgreSQL is not set up for analytical workloads.

Reference use cases

(in priority order)

  1. Custom VSA stages based on Jira events.
  2. Custom VSA stages based on GitLab webhooks, e.g. adding a start/end event for "Issue assigned".
  3. Expose aggregated VSA metrics to external BI tools.

Investigation and clarification questions:

  1. Are there any consistency problems we might encounter if we store VSA API events in ClickHouse (CH)? Can we move the VSA table to ClickHouse?
  2. What enhancements does the VSA stage event schema need?
  3. What should the authentication and authorization approach be, assuming SaaS first?
  4. Are there any other use cases for reference?

Expected Outcome

  1. A high-level outline of the major steps we need to take
  2. Technical proposal for a POC
Edited by Haim Snir