Readiness review document for Siphon
About
Siphon is an in-house application that delivers serialized CDC (change data capture) data from the PostgreSQL logical replication stream to a pub-sub system. Consumers subscribe to the pub-sub system (fan-out), process the data, and ingest it into other database systems (e.g. ClickHouse, Snowflake).
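To illustrate the fan-out model described above, here is a minimal in-memory sketch in Python. It is not Siphon code and does not use the NATS client API. `InMemoryBroker`, the subject name `cdc.public.users`, and the payload shape are hypothetical, chosen only to show that every subscriber on a subject receives its own copy of each published change event.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[bytes], None]


class InMemoryBroker:
    """Toy stand-in for a pub-sub system (hypothetical, not the NATS API)."""

    def __init__(self) -> None:
        # Maps a subject name to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Handler) -> None:
        self._subscribers[subject].append(handler)

    def publish(self, subject: str, payload: bytes) -> int:
        """Deliver the payload to every subscriber; return the delivery count."""
        handlers = self._subscribers.get(subject, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


broker = InMemoryBroker()

# Two independent consumers, standing in for the ClickHouse and
# Iceberg/Snowflake consumer applications.
clickhouse_rows: List[bytes] = []
iceberg_rows: List[bytes] = []
broker.subscribe("cdc.public.users", clickhouse_rows.append)
broker.subscribe("cdc.public.users", iceberg_rows.append)

# The producer publishes one serialized change event (assumed JSON here).
delivered = broker.publish("cdc.public.users", b'{"op":"insert","id":1}')
```

Because each consumer keeps its own copy of the stream, one consumer can be added, removed, or restarted without affecting the others or the producer.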
Scope
This readiness review covers deploying the Siphon producer and consumer applications on .com (targeting dev/staging initially). This document assumes a working NATS cluster, which is covered by this readiness review.
From the Siphon design document
Producer application
- 1 single-binary + 1 yaml config
- Connection requirements:
- Connection to a PostgreSQL replica
- Connection to a PostgreSQL replica with hot_standby_feedback=off (see the snapshot process for context)
- Connection to NATS
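The connection requirements above can be verified before deployment with a small preflight script. The following is a sketch under stated assumptions, not part of Siphon: the function names are hypothetical, the check only confirms TCP reachability (not authentication or replication permissions), and `hot_standby_feedback_is_off` expects the text returned by running `SHOW hot_standby_feedback;` on the replica.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(
    endpoints: Dict[str, Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, bool]:
    """Check every named (host, port) pair and report reachability per name."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }


def hot_standby_feedback_is_off(show_value: str) -> bool:
    """Interpret the value returned by `SHOW hot_standby_feedback;`."""
    return show_value.strip().lower() == "off"
```

A call such as `check_endpoints({"postgres-replica": (replica_host, 5432), "nats": (nats_host, 4222)})` would cover the producer's requirements, where `replica_host` and `nats_host` are placeholders and 5432 and 4222 are the default PostgreSQL and NATS client ports. The actual ports in a given environment may differ.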
Consumer application 1 (ClickHouse)
- 1 single-binary + 1 yaml config
- Connection requirements:
- Connection to NATS
- Connection to ClickHouse Cloud database (already exists)
Consumer application 2 (Iceberg/Snowflake)
- 1 single-binary + 1 yaml config
- Connection requirements:
- Connection to NATS
- Connection to an Apache Iceberg instance (provided by @vedprakash2021)
Helm charts/Terraform:
- Helm charts are available here: https://gitlab.com/gitlab-org/analytics-section/siphon/-/tree/main/helm/siphon?ref_type=heads
- Siphon is running in our sandbox environment using this Terraform project: https://gitlab.com/gitlab-org/analytics-section/platform-insights/data-insights-platform-sandbox/
Edited by Adam Hegyi