Producer-only Siphon deployment
## Overview
[Siphon](https://gitlab.com/gitlab-org/analytics-section/siphon) is an in-house Change Data Capture (CDC) solution that replicates data from PostgreSQL to other data stores. This pipeline lets us make significant performance and cost optimizations within GitLab and provides a foundation for new product capabilities:
- Eliminate the data team’s PG replicas (cost reduction).
- Offload log data from PostgreSQL to a cheaper data store (storage cost reduction).
- Delegate expensive database queries to our analytical database, ClickHouse (improved query performance).
- Implement new, AI-powered features (Knowledge Graph).
- Enable new analytical capabilities, such as hierarchical queries and queries by team, that give customers better insight into their usage of GitLab.
## Goals of this Epic
- Siphon producer (in a staging environment) is deployed and starts up without an active PostgreSQL connection.
  - It enters a crash loop, confirming the deployment is functioning as expected.
- Siphon producer (in a staging environment) is deployed and successfully connects to the staging PostgreSQL instance, running for an extended duration.
  - Siphon producer is connected to a PostgreSQL replica.
  - Validate full end-to-end behavior.
  - Establish monitoring and observability.
- Siphon producer (in a production environment) is deployed and successfully connects to the production PostgreSQL instance, running for a short time.
  - Siphon producer is connected to a DR instance.
  - Verify that the producer can efficiently consume the logical replication stream.
## High-level steps
1. Create the database users using the recently created [script](https://gitlab.com/gitlab-com/gl-infra/data-access/dbo/dbo-issue-tracker/-/issues/449).
2. Store the secrets so the Siphon producer can reach them.
3. Create a `PUBLICATION` on the PostgreSQL server.
4. Connect the Siphon producer to PostgreSQL.
5. Monitor the Siphon logs and the Prometheus metrics.
6. Monitor the replication slot (lag).
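
Steps 3 and 6 above can be sketched roughly as follows. This is an illustration only, not Siphon's own tooling: the helper names (`quote_ident`, `create_publication_sql`, `lsn_to_int`, `lag_bytes`) are hypothetical, and the `%s` placeholder assumes a psycopg-style driver. The SQL itself uses standard PostgreSQL (`CREATE PUBLICATION ... FOR TABLE`, `pg_replication_slots`, `pg_wal_lsn_diff`). The lag query assumes it runs on a primary; `pg_current_wal_lsn()` cannot be called on a server in recovery, so a replica would need a different reference LSN such as `pg_last_wal_replay_lsn()`.

```python
import re

# Hypothetical helpers for illustration; they only build SQL text and do LSN arithmetic.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_ident(name: str) -> str:
    """Accept only simple lowercase identifiers and wrap them in double quotes."""
    if not _IDENT.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return f'"{name}"'


def create_publication_sql(publication: str, tables: list[str]) -> str:
    """Build a CREATE PUBLICATION statement for an explicit table list."""
    if not tables:
        raise ValueError("at least one table is required")
    table_list = ", ".join(quote_ident(t) for t in tables)
    return f"CREATE PUBLICATION {quote_ident(publication)} FOR TABLE {table_list};"


# Lag of a single slot in bytes, as seen from the primary.
# The %s placeholder is an assumption (psycopg-style parameter binding).
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = %s;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Client-side equivalent of pg_wal_lsn_diff(current, confirmed)."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)
```

For example, `create_publication_sql("prd_main_siphon_publication_1", ["issues"])` yields a statement for a single table; the table name here is a placeholder, not a statement about which tables Siphon replicates. A steadily growing `lag_bytes` value while the slot is inactive is the signal to watch for, since an unconsumed slot retains WAL on the server.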
### PG Publication and replication slot naming
- Publication: `$ENV_$DB_siphon_publication_1`
  - Example: `prd_main_siphon_publication_1`
- Replication slot: `$ENV_$DB_siphon_slot_1`
  - Example: `prd_main_siphon_slot_1`
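
The naming convention can be captured in a small helper, shown here as a minimal sketch. The function name `siphon_object_names` and the `index` parameter are assumptions: the trailing `_1` is read as a sequence number, which the text above does not state explicitly. The validation reflects PostgreSQL's rules for replication slot names (lowercase letters, digits, and underscores) and the default 63-byte identifier limit; applying the same rule to the publication name is a simplification, not a PostgreSQL requirement.

```python
import re

# Replication slot names may contain only lowercase letters, digits, and underscores.
# 63 is the default maximum identifier length in PostgreSQL.
_VALID_NAME = re.compile(r"^[a-z0-9_]{1,63}$")


def siphon_object_names(env: str, db: str, index: int = 1) -> dict[str, str]:
    """Return the publication and slot names for one environment and database.

    `index` is a hypothetical parameter standing in for the trailing `_1`.
    """
    names = {
        kind: f"{env}_{db}_siphon_{kind}_{index}"
        for kind in ("publication", "slot")
    }
    for name in names.values():
        if not _VALID_NAME.match(name):
            raise ValueError(f"invalid PostgreSQL object name: {name!r}")
    return names


if __name__ == "__main__":
    print(siphon_object_names("prd", "main"))
```

Generating both names from the same inputs keeps the publication and its slot paired, so a typo cannot leave them pointing at different environments.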