Support Siphon rollout in Staging
<!-- Before adding details, please check if there is a more focused issue template available to address your needs. -->

# Siphon context

- Current PRR: https://gitlab.com/gitlab-com/gl-infra/readiness/-/issues/120
- Code and docs: https://gitlab.com/gitlab-org/analytics-section/siphon
- Design docs: https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/siphon/

## Who is associated with this?

- Infra DRI for general support: @stejacks-gitlab
- Development resources: @ahegyi and @arun.sori
- DBRE support: Stay tuned, a request will be created shortly.

## General Information

- Related issue for context (if applicable): https://gitlab.com/gitlab-org/analytics-section/siphon/-/work_items/174
- Service this relates to (if applicable): [+ service_label +]

High-level steps for the Siphon Staging rollout:

1. Connectivity: Establish a connection between Siphon and the Staging DB cluster (current blocker), tracked here: https://gitlab.com/gitlab-org/analytics-section/siphon/-/work_items/174. We are also waiting on an access request (AR): https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/41884
2. Validation: Perform experimental runs to verify correctness, tracked here: https://gitlab.com/groups/gitlab-org/analytics-section/-/epics/16
3. Data Sync: Execute experimental runs with live PG-CH data synchronization.
4. Scaling: Expand configuration to all databases (Main, CI, Sec) and onboard additional tables.

Current Status: We are currently stuck on Step 1. @arun.sori (Analytics) is the DRI for this task, but the proposed approach for establishing connectivity has not been successful yet; discussion is ongoing in the linked issue above. Because this requires deep DBRE/SRE expertise, we cannot easily move past this state on our own. We get some help from DBREs depending on their availability (AFAIK there is no DRI from these teams), and we also experience long turnaround times.

## Details

Right now, we're blocked on getting Siphon talking to Patroni in the staging environment. A number of related MRs have taken a while to get reviewed because there is no DRI:

- Primary instance MR: https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/13052
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/12958/diffs
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/12957
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/12896
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/13121
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/13198
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/13203
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/13208

Current problems are best documented in https://gitlab.com/gitlab-org/analytics-section/siphon/-/issues/174#note_3073041242.

We have a very aggressive delivery date, so we need someone to come up to speed and unblock this ASAP. @sabrams has nominated @ahanselka, so I'm going to assign this to him. If you aren't the right person, please tell me ASAP. I'd like to see if we can get Siphon up and running by the end of next week at the very latest.

## Next steps

We also need to see how to make this all repeatable for the other databases (ci, sec) and production.

<!-- If you know the team you need assistance from, please include the team label (uncomment and delete extra teams):
/label ~"group::Networking & Incident Management" ~"NIM::Requests"
/label ~"group::Runners Platform"
/label ~"group::Observability"
-->

<!-- Please do not edit the below -->
issue