Counterpart Request for GKG GA rollout

Request

What kind of support are you looking for?

What team?

Describe the feature or ongoing work that needs assistance

The analytics section is working on getting Siphon set up in staging and production as part of the Knowledge Graph GA rollout. More context is available in the handbook page, and the GA date is aggressive.

Expectations for participating member(s) of the database group in the target group/project

We will need assistance with best practices for setting up Siphon, and someone to work with the SRE to debug the current connectivity issues (see gitlab-com/gl-infra/production-engineering#28386).

We will also need some dedicated time to execute commands and verification checks on the Postgres staging cluster. The test plan for this is in Initial Siphon test plan on staging (gitlab-org/analytics-section/siphon#175 - closed). We can do most of the steps ourselves, but we will need someone to help with the teardown step of dropping the replication slot.
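To make the teardown request concrete, here is a minimal sketch of the SQL we would expect to need, generated from Python so the slot name is validated first. The slot name `siphon_test_slot` and the helper `teardown_statements` are hypothetical, not taken from the test plan. `pg_replication_slots` and `pg_drop_replication_slot` are standard Postgres. The statements are only built here, not executed.

```python
import re

# Postgres replication slot names may contain only lowercase letters,
# digits, and underscores; 63 characters is the usual identifier limit.
_SLOT_NAME = re.compile(r"[a-z0-9_]{1,63}")


def teardown_statements(slot_name: str) -> list[str]:
    """Return the SQL to inspect and then drop a replication slot.

    Hypothetical helper: it only builds strings, so a DBRE can review
    them before running anything on the staging cluster.
    """
    if not _SLOT_NAME.fullmatch(slot_name):
        raise ValueError(f"invalid replication slot name: {slot_name!r}")
    return [
        # Step 1: confirm the slot exists and that no consumer is attached.
        "SELECT slot_name, active, active_pid "
        f"FROM pg_replication_slots WHERE slot_name = '{slot_name}';",
        # Step 2: drop the slot (this fails while the slot is still active).
        f"SELECT pg_drop_replication_slot('{slot_name}');",
    ]


if __name__ == "__main__":
    for statement in teardown_statements("siphon_test_slot"):
        print(statement)
```

Dropping the slot matters because an unused slot keeps retaining WAL on the primary, which is the disk-pressure risk called out in the exit criteria.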

We'll need someone to complete the readiness review (gitlab-com/gl-infra/readiness#120).

We're hoping to get Siphon set up in both staging and production by mid-March.

Expectations for participating member(s) of the database group in the database excellence group

Priorities for DBE member:

  • Incident Response
  • PG18 Upgrade Planning
  • This project, time-boxed to 4h/week

Time commitment:

  • Shouldn't be more than 4 hours per week.
  • Expected work is mostly reviewing operational aspects of Siphon deployment

Exit Criteria

Siphon works in all environments and does not impact the databases.

Milestone 1:

  • Siphon works on the GitLab.com staging environment with proper connectivity and doesn't affect the actual database workload
    • Replication lag on the database remains under 10 minutes
    • WAL file generation doesn't increase excessively

Milestone 2:

  • Siphon works on the GitLab.com production environment with proper connectivity and doesn't affect the actual database workload
    • Replication lag on the database remains under 10 minutes
    • WAL file generation doesn't put pressure on disks
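One way the milestone criteria above could be checked is sketched below. The 10-minute lag limit comes from this issue. The WAL threshold (`max_wal_increase_ratio`), the `Sample` type, and the `meets_exit_criteria` helper are assumptions for illustration, since no number for "excessive" WAL growth has been agreed yet. The SQL strings use standard Postgres functions and are shown for reference only; nothing here connects to a database.

```python
from dataclasses import dataclass

# From the milestones: replication lag must stay under 10 minutes.
MAX_REPLICATION_LAG_SECONDS = 10 * 60

# Reference queries (not executed here).
# Replay lag, run on a replica:
REPLAY_LAG_SQL = (
    "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) "
    "AS lag_seconds;"
)
# WAL retained by each replication slot, run on the primary:
SLOT_RETAINED_WAL_SQL = (
    "SELECT slot_name, "
    "pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes "
    "FROM pg_replication_slots;"
)


@dataclass(frozen=True)
class Sample:
    """One observation taken while Siphon is running (hypothetical type)."""
    replication_lag_seconds: float
    wal_bytes_per_second: float


def meets_exit_criteria(
    samples: list[Sample],
    baseline_wal_bytes_per_second: float,
    max_wal_increase_ratio: float = 1.5,  # ASSUMPTION: not agreed in this issue
) -> bool:
    """Return True if every sample satisfies both milestone criteria."""
    if not samples:
        return False  # no evidence means the criteria are not demonstrated
    wal_limit = baseline_wal_bytes_per_second * max_wal_increase_ratio
    return all(
        s.replication_lag_seconds < MAX_REPLICATION_LAG_SECONDS
        and s.wal_bytes_per_second <= wal_limit
        for s in samples
    )
```

The same check would apply to both milestones, with the baseline WAL rate measured separately for staging and production before Siphon is enabled.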

Checklist

Requesting Team

  • The issue has a descriptive title
  • There are detailed answers to the questions above
  • The issue is assigned to the database team manager
  • If this is urgent, reach out to the team manager in Slack

Database Team

  • There is enough information to prioritize the request
  • The request has been assigned to a member of the team
  • The priority of the request has been agreed by the stakeholders and author
Edited by Alex Ives