Skip to content

Security Database Decomposition Checklist

Security Database Decomposition Checklist

This is the single source of truth for tasks that need to be completed for the Security Database Decomposition project.

Everyone feel free to open their own issues or MRs to track activities/changes, but please link them here so we can keep track of them. Feel free to update this issue with additional tasks, timeline adjustments, and links to related issues/MRs for each task to maintain a single source of truth.

Note on DRIs: I've included DRI placeholders for each section. Instead of section-level DRIs, we can assign DRIs to specific tasks based on expertise and availability. Please feel free to volunteer for specific tasks by adding your name as the DRI next to the relevant items.

Reference:

Communication and Planning [Required for decomposition]

DRI: @ghavenga @theoretick @rmar1

Issue Templates and CR Documentation [Required for execution]

DRI: @jjsisson @theoretick

  • [-] Create detailed issue template for Phase 4
  • Create detailed issue template for Phase 7 - MR gitlab-com/gl-infra/db-migration!568 (merged) - @jjsisson
  • Create Phase 4 CRs for both GSTG and GPRD using the templates
  • Create Phase 7 CRs for both GSTG and GPRD using the templates
  • Document rollback procedures and decision triggers in CR templates

Infrastructure Setup [Required for implementation] - Ref. Initial Setup

DRI: @jjsisson

  • Standup Sec cluster as a physical standby cluster:
    • GSTG
    • GPRD
  • Configure PgBouncers for Sec cluster:
    • GSTG
    • GPRD
  • Reconfigure database pool connections for Sec Cluster's Web and Sidekiq PgBouncers:
    • GSTG
    • GPRD
  • Reconfigure Sec Application to connect to Sec Cluster's Web and Sidekiq PgBouncers:
  • Reconfigure database pool connections for Main Cluster's Web and Sidekiq PgBouncers (once Sec application connect directly through the Sec PgBouncers, we need to reduce db pool connections on the Main cluster's PgBouncers):
  • Validate physical replication between Main and Sec clusters:
    • GSTG
    • GPRD
  • Configure Consul services appropriately for both clusters:
    • GSTG
    • GPRD

Monitoring and Observability [Required for validation]

DRI:

  • Build monitoring dashboards for Sec Decomposition process: - gitlab-com/runbooks!8562 (merged) - @jjsisson
    • Replication lag monitoring for both GSTG and GPRD
    • Traffic distribution metrics before/during/after switchover
  • Set up Patroni and PgBouncer dashboards for Sec cluster (mirroring Main cluster dashboards)
  • Configure appropriate alerts for the decomposition process

Ansible Playbooks [Required for automation]

DRI: @jjsisson

  • Build Ansible Playbooks for physical-to-logical replication conversion
  • Build Ansible Playbooks for Read and RW traffic switchover procedures
  • Build Ansible Playbooks for rollback procedures
  • Build Ansible Playbooks for write-lock trigger implementation (if applicable)
  • Document all playbooks with usage instructions and parameters

db-benchmarking Environment Setup [Required for testing]

DRI:

  • Create Mock Main cluster in db-benchmarking - Setup db-benchmarking Environment - @jjsisson
  • Create physical standby Sec cluster in db-benchmarking
  • Configure PgBouncers for Sec cluster
  • Set up Consul services for both clusters

End-to-End Testing in the db-benchmarking Environment [Required before Phase 7]

DRI: @jjsisson @bshah11 @theoretick

  • Validate Ansible playbooks for physical-to-logical replication conversion
  • Test write-lock trigger implementation (SQL statement block or Ansible playbook)
  • Validate Ansible playbooks for Sec cluster PgBouncer reconfiguration operations (Pause, reconfigure, Reload, Resume)
  • Validate Ansible playbooks for Read and RW traffic switchover (simulated without actual site traffic)
  • Validate Ansible playbooks for rollback procedures
  • Verify monitoring dashboards function correctly during simulated operations
  • Document any issues encountered and their resolutions for production implementation

Phase 4: Validate Read Traffic Switchover

Staging (GSTG) Validation - Phase 4 [Required before Production]

DRI: @jjsisson @theoretick

  • Create Application MR for redirecting read traffic to Sec cluster
  • Execute Phase 4 GSTG CR:
  • Document findings and update GPRD plan based on GSTG experience @theoretick

Production (GPRD) Execution - Phase 4

DRI: @jjsisson @theoretick

  • Create Application MR for redirecting read traffic to Sec cluster
  • Execute Phase 4 GPRD CR:
  • Document findings and update Phase 7 plan based on Phase 4 experience

Phase 7: Production Sec Decomposition and Cutover

Staging (GSTG) Execution - Phase 7 [Required before Production]

DRI:

  • Prepare necessary MRs:
    • Application MR for redirecting read traffic to Sec cluster
    • Write-lock triggers MR for Security tables in Main cluster (if not handled via Ansible)
    • Chef MR to persist PgBouncer configuration changes to master.patroni-sec
    • Additional MRs (monitoring dashboards, Teleport access, etc.)
  • Execute Phase 7 GSTG CR:
    • Convert Physical to Logical replication
    • Read Traffic Switchover to Sec cluster
    • RW Traffic Switchover to Sec cluster
    • Establish reverse logical replication
    • Monitor application functionality and performance
  • Document findings and update GPRD plan based on GSTG experience

Production (GPRD) Execution - Phase 7

DRI:

  • Prepare necessary MRs:
    • Application MR for redirecting read traffic to Sec cluster
    • Write-lock triggers MR for Security tables in Main cluster (if not handled via Ansible)
    • Chef MR to persist PgBouncer configuration changes to master.patroni-sec
    • Terraform/Chef MR for Analytics team to use Sec cluster's disk snapshot
    • Additional MRs (monitoring dashboards, Teleport access, etc.)
  • Determine if Delayed DR cluster is needed for Sec cluster and prepare accordingly
  • Execute Phase 7 GPRD CR:
    • Convert Physical to Logical replication
    • Read Traffic Switchover to Sec cluster
    • RW Traffic Switchover to Sec cluster
    • Establish reverse logical replication
    • Monitor application functionality and performance

Post-Decomposition Tasks

DRI:

Edited by Lucas Charles