
crunchy-postgres-health-check

Deliverables

Health Check

  • Analyze existing configuration and architecture, provide recommendations
    • https://gitlab.com/gitlab-com/infrastructure/issues/1552
    • https://gitlab.com/gitlab-com/infrastructure/issues/1553
    • https://gitlab.com/gitlab-com/infrastructure/issues/1554
    • https://gitlab.com/gitlab-com/infrastructure/issues/1555
    • https://gitlab.com/gitlab-com/infrastructure/issues/1556
    • https://gitlab.com/gitlab-com/infrastructure/issues/1557
    • https://gitlab.com/gitlab-com/infrastructure/issues/1558
    • https://gitlab.com/gitlab-com/infrastructure/issues/1559
    • https://gitlab.com/gitlab-com/infrastructure/issues/1561
    • https://gitlab.com/gitlab-com/infrastructure/issues/1587
    • https://gitlab.com/gitlab-com/infrastructure/issues/1588
    • https://gitlab.com/gitlab-com/infrastructure/issues/1589
    • https://gitlab.com/gitlab-com/infrastructure/issues/1630
    • https://gitlab.com/gitlab-com/infrastructure/issues/1652
  • Review system statistics and identify any areas of concern
    • pending some monitoring changes listed above
  • Review schema and provide recommendations
    • https://gitlab.com/gitlab-com/infrastructure/issues/1709
  • Review PostgreSQL log files to identify any issues and recommendations
    • https://gitlab.com/gitlab-com/infrastructure/issues/1448#note_27447679

Backup and Restore

  • Review existing backup methodologies and provide recommendations:
    • https://gitlab.com/gitlab-com/infrastructure/issues/1668

PGBouncer

  • Review pgbouncer.ini and provide recommendations
    • https://gitlab.com/gitlab-com/infrastructure/issues/1448#note_26828742
  • Discuss and provide recommendations regarding PGBouncer locations/architecture
    • https://gitlab.com/gitlab-com/infrastructure/issues/1560

Read replicas

  • Review replica stats and log files
  • Determine why hot_standby is causing table bloat:
    • likely related to an issue addressed in 9.6.2
    • https://gitlab.com/gitlab-com/infrastructure/issues/1501, decision is to upgrade to 9.6.3: https://gitlab.com/gitlab-com/infrastructure/issues/1158

HA / Failover

  • Review corosync/pacemaker configuration and provide recommendations
    • depends on https://gitlab.com/gitlab-com/infrastructure/issues/1460, which was closed in favor of gitlab-org/omnibus-gitlab#1807 (also closed) following the decision to move away from pacemaker/corosync
  • Assist with understanding of corosync/pacemaker, current cluster status, managing failover
    • depends on https://gitlab.com/gitlab-com/infrastructure/issues/1460, which was closed in favor of gitlab-org/omnibus-gitlab#1807 (also closed) following the decision to move away from pacemaker/corosync

Monitoring

  • Review existing monitoring, provide recommendations for additional monitoring
    • https://gitlab.com/gitlab-com/infrastructure/issues/1448#note_27283592

Tuning / Settings

  • Review postgresql.conf and provide recommendations:
    • https://gitlab.com/gitlab-com/infrastructure/issues/1448#note_26844354
  • Review autovacuum runs
    • requires log files with autovacuum information
  • Review autovacuum settings and provide recommendations

Application binding to replicas

  • We discussed this during our calls, but it is unclear whether further discussion or review is necessary.
    • https://gitlab.com/gitlab-org/gitlab-ee/issues/2042

Application improvements

  • Review slow queries for possible optimization
    • slow queries have been identified; a representative data sample and/or system access would allow testing alternative query structures and help identify possible changes in indexing strategy (see https://gitlab.com/gitlab-com/infrastructure/issues/1786)

Staging with realistic load by running tests

  • Review the possibility of enabling full logging so that all queries can be collected and replayed on staging

Capacity planning

  • Implementation of PGBouncer and other improvements will impact capacity planning and scalability
  • Review shared buffer utilization through pg_buffercache (requires installation) to determine memory utilization and working set size
  • Provide recommendations regarding changes to shared_buffers and system memory amounts
  • Recommendations regarding metrics for scalability and capacity planning
    • https://gitlab.com/gitlab-com/infrastructure/issues/1579
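
As a rough illustration of the pg_buffercache review listed above, the sketch below summarizes buffer usage counts into an approximate "hot" working set size. It is only a sketch under stated assumptions: the 8 kB block size is the PostgreSQL default and may differ on the actual hosts, and the `hot_threshold` parameter and the `summarize_buffer_usage` helper are hypothetical names chosen for illustration, not something agreed in this issue. The query string relies only on the `usagecount` column of the `pg_buffercache` view, which is NULL for unused buffers.

```python
# Assumption: default PostgreSQL block size of 8 kB (check the block_size setting).
BLOCK_SIZE = 8192

# Query to run on a host where the pg_buffercache extension is installed.
# It returns one row per usage count with the number of buffers at that count;
# usagecount is NULL for buffers that are not currently in use.
USAGE_QUERY = """
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
"""


def summarize_buffer_usage(rows, hot_threshold=3, block_size=BLOCK_SIZE):
    """Summarize (usagecount, buffers) rows from USAGE_QUERY.

    hot_threshold is a hypothetical heuristic: buffers whose usage count is
    at or above it are treated as part of the frequently used working set.
    """
    total = sum(buffers for _, buffers in rows)
    unused = sum(buffers for usage, buffers in rows if usage is None)
    hot = sum(
        buffers
        for usage, buffers in rows
        if usage is not None and usage >= hot_threshold
    )
    return {
        "total_bytes": total * block_size,
        "unused_bytes": unused * block_size,
        "hot_bytes": hot * block_size,
        "hot_fraction": hot / total if total else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample rows, for illustration only (not real measurements).
    sample = [(None, 10), (1, 20), (3, 30), (5, 40)]
    print(summarize_buffer_usage(sample))
```

If most of shared_buffers sits at a high usage count, the working set may be larger than the configured cache; a large share of unused or low-count buffers would suggest the opposite. Either reading is a starting point for the shared_buffers recommendation, not a conclusion.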
Edited Oct 02, 2017 by Yorick Peterse