2025-08-05: Apdex SLO violation in patroni's rails_primary_sql component on main stage
Apdex SLO violation in patroni's rails_primary_sql component on main stage (Severity 3 (Medium))
Problem: The Apdex score for SQL transactions in the Patroni service on the 'main' stage degraded significantly because of database lock waits and contention.
Impact: The performance degradation impacted the rails_primary_sql SLI Apdex for 50 minutes.
Causes: Investigation showed that the degradation was caused by heavy contention on the database LWLock. The contention was triggered by a recent ALTER TABLE migration that added a foreign key constraint to the p_duo_workflows_checkpoints table, introduced in db/migrate/20250701233451_create_p_duo_workflows_checkpoints.rb.
Response strategy: To resolve this, we have aligned on replacing the FK constraint with our home-grown 'loose FK' mechanism, which uses delete logging and async propagation to avoid locking issues.
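The following toy Python sketch shows the general "delete logging plus async propagation" idea behind a loose FK. It is not GitLab's implementation. The in-memory tables stand in for real database tables, and the names Database, delete_parent and process_deleted_records are hypothetical. Real triggers, workers, partitioning and locking are not modeled. In the sketch, deleting a parent only removes that row and appends its ID to a log. A separate step later drains the log and deletes the orphaned child rows in bounded batches.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory stand-in for a parent and child table linked by a loose FK."""

    parents: dict[int, str] = field(default_factory=dict)
    # child_id -> parent_id; no constraint is enforced between the two tables
    children: dict[int, int] = field(default_factory=dict)
    # Log of deleted parent IDs, written synchronously on every parent delete
    deleted_records: deque[int] = field(default_factory=deque)

    def delete_parent(self, parent_id: int) -> None:
        # Synchronous part: drop the parent row and record the deletion.
        # Child rows are left untouched, so no cascading work happens here.
        self.parents.pop(parent_id, None)
        self.deleted_records.append(parent_id)


def process_deleted_records(db: Database, batch_size: int = 100) -> int:
    """Asynchronous part (hypothetical worker): clean up children of deleted parents.

    Processes at most `batch_size` log entries per call and returns the
    number of child rows removed.
    """
    removed = 0
    processed = 0
    while db.deleted_records and processed < batch_size:
        parent_id = db.deleted_records.popleft()
        orphans = [cid for cid, pid in db.children.items() if pid == parent_id]
        for cid in orphans:
            del db.children[cid]
        removed += len(orphans)
        processed += 1
    return removed


if __name__ == "__main__":
    db = Database(parents={1: "a", 2: "b"}, children={10: 1, 11: 1, 12: 2})
    db.delete_parent(1)
    print(sorted(db.children))  # children still present until the worker runs
    print(process_deleted_records(db))
    print(sorted(db.children))
```

The trade-off in this sketch is eventual consistency. Until the cleanup step runs, child rows can still reference a parent that no longer exists, so readers must tolerate briefly dangling references.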
This ticket was created by incident.io to track INC-3118.