Skip to content
Snippets Groups Projects
Closed Tweaking (decreasing) idle_in_transaction_session_timeout on Production
  • View options
  • Tweaking (decreasing) idle_in_transaction_session_timeout on Production

  • View options
  • Closed Issue created by Emanuel Calvo

    Production Change - Criticality 1 C1

    Change Objective Describe the objective of the change
    Change Type idle_in_transaction_session_timeout needs to be decreased in order to terminate long transactions within long idle in transaction state. Currently sits in 1min, which is relatively high considering the high traffic.
    Services Impacted patroni-*
    Change Team Members @ahmadsherif
    Change Severity C1
    Buddy check @emanuel_ongres
    Tested in staging
    Schedule of the change Date and time (with timezone)
    Duration of the change This variable has user scope, meaning that set will apply effectively. Although, if persisted in configuration, a SIGHUP is necessary for refreshing it across all active sessions
    Downtime Component No downtime required
    Detailed steps for the change. Each step must include: See bellow

    General

    Currently, idle_in_transaction_session_timeout is set to 1min which is a large value for current production traffic. @abrandl recommended to reduce to 15s, although, it may probably be too short and could generate undesired cancelled jobs. So, we are going to decrease this gradually until we see that we get cancelled large runs, without cancelling too many jobs. That is, first iteration will be set it at 30s and, if no perceptible changes (that is, long jobs never get targeted by this threshold, consequently no action) it can be decreased to its half 15s.

    Prerequisites

    • All Patroni nodes must be in a healthy state and relatively short replication lag (under 100MB delay).
    • Full list of all nodes (replicas first, leader at the end).

    Execution Steps

    • alter system set idle_in_transaction_session_timeout = '30s' in the first node of the list.
    • Monitor errors and verify the service messages. If no errors, change is safe.
    • Iterate the change across all replicas, with its corresponding verification.
    • Persist the configuration on the configuration file (*).

    Once the change is applied to all replicas, the leader is the next:

    • alter system set idle_in_transaction_session_timeout = '30s' in the leader
    • Verify if sessions are being killed and monitor any other related issues through tail -vf /var/log/gitlab/postgresql/postgresql.log.

    Verification Steps

    Once the idle_in_transaction_session_timeout is set to 30s, there should be visible the impact when large transactions occur. If no transaction is being killed when large activity issued, it might be necessary to reduce it to its half, by using the same procedure above.

    Related Issues

    PgBouncer Notes

    Currently, PgBouncer supports idle_transaction_timeout and it is currently disabled (0) in write-pools. Probably, as this is write issue and to avoid changes on Patroni cluster, it is possible to apply this change at PgBouncer side, avoiding the execution all over the Patroni fleet. Although, this is only applicable for the writing transactions (pgbouncer-02/03). It only requires SIGHUP and this value is in seconds.

    (*) Step require to be reviewed and commented by DBRE

    0 of 8 checklist items completed · Edited by Gerardo Lopez-Fernandez

    Linked items 0

  • Link items together to show that they're related or that one is blocking others.

    Activity

    • All activity
    • Comments only
    • History only
    • Newest first
    • Oldest first
    Loading Loading Loading Loading Loading Loading Loading Loading Loading Loading