Skip to content

Counter metric to track query retries in open transaction

Andreas Brandl requested to merge ab/query-retry-metrics into master

What does this MR do and why?

This adds a metric to the database load balancer to track query retries that may be problematic. We add additional dimensions to distinguish cases (inside_transaction, inside_dirty_transaction).

Background in #220242 (comment 684439980)

How to set up and validate locally

I've been using the following scripts to play around with this a bit:

The "pathology":

require File.expand_path('../../config/environment', __FILE__)

ActiveRecord::Base.logger = Logger.new(STDOUT)

conn = Gitlab::Database::LoadBalancing.proxy
#conn = ActiveRecord::Base.connection

conn.execute(<<~SQL)
  SET idle_in_transaction_session_timeout='1s';
  DROP TABLE IF EXISTS foo;
  CREATE TABLE foo (id serial primary key, value integer);
SQL

conn.transaction do
  conn.insert("insert into foo (value) VALUES (1)")
  sleep(2) # transaction times out
# This will run into a PG error, which is not raised.
  # Instead, we retry the insert on a fresh connection
  
  # and hence this violates transaction semantics.
  conn.insert("insert into foo (value) VALUES (2)")
end

puts conn.select_all('select * from foo').to_a.inspect

See what happens if we restart postgres (severing the connection):

require File.expand_path('../../config/environment', __FILE__)

ActiveRecord::Base.logger = Logger.new(STDOUT)

conn = Gitlab::Database::LoadBalancing.proxy

binding.pry   # restart postgres

conn = Gitlab::Database::LoadBalancing.proxy

conn.transaction do
  conn.insert("insert into foo (value) VALUES (1)")
end

puts conn.select_all('select * from foo').to_a.inspect

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Andreas Brandl

Merge request reports