Readonly connections occasionally being used as writeable connections and failing

Events seen here: https://sentry.gitlab.net/gitlab/gitlabcom/issues/1005390/events/ e.g.:

ActiveRecord::StatementInvalid: PG::ReadOnlySqlTransaction: ERROR:  cannot execute UPDATE in a read-only transaction
: UPDATE "ci_runners" SET "version" = '12.0.1', "revision" = '0e5417a3', "platform" = 'linux', "architecture" = 'amd64', "ip_address" = 'REDACTED', "contacted_at" = '2019-10-30 20:06:21.970366' WHERE "ci_runners"."id" = 946161

From the kibana logs (not the same connection/event):

{"pg_id":"12125","xid":"0","pg_user":"gitlab","pg_db":"gitlabhq_production","pg_application":"unicorn worker[19] -D -E produ...service/gitlab-rails/config.ru","pg_client":"10.217.4.2","pg_message":"ERROR:  cannot execute UPDATE in a read-only transaction","tag":"db.postgres","environment":"gprd","hostname":"patroni-06-db-gprd","fqdn":"patroni-06-db-gprd.c.gitlab-production.internal"}

where 10.217.4.2 is one of the read-write pgbouncers (accessed via the ILB named as db_host in gitlab.rb), and patroni-06 was the master at that point. Failover to patroni-06 had occurred about 24 hours previously.
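
To check whether these errors always arrive via the read-write pgbouncers and land on the current master, the log records can be tallied by client and host. This is a rough sketch, assuming the records can be exported one JSON object per line with the same field names as the record above. The function names are made up for illustration, and the second sample line is hypothetical.

```python
import json
from collections import Counter

# Substring shared by all "cannot execute ... in a read-only transaction" errors.
READ_ONLY_MARKER = "in a read-only transaction"


def read_only_errors(lines):
    """Yield parsed log records whose pg_message is a read-only transaction error."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(record, dict):
            continue
        if READ_ONLY_MARKER in record.get("pg_message", ""):
            yield record


def count_by_client_and_host(records):
    """Count matching records per (pg_client, hostname) pair."""
    return Counter((r.get("pg_client"), r.get("hostname")) for r in records)


sample = [
    # Trimmed copy of the record quoted above.
    '{"pg_id":"12125","pg_client":"10.217.4.2",'
    '"pg_message":"ERROR:  cannot execute UPDATE in a read-only transaction",'
    '"hostname":"patroni-06-db-gprd"}',
    # Hypothetical unrelated record, included only to show the filtering.
    '{"pg_id":"1","pg_client":"10.217.4.2",'
    '"pg_message":"LOG:  unrelated message","hostname":"patroni-06-db-gprd"}',
]

counts = count_by_client_and_host(read_only_errors(sample))
for (client, host), n in counts.items():
    print(client, host, n)
```

If every pair turns out to be a read-write pgbouncer talking to the host that was master at the time, that would point at the connection state rather than at misrouted traffic.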

Occurrences were heavily clustered: in kibana we see many within a few milliseconds of each other, with the clusters separated by minutes. However, there is no timestamp in the core message, so the clustering may be an artifact of the logging and a red herring.
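
One way to quantify the clustering is to group event timestamps into bursts separated by a gap threshold. This is a minimal sketch: the timestamps below are hypothetical, the one-second threshold is arbitrary, and in practice the timestamps would have to come from the log pipeline rather than from the message itself, so the same caveat about logging artifacts applies.

```python
from datetime import datetime, timedelta


def group_into_bursts(timestamps, max_gap=timedelta(seconds=1)):
    """Group timestamps into bursts; a new burst starts when the gap exceeds max_gap."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


# Hypothetical timestamps: three events within milliseconds, then two more minutes later.
base = datetime(2019, 10, 30, 20, 6, 21)
example = [
    base,
    base + timedelta(milliseconds=2),
    base + timedelta(milliseconds=5),
    base + timedelta(minutes=4),
    base + timedelta(minutes=4, milliseconds=1),
]

bursts = group_into_bursts(example)
burst_sizes = [len(b) for b in bursts]
print(burst_sizes)
```

Comparing burst sizes and the gaps between bursts against failover or pgbouncer events might show whether the clustering is real or just a batching effect in log shipping.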

Edited Oct 30, 2019 by Craig Miskell