pg_dump failing on GitLab instances with multiple PG read-replicas
Summary
GitLab backup uses the `pg_dump` command to take a backup of the PostgreSQL database. Since 15.11, a number of customers running Linux package (Omnibus) PostgreSQL with multiple read replicas have reported that this command is failing.
Two types of failure have been reported:
- PQsocket() error: `ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor`
- PQconsumeInput() error: `ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly`
Steps to reproduce
- Create environment on 15.11.13 using HA Postgres
- Ensure the DB is around 5 GB in size
  - The issue doesn't seem to be tied to a specific DB size, but it occurs intermittently, so the more data in the DB, the easier it is to reproduce.
- Run the backup tool from a Rails node
  - Ensure the node is configured to point directly at the Patroni leader and not via PgBouncer.
Example Project
What is the current bug behavior?
The backup fails with one of the following errors:
- PQsocket() error: `ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor`
- PQconsumeInput() error: `ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly`
In testing on 15.11.13, the error `ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor` appeared somewhat intermittently: the backup passed at first, then began failing more consistently.
After upgrading to the latest nightly (16.2.4+rnightly.295461.af8a91b9-0), the error consistently changed to `ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly`.
What is the expected correct behavior?
The `pg_dump` portion of the backup completes successfully without errors.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
Workaround
Use `pg_dump` directly.
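A minimal sketch of what running the dump directly might look like, bypassing the backup tool. Everything here is an assumption to adapt rather than a confirmed procedure: the leader hostname and output path are hypothetical placeholders, the `pg_dump` path is the binary bundled with the Linux package, and the user and database names are the Linux package defaults. The `DRY_RUN` switch and the `pg_dump_direct` function are illustrative helpers, not part of GitLab.

```shell
#!/bin/sh
# Sketch only: all values below are placeholders or assumed defaults.
PG_DUMP="${PG_DUMP:-/opt/gitlab/embedded/bin/pg_dump}"   # pg_dump bundled with the Linux package
DB_HOST="${DB_HOST:-patroni-leader.example.internal}"    # hypothetical Patroni leader address
DB_PORT="${DB_PORT:-5432}"
DB_USER="${DB_USER:-gitlab}"                             # assumed application DB user
DB_NAME="${DB_NAME:-gitlabhq_production}"                # assumed default database name
OUT_FILE="${OUT_FILE:-/tmp/gitlab_db.dump}"              # hypothetical output path

pg_dump_direct() {
  # -Fc writes pg_dump's custom archive format, which pg_restore can read.
  set -- "$PG_DUMP" -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -Fc -f "$OUT_FILE"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run (the default): print the command without executing it
  else
    "$@"        # run pg_dump against the leader
  fi
}

pg_dump_direct
```

By default the script only prints the command so it can be reviewed first; setting `DRY_RUN=0` executes it. `pg_dump` prompts for the password unless one is supplied through `PGPASSWORD` or a `.pgpass` file.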