
Does recovery.conf in GSTG need the recovery_target_timeline setting?

In #358 (comment 70387124), we fixed a replication issue by adding this line to recovery.conf:

recovery_target_timeline = 'latest'

A couple of questions:

  1. Does this need to be set in our GSTG environment? @ahanselka
  2. Does this setting need to be added to Omnibus/Geo? @ibaum, @abrandl

Background:

@dbalexandre and I noticed that postgres-01 on GSTG was not replicating properly. It looks like WAL-E was trying to fetch a file that did not exist:

2018-05-09_19:09:12.30035 postgres-01-db-gstg postgresql: wal_e.operator.backup INFO     MSG: begin wal restore
2018-05-09_19:09:12.30148 postgres-01-db-gstg postgresql:         STRUCTURED: time=2018-05-09T19:09:12.299546-00 pid=16610 action=wal-fetch key=s3://gitlab-dbstg-backups/postgres02/wal_005/0000000600002D8100000025.lzo prefix=postgres02/ seg=0000000600002D8100000025 state=begin
2018-05-09_19:09:12.58194 postgres-01-db-gstg postgresql: gpg: decrypt_message failed: Unknown system error
2018-05-09_19:09:12.58326 postgres-01-db-gstg postgresql: lzop: <stdin>: not a lzop file
2018-05-09_19:09:12.58456 postgres-01-db-gstg postgresql: wal_e.blobstore.s3.s3_util INFO     MSG: could no longer locate object while performing wal restore
2018-05-09_19:09:12.58510 postgres-01-db-gstg postgresql:         DETAIL: The absolute URI that could not be located is s3://gitlab-dbstg-backups/postgres02/wal_005/0000000600002D8100000025.lzo.
2018-05-09_19:09:12.58558 postgres-01-db-gstg postgresql:         HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
2018-05-09_19:09:12.58605 postgres-01-db-gstg postgresql:         STRUCTURED: time=2018-05-09T19:09:12.584213-00 pid=16610
2018-05-09_19:09:12.58824 postgres-01-db-gstg postgresql: wal_e.operator.backup INFO     MSG: complete wal restore
2018-05-09_19:09:12.58877 postgres-01-db-gstg postgresql:         STRUCTURED: time=2018-05-09T19:09:12.588003-00 pid=16610 action=wal-fetch key=s3://gitlab-dbstg-backups/postgres02/wal_005/0000000600002D8100000025.lzo prefix=postgres02/ seg=0000000600002D8100000025 state=complete
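The segment key in these log lines can be decoded to see which timeline the replica was asking for. PostgreSQL names WAL segments with 24 hex digits: 8 for the timeline ID, 8 for the high 32 bits of the WAL position, and 8 for the segment number. The helper below is a hypothetical sketch (not part of WAL-E or PostgreSQL) that splits a name along those boundaries; applied to 0000000600002D8100000025 it shows the replica was requesting a segment on timeline 6.

```python
import re
from typing import Tuple

# Exactly 24 hex digits; rejects prefixes like "0x", whitespace, and suffixes like ".lzo".
_SEGMENT_RE = re.compile(r"[0-9A-Fa-f]{24}")


def parse_wal_segment_name(name: str) -> Tuple[int, int, int]:
    """Hypothetical helper: split a WAL segment file name into
    (timeline ID, high 32 bits of WAL position, segment number)."""
    if not _SEGMENT_RE.fullmatch(name):
        raise ValueError(f"not a 24-hex-digit WAL segment name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


if __name__ == "__main__":
    # Segment key from the wal-fetch log lines above, without the .lzo suffix.
    tli, log, seg = parse_wal_segment_name("0000000600002D8100000025")
    print(tli, hex(log), hex(seg))  # → 6 0x2d81 0x25
```

Note that the wal_005 part of the S3 key is WAL-E's storage layout prefix, not a timeline number, so it should not be read as "timeline 5".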

This looked similar to #358, so we added recovery_target_timeline = 'latest' to recovery.conf. After a restart, things got worse, and WAL-E kept retrying the fetch of 00000007.history:

2018-05-10_19:50:33.47904 postgres-01-db-gstg postgresql: gpg: Sorry, we are in batchmode - can't get input
2018-05-10_19:50:33.47936 postgres-01-db-gstg postgresql: lzop: <stdin>: not a lzop file
2018-05-10_19:50:33.57875 postgres-01-db-gstg postgresql: wal_e.blobstore.s3.s3_util WARNING  MSG: retrying WAL file fetch from unexpected exception
2018-05-10_19:50:33.57885 postgres-01-db-gstg postgresql:         DETAIL: The exception type is <class 'wal_e.exception.UserCritical'> and its value is CRITICAL: MSG: pipeline process did not exit gracefully
2018-05-10_19:50:33.57886 postgres-01-db-gstg postgresql:         DETAIL: "gpg2 -d -q --batch --pinentry-mode loopback" had terminated with the exit status 2.
2018-05-10_19:50:33.57888 postgres-01-db-gstg postgresql:         STRUCTURED: time=2018-05-10T19:50:33.578388-00 pid=27908 and its traceback is   File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 62, in shim
2018-05-10_19:50:33.57889 postgres-01-db-gstg postgresql:             return f(*args, **kwargs)
2018-05-10_19:50:33.57890 postgres-01-db-gstg postgresql:           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/blobstore/s3/s3_util.py", line 139, in download
2018-05-10_19:50:33.57892 postgres-01-db-gstg postgresql:             raise
2018-05-10_19:50:33.57893 postgres-01-db-gstg postgresql:           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/pipeline.py", line 115, in __exit__
2018-05-10_19:50:33.57894 postgres-01-db-gstg postgresql:             command.finish()
2018-05-10_19:50:33.57895 postgres-01-db-gstg postgresql:           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/pipeline.py", line 204, in finish
2018-05-10_19:50:33.57896 postgres-01-db-gstg postgresql:             .format(" ".join(self._command), retcode))
2018-05-10_19:50:33.57897 postgres-01-db-gstg postgresql:           There have been 3879 attempts to fetch wal file s3://gitlab-dbstg-backups/postgres02/wal_005/00000007.history.lzo so far.
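With recovery_target_timeline = 'latest', PostgreSQL looks for newer timelines by asking restore_command for successive history files (N+1.history, N+2.history, ...) and stops at the first one reported missing. The Python below is a simplified sketch of that probing loop, not the server's actual code; history_exists is a hypothetical stand-in for running restore_command and checking whether it succeeded. The segment in the first log starts with 00000006 (timeline 6), so the first probe is 00000007.history, which is the file WAL-E is retrying above. Because the loop only ends when the fetch reports "not found", a restore command that keeps retrying after a gpg failure instead of exiting would appear to keep recovery stuck at this step.

```python
from typing import Callable, List


def history_file_name(tli: int) -> str:
    """Timeline history file name: 8 uppercase hex digits plus '.history'."""
    return f"{tli:08X}.history"


def find_newest_timeline(start_tli: int, history_exists: Callable[[str], bool]) -> int:
    """Probe start_tli+1, start_tli+2, ... until a history file is missing.

    history_exists is a hypothetical stand-in for invoking restore_command.
    """
    newest = start_tli
    probe = start_tli + 1
    while history_exists(history_file_name(probe)):
        newest = probe  # this timeline exists; try the next one
        probe += 1
    return newest


if __name__ == "__main__":
    probed: List[str] = []

    def fake_restore(name: str) -> bool:
        probed.append(name)
        return False  # pretend the archive has no newer history file

    print(find_newest_timeline(6, fake_restore))  # → 6
    print(probed)  # → ['00000007.history']
```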