Patroni: Upgrade to latest version 2.0.x
We currently ship Patroni 2.0.1. As of writing this issue, 2.0.2
is available and seems to provide a bunch of bugfixes and security improvements.
Version 2.0.2
New features
Ability to ignore externally managed replication slots (James Coleman)
Patroni is trying to remove any replication slot which is unknown to it, but there are certainly cases when replication slots should be managed externally. From now on it is possible to configure slots that should not be removed.
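To make the behaviour concrete, here is a minimal sketch of how a cleanup pass could skip externally managed slots. It is not Patroni's code: the function names, the matcher keys, and the slot dictionaries are all assumptions made for this illustration.

```python
def matches(slot, matcher):
    """True if the matcher is non-empty and every property it names equals the slot's value."""
    return bool(matcher) and all(slot.get(key) == value for key, value in matcher.items())


def slots_to_drop(existing_slots, managed_names, ignore_matchers):
    """Return the names of slots that are neither managed nor ignored.

    existing_slots: list of dicts describing slots found on the server
    managed_names: names of slots the cluster manager created itself
    ignore_matchers: list of dicts; a slot matching any of them is left alone
    """
    drop = []
    for slot in existing_slots:
        if slot["name"] in managed_names:
            continue  # known slot, keep it
        if any(matches(slot, matcher) for matcher in ignore_matchers):
            continue  # externally managed slot, keep it
        drop.append(slot["name"])
    return drop


existing = [
    {"name": "replica1", "type": "physical"},
    {"name": "debezium", "type": "logical", "database": "app", "plugin": "pgoutput"},
    {"name": "stale", "type": "physical"},
]
# Only the slot that is neither managed nor ignored is selected for removal.
result = slots_to_drop(existing, {"replica1"}, [{"name": "debezium", "type": "logical"}])
```

Requiring every key of a matcher to agree keeps the ignore rule narrow, so a slot is only protected when it matches the full description.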
Added support for cipher suite limitation for REST API (Gunnar "Nick" Bluth)
It could be configured via restapi.ciphers or the PATRONI_RESTAPI_CIPHERS environment variable.
Added support for encrypted TLS keys for REST API (Jonathan S. Katz)
It could be configured via restapi.keyfile_password or the PATRONI_RESTAPI_KEYFILE_PASSWORD environment variable.
Constant time comparison of REST API authentication credentials (Alex Brasetvik)
Use hmac.compare_digest() instead of ==, which is vulnerable to timing attack.
Choose synchronous nodes based on replication lag (Krishna Sarabu)
If the replication lag on the synchronous node starts exceeding the configured threshold it could be demoted to asynchronous and/or replaced by the other node. Behaviour is controlled with maximum_lag_on_syncnode.
Stability improvements
Start postgres with hot_standby = off when doing custom bootstrap (Igor Yanchenko)
During custom bootstrap Patroni is restoring the basebackup, starting Postgres up, and waiting until recovery finishes. Some PostgreSQL parameters on the standby can't be smaller than on the primary and if the new value (restored from WAL) is higher than the configured one, Postgres panics and stops. In order to avoid such behavior we will do custom bootstrap without hot_standby mode.
Warn the user if the required watchdog is not healthy (Nicolas Thauvin)
When the watchdog device is not writable or missing in required mode, the member cannot be promoted. Added a warning to show the user where to search for this misconfiguration.
Better verbosity for single-user mode recovery (Alexander Kukushkin)
If Patroni notices that PostgreSQL wasn't shut down cleanly, in certain cases the crash-recovery is executed by starting Postgres in single-user mode. It could happen that the recovery failed (for example due to the lack of space on disk) but errors were swallowed.
Added compatibility with python-consul2 module (Alexander, Wilfried Roset)
The good old python-consul has not been maintained for a few years, therefore someone created a fork with new features and bug-fixes.
Don't use bypass_api_service when running patronictl (Alexander)
When a K8s pod is running in a non-default namespace it does not necessarily have enough permissions to query the kubernetes endpoint. In this case Patroni shows the warning and ignores the bypass_api_service setting. In case of patronictl the warning was a bit annoying.
Create raft.data_dir if it doesn't exist or make sure that it is writable (Mark Mercado)
Improves user-friendliness and usability.
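The "create it or make sure it is writable" check for a data directory can be sketched as below. This is an illustration only, not the code from the Patroni change; the helper name ensure_writable_dir and the example path are assumptions.

```python
import os
import tempfile


def ensure_writable_dir(path):
    """Create path if it is missing, then confirm the process can write into it."""
    # exist_ok=True tolerates an existing directory; an existing regular file
    # at this path still raises FileExistsError.
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(f"{path} is not writable")
    return path


with tempfile.TemporaryDirectory() as base:
    data_dir = ensure_writable_dir(os.path.join(base, "raft", "data"))
    created = os.path.isdir(data_dir)
    # Calling it again on the existing directory is a no-op.
    same_dir = ensure_writable_dir(data_dir) == data_dir
```

Failing early with a clear error is friendlier than letting the first write fail later with a less obvious message.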
Bugfixes
Don't interrupt restart or promote if lost leader lock in pause (Alexander)
In pause it is allowed to run postgres as primary without lock.
Fixed issue with shutdown_request() in the REST API (Nicolas Limage)
In order to improve handling of SSL connections and delay the handshake until thread is started Patroni overrides a few methods in the HTTPServer. The shutdown_request() method was forgotten.
Fixed issue with sleep time when using Zookeeper (Alexander)
In some cases Patroni could sleep up to twice as long as intended between runs of the HA code.
Fixed invalid os.symlink() calls when moving data directory after failed bootstrap (Andrew L'Ecuyer)
If the bootstrap failed, Patroni renames the data directory, pg_wal, and all tablespaces. After that it updates symlinks so the filesystem remains consistent. The symlink creation was failing due to the src and dst arguments being swapped.
Fixed bug in the post_bootstrap() method (Alexander)
If the superuser password wasn't configured Patroni was failing to call the post_init script and therefore the whole bootstrap was failing.
Fixed an issue with pg_rewind in the standby cluster (Alexander)
If the superuser name is different from postgres, the pg_rewind in the standby cluster was failing because the connection string didn't contain the database name.
Exit only if authentication with Etcd v3 explicitly failed (Alexander)
On start Patroni performs discovery of the Etcd cluster topology and authenticates if necessary. If one of the etcd servers was not accessible, Patroni tried to authenticate against that server and failed instead of retrying with the next node.
Handle case with psutil cmdline() returning empty list (Alexander)
Zombie processes are still children of the postmaster, but they don't have a cmdline().
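The pitfall is indexing into an empty command line. A minimal sketch of the guard follows; it takes plain lists instead of real process objects so it runs without psutil, and the function name and the list of auxiliary process names are assumptions for the example.

```python
def count_user_backends(cmdlines):
    """Count child processes that look like client backends.

    cmdlines: list of argv lists, one per child process. A zombie is
    represented by an empty list and is skipped instead of being indexed.
    """
    auxiliary = ("checkpointer", "walwriter", "autovacuum launcher")
    count = 0
    for cmdline in cmdlines:
        if not cmdline:  # zombie: no command line to inspect
            continue
        title = cmdline[0]
        if not any(name in title for name in auxiliary):
            count += 1
    return count


children = [
    ["postgres: checkpointer"],
    [],  # zombie child
    ["postgres: app app 10.0.0.1(5432) idle"],
]
backends = count_user_backends(children)
```

Without the emptiness check, `cmdline[0]` on the zombie entry would raise IndexError.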
Treat PATRONI_KUBERNETES_USE_ENDPOINTS environment variable as boolean (Alexander)
Not doing so made it impossible to disable kubernetes.use_endpoints via environment.
Improve handling of concurrent endpoint update errors (Alexander)
Patroni will explicitly query the current endpoint object, verify that the current pod still holds the leader lock and repeat the update.
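The query, verify, repeat sequence described above is a standard optimistic-concurrency retry. The sketch below shows that pattern against an in-memory stand-in. It does not use the Kubernetes API, and every name in it (FakeEndpointStore, Conflict, update_as_leader) is hypothetical.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale version of the object."""


class FakeEndpointStore:
    """In-memory stand-in for an API object guarded by a version number."""

    def __init__(self, leader, version=1):
        self.leader = leader
        self.version = version
        self.annotations = {}

    def read(self):
        return {"leader": self.leader, "version": self.version}

    def update(self, annotations, expected_version):
        if expected_version != self.version:
            raise Conflict("stale version")
        self.annotations.update(annotations)
        self.version += 1


def update_as_leader(store, me, annotations, version, retries=3):
    """Try the update; on conflict re-read, confirm we still lead, and retry."""
    for _ in range(retries):
        try:
            store.update(annotations, version)
            return True
        except Conflict:
            current = store.read()
            if current["leader"] != me:
                return False  # lost the leader lock, do not overwrite
            version = current["version"]  # refresh and try again
    return False


store = FakeEndpointStore(leader="pod-0", version=5)
# The caller holds a stale version (1), so the first attempt conflicts.
updated = update_as_leader(store, "pod-0", {"optime": "42"}, version=1)
```

The leader check between attempts is the important part: a blind retry could overwrite data written by a pod that has since taken over the lock.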
From this list, the items most relevant to our known issues are the pg_rewind fix for standby clusters and the fixes around bootstrap and restarting services, which may be related to other problems we have noticed.
(https://patroni.readthedocs.io/en/latest/releases.html#version-2-0-2)
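For reference, the constant-time comparison item under New features comes down to the pattern below. This is a minimal sketch, not Patroni's REST API code; the function name and the credential strings are made up for the example.

```python
import hmac


def credentials_match(supplied, expected):
    """Compare two credential strings without leaking where they first differ."""
    # Encoding to bytes lets compare_digest handle non-ASCII input as well.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


ok = credentials_match("admin:s3cret", "admin:s3cret")
bad = credentials_match("admin:guess", "admin:s3cret")
```

A plain == on strings can return as soon as the first differing character is found, which is what makes response timing usable for guessing a secret.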