Reconfigure keycloak postgresql DB (!5360) · Merge requests · Sylva-projects / sylva-core

What does this MR do

In order to remove keycloak postgresql DB WAL PVC (#2773 (closed)) and to reduce the overall size of the DB (#2650 (closed)), we create a new keycloak-postgresql unit that configures a new cluster and imports data from the previous database.
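As an illustration only, a CNPG cluster that imports its data from an existing one can be declared along the following lines. This is a minimal sketch, not the manifest from this MR: the resource names, the service host, the secret name, the instance count and the choice of the `microservice` import type are all assumptions made for the example.

```yaml
# Hypothetical sketch of a CNPG Cluster bootstrapped by importing from another cluster.
# All names below (cluster, service host, secret, database, owner) are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-postgresql
spec:
  instances: 3                  # assumed value
  storage:
    size: 5Gi                   # assumed unit, see the parameter list further down
  bootstrap:
    initdb:
      database: keycloak        # assumed database name
      owner: keycloak           # assumed owner
      import:
        type: microservice      # imports exactly one database
        databases:
          - keycloak
        source:
          externalCluster: old-keycloak-db
  externalClusters:
    - name: old-keycloak-db
      connectionParameters:
        host: keycloak-postgres-rw   # assumed read-write service of the old cluster
        user: postgres
        dbname: keycloak
      password:
        name: keycloak-postgres-superuser   # assumed secret name
        key: password
```

The import only runs when the new cluster is first initialized, which is why the old unit must stay in place until the new one has reported a trustworthy Ready status.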

To handle the transition between the two units, we define a condition (_internal.use_keycloak_postgresql) that is true:

if the migration from the keycloak-postgres unit to the keycloak-postgresql unit has been completed
or if the former keycloak-postgres unit has never been installed (so that fresh installs get the new keycloak-postgresql unit directly)

We use this condition to disable the older keycloak-postgres unit when applicable.

During the development of this feature, I observed cases where the keycloak-postgresql kustomization became Ready while the underlying postgres cluster was not. This happens because the cnpg operator does not expose a status compatible with the standard kstatus healthcheck. That's why I'm adding healthCheckExprs: since I'm using the kustomization's status.observedGeneration != -1 as proof that a unit has already become ready at least once, we need an accurate status so that the old DB is deleted only once the new one has properly imported the data.
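For readers unfamiliar with the mechanism, a Flux Kustomization can carry CEL-based health checks for custom resources in `spec.healthCheckExprs`. The fragment below is a sketch of what such a check might look like for a CNPG Cluster; the exact expression used in this MR may differ, and relying on the cluster's `Ready` condition is an assumption made for the example.

```yaml
# Hypothetical fragment of a Flux Kustomization spec (other required fields omitted).
# The CEL expression is an illustrative assumption, not the one from this MR.
spec:
  wait: true
  healthCheckExprs:
    - apiVersion: postgresql.cnpg.io/v1
      kind: Cluster
      # Treat the Cluster as healthy only once it reports a Ready condition set to True.
      current: >-
        has(status.conditions) &&
        status.conditions.exists(e, e.type == 'Ready' && e.status == 'True')
```

With a check of this kind, the kustomization stays in progress until the operator itself reports the cluster as ready, instead of turning Ready as soon as the Cluster object is applied.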

These healthCheckExprs could also be added to the kunai cnpg DB (to do in a separate MR) and could maybe be backported to release-1.4 (#2789).

Since the extra WAL PVC is not present in the previous sylva release (1.4), we add a condition to avoid configuring it prior to migrating to keycloak-postgresql (but we keep it if it is already configured, to handle the upgrade of platforms that are following the main branch).

Which DB parameters are changed

We're changing the following settings compared to the previous cluster definition:

The dedicated WAL PVC is removed
Storage PVC size is decreased from 20 GB to 5 GB (20 GB was probably a bit over-sized, but 2 GB was probably a bit too small to accommodate WAL size growth when cluster replication is interrupted).
We increase checkpoint_timeout from 5 min to 120 min as proposed in !5343 (closed) (local testing seems to indicate that this addresses the problem discussed in #2648; the future will confirm it or not).
We let the CNPG operator manage the PDBs instead of generating the PDBs in the kustomization (see !5360 (comment 2723925691)).
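Expressed as CNPG Cluster fields, the settings listed above could look roughly like the fragment below. It is a sketch under assumptions: the `Gi` unit is assumed, and the manifest in this MR may spell these values differently.

```yaml
# Hypothetical fragment of the new Cluster spec reflecting the changes listed above.
spec:
  # No walStorage section: WAL files live on the main storage PVC.
  storage:
    size: 5Gi                        # assumed unit; previously 20
  enablePDB: true                    # let the CNPG operator create the PodDisruptionBudgets
  postgresql:
    parameters:
      checkpoint_timeout: "120min"   # previously 5 minutes
```

Note that `enablePDB` already defaults to true in CNPG, so leaving it unset and removing the hand-written PDBs from the kustomization would have the same effect.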

In order to simplify the current MR and ease the future cleanup of keycloak-postgres unit, I chose to duplicate the kustomize-units directory instead of using patches, components and substitutions to share manifests between the two units.

Related reference(s)

Closes: #2773 (closed)

Closes: #2650 (closed)

Related to #2648 (possibly closes it)

Test coverage

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Legend:

Icon	Meaning	Available values
☁️	Infra Provider	`capd`, `capo`, `capm3`
🚀	Bootstrap Provider	`kubeadm` (alias `kadm`), `rke2`, `okd`, `ck8s`
🐧	Node OS	`ubuntu`, `suse`, `na`, `leapmicro`
🛠️	Deployment Options	`light-deploy`, `dev-sources`, `ha`, `misc`, `maxsurge-0`, `logging`, `no-logging`
🎬	Pipeline Scenarios	Available scenario list and description

Global config for deployment pipelines

autorun pipelines
allow failure on pipelines
record sylvactl events

Notes:

Enabling autorun makes deployment pipelines run automatically, without human interaction.
Disabling allow failure makes deployment pipelines mandatory for pipeline success.
If both autorun and allow failure are disabled, deployment pipelines need to be triggered manually and block the pipeline until they succeed.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited Sep 09, 2025 by Thomas Morin
