OpenBao self-init fails when audit stream endpoint is unavailable during helm upgrade
Problem
When enabling OpenBao via helm upgrade, the self-initialization process fails to complete, resulting in "permission denied" errors when attempting to provision Secrets Manager for a project.
Root Cause (What We Know)
The issue occurs due to a timing problem during helm upgrade:
- When
global.openbao.enabled: trueis added, it changes the Rails configuration - This triggers a rolling update of Webservice pods
- OpenBao pods come online before the new Webservice pods are ready
- Old Webservice pods (with
global.openbao.enabled=false) reject audit stream requests with HTTP 500 - OpenBao's self-init process fails when the audit backend test fails
- The credential backend is never enabled, leaving OpenBao in an incomplete state
- Subsequent authentication attempts fail with "permission denied"
Evidence:
- OpenBao logs show repeated
new audit backend failed testerrors withstatus code not 2xx: 500 - No
credential backend enabledmessage appears in logs - Database has only 21 records instead of 31 (incomplete self-init)
- See: #582828 (comment 2935176465)
What We Don't Know
Why doesn't OpenBao recover from audit backend test failures?
- OpenBao retries the audit backend test but never completes self-init
- The credential backend should eventually be enabled once the audit stream endpoint becomes available
Workarounds
Option 1: Disable Audit Streaming (Temporary)
Set openbao.config.audit.http.enabled: false in values. This works but loses audit streaming functionality.
Option 2: Scale Down/Clean/Scale Up
kubectl scale deploy gitlab-openbao --replicas=0DELETE FROM openbao_kv_store;kubectl scale deploy gitlab-openbao --replicas=2
Option 3: Enable, then install
- Set
global.openbao.enabled=trueand upgrade then chart. - Set
global.openbao.enabled=trueandopenbao.install=trueand upgrade the chart again.
This ensure that webservice pods can accept connection to the audit stream endpoint when openbao pods go online.
See #582828 (comment 2937146435)
Option 4: Enable HTTP audit device after install
Based on note #2941384091, here's the workaround you can paste in the issue description:
Disable HTTP audit streaming during initial OpenBao deployment, then enable it afterwards:
-
Initial deployment: Set
openbao.config.audit.http.enabled: falsein your values:global: openbao: enabled: true openbao: install: true config: audit: http: enabled: false -
After OpenBao is healthy: Re-run
helm upgradewith HTTP audit enabled:global: openbao: enabled: true openbao: install: true config: audit: http: enabled: true
This allows OpenBao to complete self-initialization successfully without the audit stream endpoint being unavailable during the Webservice rolling update.
Proposed Solutions
Short-term
Improve the readiness probe to detect incomplete self-init (credential backend not enabled). However, restarting won't fix it - the database needs to be cleaned.
Long-term
Fix OpenBao to properly recover from transient audit backend failures and complete self-init once the endpoint becomes available.
Environment
- GitLab Chart: v9.6.1 and master
- OpenBao Chart: v0.8.1 and v0.9.0
-
Deployment method:
helm upgradewith OpenBao enabled