Refactor secrets manager deprovisioning to use a cron-based worker and prevent cascading deletes
Summary
Follow-up to !215116 (merged) and !215844 (closed) to improve reliability and resilience of secrets manager deprovisioning, especially in cases where the deprovision worker is abruptly killed and deprovision tasks become stale.
Problem to solve
Currently, deprovisioning is tied to a worker that can be interrupted (e.g., worker crash, deploy, scaling event). If this happens mid-deprovision, we risk:
- Stale deprovision tasks that are never retried
- Inconsistent state between GitLab DB and OpenBao
- Loss of secrets_manager records due to cascading deletes when projects/groups are removed, making it impossible to correctly clean up external secrets later
Proposal
Implement a cron-based deprovisioning mechanism and adjust the data model so that secrets_manager entries remain available for reliable cleanup.
1. Prevent cascading deletion of secrets_manager records
- Remove cascade dependency so that secrets_manager rows (for both projects and groups) are not deleted at the DB level when a project/group is removed.
- Ensure secrets_manager persists long enough to:
  - Track deprovisioning state
  - Store references to external secrets required for cleanup
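One way to remove the cascade is to keep a nullable foreign key but switch its action from `ON DELETE CASCADE` to `ON DELETE SET NULL`, so the row outlives its project. The sketch below uses SQLite as a stand-in for the real database; the table and column names are hypothetical, and whether the actual migration drops the foreign key entirely or uses `SET NULL` is an open design choice.

```python
import sqlite3

# Hypothetical, minimal schema: names are illustrative, not GitLab's schema.
SCHEMA = """
CREATE TABLE projects (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE project_secrets_managers (
    id             INTEGER PRIMARY KEY,
    -- Deleting the project nulls this column instead of deleting the row.
    project_id     INTEGER REFERENCES projects(id) ON DELETE SET NULL,
    namespace_path TEXT NOT NULL,
    state          TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and their ON DELETE actions)
    # when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_db()
conn.execute("INSERT INTO projects (id, path) VALUES (1, 'my-group/my-project')")
conn.execute(
    "INSERT INTO project_secrets_managers (project_id, namespace_path, state) "
    "VALUES (1, 'my-group/my-project', 'deprovisioning')"
)
# Deleting the project no longer removes the secrets manager row.
conn.execute("DELETE FROM projects WHERE id = 1")

surviving = conn.execute(
    "SELECT project_id, namespace_path, state FROM project_secrets_managers"
).fetchall()
print(surviving)  # [(None, 'my-group/my-project', 'deprovisioning')]
```

The surviving row still carries `namespace_path` and its state, which is exactly what a later cleanup run needs.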
2. Store stable references for cleanup
- Add new columns to both the project and group secrets manager tables to store:
  - `namespace_path` for the owning namespace at the time deprovisioning is initiated
  - The actual OpenBao secret path on the secrets_manager record (if not already persisted in a reliable way)
- These fields will:
  - Provide a stable reference for secret cleanup
  - Simplify secret-related operations during deprovisioning
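The value of these columns comes from snapshotting them when deprovisioning is initiated, so that a later transfer or rename of the live project does not change what gets cleaned up. This is a minimal sketch under stated assumptions: the model, its fields, and `hypothetical_secret_path` are illustrative stand-ins, and the real OpenBao path layout is not specified in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    # Hypothetical stand-in for the owning project.
    path: str
    namespace_path: str


@dataclass
class ProjectSecretsManager:
    # Hypothetical model; field names follow the proposal, not a real schema.
    project: Project
    state: str = "active"
    namespace_path: Optional[str] = None
    secret_path: Optional[str] = None


def hypothetical_secret_path(project: Project) -> str:
    # Illustrative only: the real OpenBao path layout may differ.
    return f"{project.namespace_path}/{project.path}"


def initiate_deprovision(manager: ProjectSecretsManager) -> None:
    """Snapshot stable cleanup references, then hand off to the cron worker."""
    if manager.state == "deprovisioning":
        # Already initiated: keep the first snapshot rather than overwriting
        # it with paths that may have changed since (e.g., after a transfer).
        return
    manager.namespace_path = manager.project.namespace_path
    manager.secret_path = hypothetical_secret_path(manager.project)
    manager.state = "deprovisioning"


manager = ProjectSecretsManager(project=Project(path="app", namespace_path="team-a"))
initiate_deprovision(manager)
# A later transfer changes the live project, but not the stored references.
manager.project.namespace_path = "team-b"
print(manager.state, manager.namespace_path, manager.secret_path)
# deprovisioning team-a team-a/app
```

Returning early on an already-initiated record keeps initiation idempotent, so repeated triggers (for example, a transfer followed by a deletion) cannot overwrite the original references.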
3. Replace the ad-hoc worker with a cron-based deprovision worker
- Remove the `initiate_deprovision_by_path` worker.
- Keep an `InitiateDeprovisionService` that calls `initiate_deprovision` and transitions the secrets_manager entry to a `deprovisioning` state.
- Refactor the existing deprovision worker into a cron job worker that:
  - Periodically queries the `project_secrets_manager` (and group equivalent) tables for rows with `state == deprovisioning`.
  - Fetches a single row at a time (`LIMIT 1`) and processes rows sequentially, avoiding the need for additional locking.
  - Performs the full deprovision workflow:
    - Removes secrets from OpenBao using the stored secret path/`namespace_path`.
    - Cleans up or updates the secrets_manager record as appropriate (e.g., marking it as `deprovisioned` or removing it once safe).
- Ensure the cron worker will:
  - Pick up stale or previously failed deprovisioning attempts automatically.
  - Handle deprovisioning triggered by both:
    - Project transfer / deletion flows
    - Toggling the secrets manager project setting
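The cron loop described above can be sketched as one "tick" that claims at most one `deprovisioning` row, deletes its secrets, and only then marks it `deprovisioned`. Stale work is recovered for free: if the process dies before the final commit, the row keeps its `deprovisioning` state and the next run picks it up. This is a sketch under assumptions: SQLite stands in for the real database, `FakeOpenBao` and `delete_path` are hypothetical stand-ins for the real OpenBao client, and only the project table is shown (the group table would be handled the same way). Skipping extra locking also assumes only one instance of the cron job runs at a time; with concurrent workers, something like PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED` would be needed.

```python
import sqlite3
from typing import Optional, Set


class FakeOpenBao:
    """Hypothetical in-memory stand-in for an OpenBao client."""

    def __init__(self, paths: Set[str]) -> None:
        self.paths = set(paths)

    def delete_path(self, path: str) -> None:
        # Idempotent: deleting a missing path is not an error, so a run
        # retried after a crash can safely repeat this step.
        self.paths.discard(path)


def deprovision_tick(conn: sqlite3.Connection, bao: FakeOpenBao) -> Optional[int]:
    """Process at most one record in the 'deprovisioning' state."""
    row = conn.execute(
        "SELECT id, secret_path FROM project_secrets_managers "
        "WHERE state = 'deprovisioning' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing pending on this run
    record_id, secret_path = row
    bao.delete_path(secret_path)
    # Mark done only after external cleanup succeeded; a crash before this
    # commit leaves the row 'deprovisioning', so a later run retries it.
    conn.execute(
        "UPDATE project_secrets_managers SET state = 'deprovisioned' WHERE id = ?",
        (record_id,),
    )
    conn.commit()
    return record_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_secrets_managers "
    "(id INTEGER PRIMARY KEY, secret_path TEXT NOT NULL, state TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO project_secrets_managers (secret_path, state) VALUES (?, ?)",
    [
        ("team-a/app", "deprovisioning"),
        ("team-b/api", "active"),
        ("team-c/web", "deprovisioning"),
    ],
)
conn.commit()
bao = FakeOpenBao({"team-a/app", "team-b/api", "team-c/web"})

# Simulate three scheduled cron runs, each handling a single row.
processed = [deprovision_tick(conn, bao) for _ in range(3)]
print(processed)          # [1, 3, None]
print(sorted(bao.paths))  # ['team-b/api']
```

Whether one cron run handles a single row or loops over several rows sequentially within a time budget is a tuning choice; either way, each row is claimed and finished before the next one is read.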
4. User feedback and observability
- Consider how we surface the presence of stale or orphaned secrets to users or admins, e.g.:
  - A status indicator on the secrets manager settings page when deprovisioning is pending or has failed.
  - Metrics/logging or alerts for high numbers of records in the `deprovisioning` or `failed` states.
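An alert on backlog size reduces to counting records per state (a `GROUP BY state` in SQL) and flagging watched states above a threshold. This is a minimal sketch: `ALERT_THRESHOLD` and the message format are hypothetical, and how the counts would be exported as metrics is not decided in this issue.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical threshold; a real value would come from operational experience.
ALERT_THRESHOLD = 100
WATCHED_STATES = ("deprovisioning", "failed")


def state_counts(states: Iterable[str]) -> Dict[str, int]:
    """Count secrets manager records per state (like GROUP BY state)."""
    return dict(Counter(states))


def alerts(counts: Dict[str, int], threshold: int = ALERT_THRESHOLD) -> List[str]:
    """Return one message per watched state whose count exceeds the threshold."""
    return [
        f"{state}: {counts[state]} records (threshold {threshold})"
        for state in WATCHED_STATES
        if counts.get(state, 0) > threshold
    ]


counts = state_counts(["active"] * 5 + ["deprovisioning"] * 3 + ["failed"])
print(counts)                       # {'active': 5, 'deprovisioning': 3, 'failed': 1}
print(alerts(counts, threshold=2))  # ['deprovisioning: 3 records (threshold 2)']
```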
Goals / Non-goals
Goals
- Make deprovisioning resilient to worker interruptions.
- Ensure OpenBao secrets and GitLab secrets_manager records remain consistent.
- Avoid blocking users during secret deletion or project transfer.
Non-goals
- Changing the external OpenBao API/contract (beyond what is needed for more robust cleanup).
- UI redesign of the secrets manager configuration page (beyond minimal status/feedback additions, if any).
Milestone / Timing
As discussed in !215116 (merged), this can be tackled pre-beta:
- Preventing cascading delete of secrets manager records (project and group).
- Adding `namespace_path` (and any other required reference fields) to secrets manager tables.
This can be implemented in two iterations:
First MR
- Remove cascading deletion for project and group secrets managers.
- Set `group_path` and `project_path` at the model level.
- Remove the `ByPath` worker and simplify the deprovision service to rely only on `group_path` and `project_path`.
Second MR
Refactor the deprovision worker to run as a cron-based worker.
Note
Ideally, we would also backfill existing secrets manager records with their corresponding group/project paths. Since the feature is still experimental, we can likely skip this for now and instead handle legacy records by deriving and remapping the group/project paths when they are missing from the database.
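The legacy fallback can be sketched as "prefer the stored path, otherwise derive it from the live project and persist it". This is a hypothetical illustration: `SecretsManagerRecord`, `resolve_project_path`, and the choice to raise `LookupError` when neither source is available are assumptions, not existing code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    # Hypothetical stand-in for the owning project.
    full_path: str


@dataclass
class SecretsManagerRecord:
    # Hypothetical model: project_path is None for legacy rows created before
    # the column existed; project is None if the project no longer exists.
    project: Optional[Project]
    project_path: Optional[str] = None


def resolve_project_path(record: SecretsManagerRecord) -> str:
    """Prefer the persisted path; derive and remap it for legacy rows."""
    if record.project_path:
        return record.project_path
    if record.project is not None:
        # Remap: store the derived path so later runs no longer need the project.
        record.project_path = record.project.full_path
        return record.project_path
    # Neither source is available; the caller could mark the record as failed.
    raise LookupError("legacy record has no stored path and no owning project")


legacy = SecretsManagerRecord(project=Project(full_path="team-a/app"))
print(resolve_project_path(legacy))  # team-a/app
print(legacy.project_path)           # team-a/app
```

Because the derived path is written back on first use, each legacy record only depends on its live project once, which gradually does the work a backfill would have done.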
References
- MR: !215116 (merged) - Deprovision secrets management on project delete or move
- MR: !215844 (closed) - Follow-up refactor for deprovisioning approach (cron/DB changes)
Labels
~"type::feature" ~"devops::configure"
- Consider: ~"Seeking community contributions" once scoped