Design UX for handling partially failed operations in Secrets Manager

Problem statement

When creating or updating GitLab CI/CD secrets via the Secrets Manager, multiple API calls to OpenBao are required to complete the operation. These calls happen sequentially, and if any call fails, the subsequent operations are not attempted.

Each API call has a specific purpose and impact on the secret's functionality:

Value API call (POST kv/data/{path}):
- Creates/updates the actual secret value
- If this fails, the entire operation fails
Metadata API call (POST kv/metadata/{path}):
- Updates metadata like description, environment, and branch
- If this fails, the value will be updated but with no metadata (environment, branch, description)
- The secret would exist but be completely unusable in pipelines
- It may show up in the list of secrets with just its name
- All subsequent operations (policy and JWT) will also be skipped
Policy API calls (POST sys/policies/acl/{policy-name}):
- Creates or updates the policies that grant access to the secret based on environment/branch
- If this fails, the value and metadata will be updated, but pipelines won't have access
- For updates involving environment/branch changes:
  - We first remove the secret from its old policy
  - Then add it to the new policy
  - If either fails, the access controls will be in an inconsistent state
JWT Role API call (POST auth/{mount}/role/{role-name}):
- Updates the JWT role with glob policies for wildcard patterns (e.g., staging-*, feature/*)
- This is the final step, so if previous steps fail, this won't be attempted
- If this fails, pipelines matching wildcard patterns won't work, but exact matches might still work

Both the create and update operations follow this same sequence of API calls, so they face the same potential failure scenarios.
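The stop-at-first-failure sequence described above can be sketched as follows. This is a minimal illustration rather than the actual Secrets Manager implementation: the step names, the `OperationResult` type, and `run_steps` are hypothetical, and each step is assumed to be a callable that raises on failure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical step names mirroring the four calls listed above, in order.
STEP_ORDER = ["value", "metadata", "policy", "jwt_role"]


@dataclass
class OperationResult:
    completed: List[str] = field(default_factory=list)
    failed: Optional[str] = None
    skipped: List[str] = field(default_factory=list)
    error: Optional[str] = None

    @property
    def partial(self) -> bool:
        # Partial failure: at least one step succeeded before another failed.
        return self.failed is not None and bool(self.completed)


def run_steps(steps: Dict[str, Callable[[], None]]) -> OperationResult:
    """Run the steps in order and stop at the first one that raises."""
    result = OperationResult()
    for index, name in enumerate(STEP_ORDER):
        try:
            steps[name]()
        except Exception as exc:
            result.failed = name
            result.error = str(exc)
            # Everything after the failed step is never attempted.
            result.skipped = STEP_ORDER[index + 1:]
            break
        result.completed.append(name)
    return result
```

Recording the completed, failed, and skipped steps separately gives the frontend enough information to distinguish a full failure (the value call failed) from a partial one.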

Currently, there's no defined UX strategy for communicating these partial failures to users, which can lead to confusion when a secret appears to be created or updated but doesn't work as expected in pipelines.

Goal

Design a clear, user-friendly approach to handle and communicate partial failures in secret operations. Users should understand:

- Which aspects of their secret are functional and which aren't
- How the specific failure impacts their CI/CD pipelines
- What actions they can take to resolve the issues

Proposal

Create a UX strategy for handling partial failures that:

- Translates technical failures into user-meaningful outcomes
- Provides clear status indicators for each critical aspect of a secret's functionality
- Offers actionable guidance specific to the type of failure
- Maintains consistency between create and update operations

User experience

Consider the following technical failure scenarios and their user impact:

Scenario 1: Value API succeeds, Metadata API fails

- Technical Impact: Secret value exists but with no environment, branch, or description metadata
- User Impact: Secret appears in the list but is completely unusable in pipelines
- User Need: Understand that the secret was only partially created and needs proper configuration

Scenario 2: Value & Metadata APIs succeed, Policy API fails

- Technical Impact: Secret and metadata exist, but the access policy doesn't match the environment/branch
- User Impact: Pipelines can't access the secret despite the UI showing the correct configuration
- User Need: Understand why pipelines can't access the secret and how to fix it

Scenario 3: Value, Metadata & Policy APIs succeed, JWT Role API fails

- Technical Impact: Secret works for exact environment/branch matches but not wildcard patterns
- User Impact: Inconsistent behavior: some pipelines can access the secret, others can't
- User Need: Clear indication that wildcard functionality is impaired
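One way to express the three scenarios as data is a lookup from the failed step to a user-facing status. The sketch below is illustrative only: `SecretStatus`, `status_for`, the step names, and all message text are placeholders, not agreed UX copy or an existing API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SecretStatus:
    value_saved: bool
    pipeline_access: str  # "full", "exact_matches_only", or "none"
    message: str          # placeholder copy, to be replaced by UX-approved wording


# Keyed by the step that failed; None means every step succeeded.
_STATUS_BY_FAILED_STEP: Dict[Optional[str], SecretStatus] = {
    None: SecretStatus(True, "full", "Secret saved."),
    "value": SecretStatus(False, "none", "The secret could not be saved."),
    # Scenario 1: value stored, no environment/branch/description.
    "metadata": SecretStatus(
        True, "none",
        "The secret value was saved, but its configuration was not. "
        "Pipelines cannot use this secret yet."),
    # Scenario 2: value and metadata stored, access policy not updated.
    "policy": SecretStatus(
        True, "none",
        "The secret was saved, but pipeline access could not be granted."),
    # Scenario 3: exact matches may work, wildcard patterns do not.
    "jwt_role": SecretStatus(
        True, "exact_matches_only",
        "The secret was saved, but wildcard environment or branch "
        "patterns may not work."),
}


def status_for(failed_step: Optional[str]) -> SecretStatus:
    """Translate the failed step (or None) into a user-facing status."""
    return _STATUS_BY_FAILED_STEP[failed_step]
```

Keeping the mapping in one table means create and update can share the same messages, and the UI never needs to name the underlying API call.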

Implementation details

This issue requires collaboration between:

- Frontend developers
- Backend developers
- UX designers

We need to:

- Define user-friendly error messages mapped to specific API failures
- Design status indicators that clearly communicate the secret's functional state
- Implement retry mechanisms for specific failed components when possible
- Create consistent patterns for handling failures across create/update operations
- Design a UI that clearly shows functional state without exposing implementation details
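As a starting point for the retry discussion, the sketch below resumes an operation from the step that failed instead of repeating the whole sequence. It assumes every step is safe to repeat (idempotent), which would need to be confirmed for the policy step in particular, since an environment/branch change removes the secret from one policy and adds it to another. `resume_from`, the step names, and `max_attempts` are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical step names mirroring the four API calls, in order.
STEP_ORDER = ["value", "metadata", "policy", "jwt_role"]


def resume_from(
    failed_step: str,
    steps: Dict[str, Callable[[], None]],
    max_attempts: int = 2,
) -> Optional[str]:
    """Re-run the failed step and every step after it.

    Returns the name of the step that still fails after max_attempts,
    or None if the remaining steps all succeed. Steps before
    failed_step are not repeated.
    """
    start = STEP_ORDER.index(failed_step)
    for name in STEP_ORDER[start:]:
        for attempt in range(1, max_attempts + 1):
            try:
                steps[name]()
                break  # this step succeeded, move on to the next one
            except Exception:
                if attempt == max_attempts:
                    return name
    return None
```

For example, if the policy call failed, `resume_from("policy", steps)` would retry the policy call and then attempt the JWT role call without rewriting the value or metadata.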