Duo Workflow Service crashes on startup when 'DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL' is empty or CustomersDot is unreachable

Summary

The Duo Workflow Service (DWS) crashes on startup if JWKS prefetch fails for any configured OIDC provider. This is a functional blocker for offline-license customers with the DAP entitlement, because the documentation instructs them to set DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL= (empty string). The empty value triggers a MissingSchema error in the Cloud Connector library, and the DWS then fails to start.

The main AI Gateway handles this scenario gracefully by logging the error and continuing with an incomplete JWKS cache. The DWS does not: it treats cloud_connector_ready() == False as a fatal error and raises AuthenticationError, preventing the service from starting.

Steps to reproduce

  1. Deploy a self-hosted AI Gateway for an offline-license GitLab instance with the DAP entitlement.
  2. Follow the documented configuration and set DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL= (empty string) on the AI Gateway container.
  3. Start the AI Gateway container.
  4. Observe the DWS component crash with the following errors:
[error] Invalid URL '/.well-known/openid-configuration': No scheme supplied. Perhaps you meant https:///.well-known/openid-configuration?
[critical] Could not prefetch keys
[critical] Failed to initialize OIDC auth provider: Could not prefetch keys
Traceback (most recent call last):
  File "/home/aigateway/app/duo_workflow_service/interceptors/authentication_interceptor.py", line 132, in _init_oidc_auth_provider
    raise AuthenticationError(error_msg)
duo_workflow_service.interceptors.authentication_interceptor.AuthenticationError: Could not prefetch keys

This also reproduces when DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL is left unset (defaulting to https://customers.gitlab.com) but CustomersDot is unreachable from the container, for example due to network restrictions or a transient outage. In that case the error is HTTP 502 response from well_known instead of MissingSchema, but the outcome is the same: cloud_connector_ready() returns False and the DWS crashes.
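
The empty-string case can be illustrated without any network access. The sketch below assumes the discovery URL is built by appending the well-known path to the configured base URL, which is consistent with the '/.well-known/openid-configuration' URL in the log above. The helper names discovery_url and has_scheme_and_host are hypothetical and are not part of the Cloud Connector library.

```python
from urllib.parse import urlsplit

WELL_KNOWN_PATH = "/.well-known/openid-configuration"


def discovery_url(base_url: str) -> str:
    """Append the well-known path to the base URL (assumed construction)."""
    return base_url.rstrip("/") + WELL_KNOWN_PATH


def has_scheme_and_host(url: str) -> bool:
    """Return True only if the URL has both a scheme and a network location."""
    parts = urlsplit(url)
    return bool(parts.scheme) and bool(parts.netloc)


# Empty setting, as in the documented offline configuration.
empty_case = discovery_url("")
# Default value used when the variable is left unset.
default_case = discovery_url("https://customers.gitlab.com")

print(empty_case, has_scheme_and_host(empty_case))
print(default_case, has_scheme_and_host(default_case))
```

With an empty base URL the result is a bare path with no scheme, which matches the "No scheme supplied" message. With the default value the URL is well formed, so any failure in that case comes from the network rather than from URL parsing.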

Root cause

In duo_workflow_service/interceptors/authentication_interceptor.py, the DWS treats an incomplete JWKS cache as a hard startup failure:

if not cloud_connector_ready(provider):
    error_msg = "Could not prefetch keys"
    logger.fatal(error_msg)
    raise AuthenticationError(error_msg)

The main AI Gateway does not have this hard-fail behavior. It logs errors from failed OIDC provider fetches but continues operating with whatever keys it was able to retrieve.

Current behavior

The DWS crashes on startup if any configured OIDC provider is unreachable or returns an error during JWKS prefetch.

Expected behavior

The DWS should tolerate an incomplete JWKS cache when one or more OIDC providers are unavailable, consistent with how the main AI Gateway handles this scenario. At minimum, the DWS should be able to start and authenticate requests using the keys it was able to fetch (e.g., from the local GitLab instance).

Workaround

Set both AIGW_CUSTOMER_PORTAL_URL and DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL to the local GitLab instance URL (e.g., https://<GitLab Instance FQDN>). This gives the CustomersDot OIDC provider a reachable endpoint that returns valid JWKS keys, allowing cloud_connector_ready() to succeed and the DWS to start.

Related issues

  • #517089 - Gracefully handle missing/empty CDot OIDC provider URL in the Cloud Connector library (short-term fix, open)
  • #517088 - Make OIDC provider list configurable rather than hardcoded (long-term fix, open)
  • #520808 - Helm chart sets AIGW_CUSTOMER_PORTAL_URL: "" when not explicitly configured (closed, 17.10)
  • #517083 - Audit of Cloud Connector behavior when AIGW_CUSTOMER_PORTAL_URL is not set (closed)