Duo Workflow Service crashes on startup when 'DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL' is empty or CustomersDot is unreachable
### Summary

The Duo Workflow Service (DWS) crashes on startup when it cannot prefetch JWKS keys from all configured OIDC providers. This is a functional blocker for offline-license customers with the DAP entitlement, because the [documentation](https://docs.gitlab.com/18.11/install/install_ai_gateway/#start-a-container-from-the-image) instructs them to set `DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL=` (empty string), which triggers a `MissingSchema` error in the Cloud Connector library and causes the DWS to fail to start.

The main AI Gateway handles this scenario gracefully by logging the error and continuing with an incomplete JWKS cache. The DWS does not: it treats `cloud_connector_ready() == False` as a fatal error and raises `AuthenticationError`, preventing the service from starting.

### Steps to reproduce

1. Deploy a self-hosted AI Gateway for an offline-license GitLab instance with the DAP entitlement.
2. Follow the documented configuration and set `DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL=` (empty string) on the AI Gateway container.
3. Start the AI Gateway container.
4. Observe the DWS component crash with the following errors:

```
[error] Invalid URL '/.well-known/openid-configuration': No scheme supplied. Perhaps you meant https:///.well-known/openid-configuration?
[critical] Could not prefetch keys
[critical] Failed to initialize OIDC auth provider: Could not prefetch keys
```

```
Traceback (most recent call last):
  File "/home/aigateway/app/duo_workflow_service/interceptors/authentication_interceptor.py", line 132, in _init_oidc_auth_provider
    raise AuthenticationError(error_msg)
duo_workflow_service.interceptors.authentication_interceptor.AuthenticationError: Could not prefetch keys
```

This also reproduces when `DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL` is left unset (defaulting to `https://customers.gitlab.com`) but CustomersDot is unreachable from the container, for example due to network restrictions or a transient outage.
In that case the error is `HTTP 502 response from well_known` instead of `MissingSchema`, but the outcome is the same: `cloud_connector_ready()` returns `False` and the DWS crashes.

### Root cause

In [`duo_workflow_service/interceptors/authentication_interceptor.py`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/duo_workflow_service/interceptors/authentication_interceptor.py#L128-132), the DWS treats an incomplete JWKS cache as a hard startup failure:

```
if not cloud_connector_ready(provider):
    error_msg = "Could not prefetch keys"
    logger.fatal(error_msg)
    raise AuthenticationError(error_msg)
```

The main AI Gateway does not have this hard-fail behavior. It logs errors from failed OIDC provider fetches but continues operating with whatever keys it was able to retrieve.

### Current behavior

The DWS crashes on startup if any configured OIDC provider is unreachable or returns an error during JWKS prefetch.

### Expected behavior

The DWS should tolerate an incomplete JWKS cache when one or more OIDC providers are unavailable, consistent with how the main AI Gateway handles this scenario. At minimum, the DWS should be able to start and authenticate requests using the keys it was able to fetch (e.g., from the local GitLab instance).

### Workaround

Set both `AIGW_CUSTOMER_PORTAL_URL` and `DUO_WORKFLOW_AUTH__OIDC_CUSTOMER_PORTAL_URL` to the local GitLab instance URL (e.g., `https://<GitLab Instance FQDN>`). This gives the CustomersDot OIDC provider a reachable endpoint that returns valid JWKS keys, allowing `cloud_connector_ready()` to succeed and the DWS to start.
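Separately from the workaround, the expected behavior described above could look roughly like the following. This is a minimal, self-contained sketch and not the actual DWS or Cloud Connector code: `prefetch_keys_tolerantly`, the `fetch_jwks` callable, and the local `AuthenticationError` class are hypothetical stand-ins introduced only for illustration. The idea is to log and skip providers that are empty or unreachable, and to fail startup only when no provider returned any key.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("oidc_prefetch_sketch")


class AuthenticationError(Exception):
    """Hypothetical stand-in for the DWS exception of the same name."""


def prefetch_keys_tolerantly(
    provider_urls: Iterable[str],
    fetch_jwks: Callable[[str], List[dict]],
) -> Dict[str, List[dict]]:
    """Fetch JWKS from each provider, skipping the ones that fail.

    `fetch_jwks` is an assumed callable that returns a list of JWK dicts
    for a provider URL, or raises on any error.
    Raises AuthenticationError only when no provider returned any key.
    """
    cache: Dict[str, List[dict]] = {}
    for url in provider_urls:
        if not url:
            # Covers the documented empty-string setting for the portal URL.
            logger.warning("Skipping OIDC provider with empty URL")
            continue
        try:
            keys = fetch_jwks(url)
        except Exception as exc:  # e.g. connection errors, HTTP 502
            logger.error("Could not prefetch keys from %s: %s", url, exc)
            continue
        if keys:
            cache[url] = keys
    if not cache:
        # Nothing to authenticate with, so failing startup is still reasonable.
        raise AuthenticationError("Could not prefetch keys from any OIDC provider")
    return cache
```

With a fetcher that fails for CustomersDot but succeeds for the local GitLab instance, this sketch returns a cache containing only the local instance's keys instead of raising. Whether an incomplete cache is acceptable in every deployment is a design decision for the maintainers. The sketch only illustrates the "degrade, don't crash" option.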
### Related issues

* #517089: Gracefully handle missing/empty CDot OIDC provider URL in the Cloud Connector library (short-term fix, open)
* #517088: Make OIDC provider list configurable rather than hardcoded (long-term fix, open)
* #520808: Helm chart sets `AIGW_CUSTOMER_PORTAL_URL: ""` when not explicitly configured (closed, 17.10)
* #517083: Audit of Cloud Connector behavior when `AIGW_CUSTOMER_PORTAL_URL` is not set (closed)