Duo Agent Platform: Workflows fail with 401 errors

Overview

When a workflow is started (for example, via Web Agentic Chat), an OAuth token is generated. Duo Workflow Service (DWS) then uses this token to authenticate the HTTP requests it sends.

However, some of these requests fail with a 401 error, as if the token were invalid or did not exist: https://log.gprd.gitlab.net/app/r/s/oOynL

handleAgentMessages: failed to read a gRPC message: rpc error: code = Internal desc = workflow execution failure: Exception: GraphQL errors: [{'message': 'Invalid token'}]
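The `GraphQL errors: [{'message': ...}]` part of the log line looks like a Python exception built from the `errors` array of a GraphQL response. A minimal sketch, assuming the client simply raises a plain `Exception` when that array is non-empty (the helper name `raise_for_graphql_errors` is hypothetical, not the actual DWS code):

```python
def raise_for_graphql_errors(response: dict) -> dict:
    """Hypothetical client helper: fail the workflow if the GraphQL payload has errors."""
    errors = response.get("errors")
    if errors:
        # Formatting the list of dicts produces the "[{'message': ...}]" shape seen in the log.
        raise Exception(f"GraphQL errors: {errors}")
    return response.get("data", {})


try:
    raise_for_graphql_errors({"errors": [{"message": "Invalid token"}]})
except Exception as exc:
    # Prints: Exception: GraphQL errors: [{'message': 'Invalid token'}]
    print(f"{type(exc).__name__}: {exc}")
```

If this assumption holds, the workflow fails on the first rejected request rather than retrying it, so a single transient authentication failure is enough to end the workflow.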

Potential issue

The gap between the request that creates the token and the requests that use it is quite small (https://log.gprd.gitlab.net/app/r/s/QSmBc), and the failing request reads all of its data from a replica. This suggests the failures may be caused by replication lag:

  • A token is created on the primary database and starts replicating to the replicas
  • The token is sent to Workhorse, which uses it to perform API requests to Rails
  • DWS starts sending HTTP requests via Workhorse, and Rails reads from a replica that has not caught up yet
  • The token is not found on that replica, so the request fails with 401
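The suspected race can be illustrated with a small simulation. This is a sketch under stated assumptions, not GitLab code: `Database`, `authenticate`, the single replica, the fixed `replica_lag`, and the timestamps are all hypothetical and illustrative, not measured values.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical primary with one lagging replica (simulated clock, in seconds).

    Each write records the time it was committed on the primary; the replica
    only sees writes that are at least `replica_lag` seconds old.
    """
    replica_lag: float
    _writes: dict[str, float] = field(default_factory=dict)

    def create_token(self, token: str, now: float) -> None:
        # Writes always go to the primary.
        self._writes[token] = now

    def token_on_replica(self, token: str, now: float) -> bool:
        # The replica has applied the write only once the lag has elapsed.
        committed_at = self._writes.get(token)
        return committed_at is not None and now - committed_at >= self.replica_lag


def authenticate(db: Database, token: str, now: float) -> int:
    """Return an HTTP status, looking the token up on the replica like the failing request."""
    return 200 if db.token_on_replica(token, now) else 401


db = Database(replica_lag=0.5)
db.create_token("tok-1", now=10.0)         # workflow start: token written to the primary
print(authenticate(db, "tok-1", now=10.1))  # DWS request 100 ms later -> 401
print(authenticate(db, "tok-1", now=10.6))  # same token after the replica caught up -> 200
```

The same token is rejected or accepted depending only on when the request reaches the replica, which would explain why only some requests fail.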

The issue is reproducible locally using the setup steps from this issue: Snippet creation not resilient to Postgres repl... (#413908 - closed)
