feat: Implement prometheus counter for checkpointing errors

What does this merge request do and why?

This MR introduces a Prometheus counter named duo_workflow_checkpoint_error_total, which is incremented whenever a POST request to /api/v4/ai/duo_workflows/workflows/*/checkpoints returns a response status >= 400.
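
As a rough illustration of the idea (not the code in this MR), a counter like this could be declared and incremented with the prometheus_client library roughly as follows. The helper name record_checkpoint_response, the status_code label, and the dedicated registry are assumptions made for the sketch; only the metric name and the endpoint argument come from the description and diff.

```python
from prometheus_client import CollectorRegistry, Counter

# A dedicated registry keeps this sketch isolated from the global default one.
registry = CollectorRegistry()

# Metric name taken from the MR description; the label set is an assumption.
checkpoint_errors = Counter(
    "duo_workflow_checkpoint_error_total",
    "Checkpoint POST requests that returned an HTTP status >= 400",
    ["endpoint", "status_code"],
    registry=registry,
)


def record_checkpoint_response(endpoint: str, status_code: int) -> None:
    """Hypothetical helper: count the response only if it is an error (>= 400)."""
    if status_code >= 400:
        checkpoint_errors.labels(
            endpoint=endpoint, status_code=str(status_code)
        ).inc()
```

Calling record_checkpoint_response("/checkpoints", 500) would bump the series for that endpoint and status, while a 200 response would leave the counter untouched.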

Link to Sentry events

Relates to #1009 (closed)

How to set up and validate locally

  1. Run DWS through GDK and apply this code change (which causes the /checkpoints response to always be 500)
git apply !2953.patch
diff --git a/duo_workflow_service/checkpointer/gitlab_workflow.py b/duo_workflow_service/checkpointer/gitlab_workflow.py
index 2c883369..a2610e10 100644
--- a/duo_workflow_service/checkpointer/gitlab_workflow.py
+++ b/duo_workflow_service/checkpointer/gitlab_workflow.py
@@ -489,6 +489,7 @@ class GitLabWorkflow(BaseCheckpointSaver[Any], AbstractAsyncContextManager[Any])
                     cls=CustomEncoder,
                 ),
             )
+            response = GitLabHttpResponse(status_code=500, body="", headers={})
             if isinstance(response, GitLabHttpResponse) and response.status_code >= 400:
                 duo_workflow_metrics.count_checkpoint_error(
                     endpoint=endpoint,
  2. Open your local Prometheus server (by default it is on localhost:8083)
  3. Look for duo_workflow_checkpoint_error_total, which should show the new counter
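
The last two steps can also be checked from a script instead of the browser. The sketch below sums every series of the counter found in Prometheus text exposition output. It assumes the local server exposes that text at http://localhost:8083/metrics (the /metrics path is a guess, only the host and port come from the steps above), and both function names are hypothetical.

```python
import re
from urllib.request import urlopen

METRIC = "duo_workflow_checkpoint_error_total"


def sum_counter(exposition_text: str, metric: str = METRIC) -> float:
    """Sum the values of all series of `metric` in Prometheus text format."""
    # Matches `name value` or `name{labels} value`; comment lines start with '#'
    # and therefore never match.
    pattern = re.compile(r"^" + re.escape(metric) + r"(\{.*\})?\s+(\S+)")
    total = 0.0
    for line in exposition_text.splitlines():
        match = pattern.match(line)
        if match:
            total += float(match.group(2))
    return total


def fetch_checkpoint_errors(url: str = "http://localhost:8083/metrics") -> float:
    """Fetch the metrics page (URL is an assumption) and sum the counter."""
    with urlopen(url, timeout=5) as response:
        return sum_counter(response.read().decode("utf-8"))
```

With the patch applied, fetch_checkpoint_errors() should return a value greater than zero once at least one checkpoint request has been attempted.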

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.

Edited by Tim Morriss