feat: Implement Prometheus counter for checkpointing errors
What does this merge request do and why?
This MR introduces a Prometheus counter, duo_workflow_checkpoint_error_total, which is incremented whenever a POST request to /api/v4/ai/duo_workflows/workflows/*/checkpoints returns a response status code of 400 or higher.
Relates to #1009 (closed)
How to set up and validate locally
- Run DWS through GDK and apply the following change, which makes the /checkpoints response always return a 500 status:
git apply !2953.patch
```diff
diff --git a/duo_workflow_service/checkpointer/gitlab_workflow.py b/duo_workflow_service/checkpointer/gitlab_workflow.py
index 2c883369..a2610e10 100644
--- a/duo_workflow_service/checkpointer/gitlab_workflow.py
+++ b/duo_workflow_service/checkpointer/gitlab_workflow.py
@@ -489,6 +489,7 @@ class GitLabWorkflow(BaseCheckpointSaver[Any], AbstractAsyncContextManager[Any])
cls=CustomEncoder,
),
)
+ response = GitLabHttpResponse(status_code=500, body="", headers={})
if isinstance(response, GitLabHttpResponse) and response.status_code >= 400:
duo_workflow_metrics.count_checkpoint_error(
endpoint=endpoint,
```
- Open your local Prometheus server (by default it is on localhost:8083).
- Look for duo_workflow_checkpoint_error_total, which should show the new counter.
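Instead of reading the page by eye, the counter can also be checked from a script. This is a hypothetical sketch: it assumes the metrics are served as plain Prometheus text at http://localhost:8083/ (adjust the URL if the exporter uses a path such as /metrics), and find_samples and fetch_metrics are illustrative helper names with a simplified parser that does not unescape label values.

```python
"""Hypothetical helper for checking the checkpoint error counter locally."""
import re
import urllib.request
from typing import Dict, List, Tuple

METRIC = "duo_workflow_checkpoint_error_total"

# Simplified matcher for `name{labels} value [timestamp]` or `name value` sample lines.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)
# Matches key="value" pairs; escaped characters are kept as-is (no unescaping).
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def find_samples(text: str, metric: str = METRIC) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs for every sample of `metric` in exposition text."""
    samples = []
    for line in text.splitlines():
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((labels, float(match.group("value"))))
    return samples


def fetch_metrics(url: str = "http://localhost:8083/") -> str:
    """Download the exposition text from the (assumed) local metrics endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

With the patch applied and a workflow run, find_samples(fetch_metrics()) should return at least one sample with a non-zero value.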
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Edited by Tim Morriss