Remove handled errors from Sentry to reduce noise

Problem

A number of handled errors still show up in Sentry, which causes noise.

Example -

Desired Outcome

Less noise in the Sentry alerts channel.

Move to using Prometheus alerts over Sentry events for specific errors.

These should include:

  • 500 errors from both the GitLab /checkpoints API and the Anthropic API
  • Async RuntimeErrors with message "Task was destroyed but it is pending!"
  • Async CancelledError with message "async generator raised StopAsyncIteration"

Implementation Plan (Proposed)

Implement the following approach to reduce noise on the above errors:

  1. Create a Prometheus counter for the error
  2. Create an alert in Grafana
  3. Filter the error from Sentry once the alert is in place
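
A minimal sketch of steps 1 and 3, assuming the filtering happens in a Sentry `before_send` hook, which receives `(event, hint)` and drops the event when it returns `None`. The counter here is a plain `collections.Counter` standing in for a real Prometheus counter, and the names `handled_errors_total`, `NOISY_ERRORS`, and `match_noisy_error` are hypothetical, not taken from the codebase. The exception types and messages follow the list in this issue.

```python
from collections import Counter

# Stand-in for a Prometheus counter with one label per error kind (assumption).
# In a real service this would likely be a prometheus_client Counter.
handled_errors_total = Counter()

# Hypothetical rule table: (exception type name, message fragment) -> metric label.
NOISY_ERRORS = {
    ("RuntimeError", "Task was destroyed but it is pending!"): "task_destroyed_pending",
    ("CancelledError", "async generator raised StopAsyncIteration"): "stop_async_iteration",
}


def match_noisy_error(exc):
    """Return the metric label for a known noisy exception, else None."""
    for (type_name, fragment), label in NOISY_ERRORS.items():
        if type(exc).__name__ == type_name and fragment in str(exc):
            return label
    return None


def before_send(event, hint):
    """Count known handled errors and drop them; pass everything else through."""
    exc_info = hint.get("exc_info")
    if exc_info is None:
        # Not an exception event (for example a plain message): keep it.
        return event
    label = match_noisy_error(exc_info[1])
    if label is None:
        return event
    handled_errors_total[label] += 1
    # Returning None tells Sentry to discard the event.
    return None
```

The hook would be wired up with `sentry_sdk.init(before_send=before_send)`. Counting before dropping keeps the error visible to the Grafana alert from step 2 even though it no longer reaches Sentry.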

Current progress

  • Create an alert from the Grafana dashboard in the same #g_duo_workflow_alerts channel that fires when these errors cross a certain threshold, since they hamper workflow execution.
  • Document this in the troubleshooting docs.

ON HOLD / DROPPED

  • ON HOLD: Removing 500 /checkpoints status errors from Sentry is on hold until we understand what is causing them. The /checkpoints API endpoint for gitlab.com sometimes returns a 500 error, which causes a JSON decode error. Addressed in !2868 (closed)
  • 500 status error from Anthropic: currently I can only see one instance of this in the Sentry issue. I propose we create an alert to deal with this instead.
  • ON HOLD: Create an alert threshold for RuntimeError with message "Task was destroyed but it is pending!" returned during model completions. As described in issue #1314, this is on hold until we decide whether it's something we want to filter or fix.
  • Create an alert threshold for Async CancelledError with message "async generator raised StopAsyncIteration". This issue will be fixed rather than filtered or alerted on: #961 (closed)
Edited by Tim Morriss