chore: set up alerts for RepoXray worker errors that exceed threshold
This MR addresses gitlab-org/gitlab#517181 (closed)
When the Sidekiq worker Ai::RepositoryXray::ScanDependenciesWorker executes the config file parser in the Repository Xray, the extract_libs function raises unhandled exceptions when processing certain edge cases in file contents. These exceptions occur when the parsing logic meets unexpected data types or values, and they then propagate through the system as Sidekiq job errors.
While our Grafana dashboard shows that these errors represent a small percentage of overall job executions, addressing them is crucial for:
- Code completeness: maintaining robust code that handles all edge cases
- Preserving our error budget for the Code Creation team
This MR addresses this by setting up alerts for Ai::RepositoryXray::ScanDependenciesWorker so that when the error rate exceeds a threshold of 0.1%, Code Creation is alerted via the Slack channel #g_mlops-alerts.
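For reviewers, the alert condition can be read as a simple ratio check. The sketch below is illustrative only and is not the alert definition added by this MR: the function names `error_ratio` and `should_alert` and the job counts are hypothetical, and the real rule presumably lives in the monitoring configuration rather than in application code. The only value taken from this description is the 0.1% threshold.

```python
# Illustrative sketch only: names and inputs are hypothetical,
# not the alerting rule introduced by this MR.

ERROR_RATIO_THRESHOLD = 0.001  # 0.1%, the threshold stated in this MR


def error_ratio(failed_jobs: int, total_jobs: int) -> float:
    """Return the fraction of job executions that ended in an error."""
    if total_jobs <= 0:
        # No executions in the window: treat as no errors rather than divide by zero.
        return 0.0
    return failed_jobs / total_jobs


def should_alert(
    failed_jobs: int,
    total_jobs: int,
    threshold: float = ERROR_RATIO_THRESHOLD,
) -> bool:
    """Alert only when the error ratio strictly exceeds the threshold."""
    return error_ratio(failed_jobs, total_jobs) > threshold
```

Under this reading, 2 failures in 1,000 executions (0.2%) would notify #g_mlops-alerts, while exactly 1 in 1,000 (0.1%) would not, because the description says the errors must exceed the threshold.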
Draft follow-up (note to self): when these notifications are triggered, consider automatically opening an actionable issue similar to gitlab-org/gitlab#517173 (closed), with priority labels, so that our team can act on it as soon as possible.