2025-09-13: rails_request error rate in ai-assisted service exceeds SLO in main stage
Severity: 3 (Medium)
Problem: The rails_request error rate for the ai-assisted service (main stage) exceeded its SLO threshold, with spikes between September 13 and 15, 2025.
Impact: No customer impact has been observed. The affected endpoint handles a very low number of requests and errors occurred periodically, likely due to a scheduled job.
Causes: The 'POST /api/:version/ai/third_party_agents/direct_access' endpoint did not supply a required parameter, so requests failed with the error 'missing keyword: :unit_primitive_name'.
Response strategy: We traced the errors to the missing parameter in the endpoint and fixed it in the merge request 'Fix ThirdPartyAgents::TokenService and improve specs', which has been merged. We have observed the error rate decreasing since then.
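For readers unfamiliar with this error class: in Ruby, a keyword argument declared without a default is required, and omitting it raises an ArgumentError whose message has the form 'missing keyword: ...'. The sketch below is a minimal reproduction of that failure mode and of the general shape of a fix. It is not the actual service code: ExampleTokenService, current_user, build_service_message, and :example_primitive are hypothetical names chosen for illustration, and the real ThirdPartyAgents::TokenService signature may differ.

```ruby
# Hypothetical stand-in for the real service; names and signature are assumptions.
class ExampleTokenService
  # `unit_primitive_name:` has no default value, so Ruby treats it as required.
  def initialize(current_user:, unit_primitive_name:)
    @current_user = current_user
    @unit_primitive_name = unit_primitive_name
  end

  def describe
    "token for #{@current_user} scoped to #{@unit_primitive_name}"
  end
end

# Builds the service and returns its description, or the ArgumentError
# message when a required keyword is missing.
def build_service_message(**kwargs)
  ExampleTokenService.new(**kwargs).describe
rescue ArgumentError => e
  e.message
end

# Caller omits the required keyword: this is the failure mode in the incident.
missing = build_service_message(current_user: "alice")

# Caller supplies the keyword: the call succeeds.
fixed = build_service_message(current_user: "alice", unit_primitive_name: :example_primitive)

puts missing
puts fixed
```

Because the error only surfaces when the code path is actually executed, a low-traffic endpoint exercised mainly by a scheduled job can keep failing periodically without being noticed; a spec that calls the service with the arguments the endpoint really passes would catch the mismatch earlier.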
This ticket was created to track INC-3921, by incident.io