GraphQL queries to the readyToMergeEE endpoint exceeding the 1s duration threshold
Summary
For one Dedicated customer, roughly 10% of GraphQL queries to the readyToMergeEE endpoint take longer than the 1s threshold. This pages Dedicated EOCs for Apdex SLO violations, even though the query_analysis.duration values for these queries are very low.
Impact
- At least one Dedicated customer is receiving responses from this GraphQL endpoint that exceed the accepted 1s threshold.
- Dedicated EOCs are being paged without the ability to take action. (https://app.incident.io/gitlab/incidents/4326)
Evidence collected on one Dedicated tenant:
- Roughly 10% of queries exceeded the 1s threshold over the last 7 days, which is enough to trigger alerts consistently.
- Although duration_s exceeds 1s for these queries, most of them have a query_analysis.duration below 0.01s. The logs contain no other fields that show where the remaining time is spent.
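To make the evidence above reproducible on other tenants, the comparison could be scripted roughly as follows. This is a minimal sketch, not the query that produced the numbers in this issue. It assumes the logs are exported as JSON lines, that each record carries a numeric `duration_s`, and that `query_analysis.duration` appears as a flat key measured in seconds. The export format and the `summarize` helper are hypothetical.

```python
import json
from typing import Iterable

THRESHOLD_S = 1.0  # the 1s threshold named in this issue


def summarize(lines: Iterable[str], threshold_s: float = THRESHOLD_S) -> dict:
    """Summarize slow queries from JSON-lines log records.

    Assumption: each record has a numeric `duration_s` and a flat
    `query_analysis.duration` key, both in seconds (hypothetical layout).
    """
    total = 0
    slow = 0
    unexplained = []  # duration_s minus query_analysis.duration, slow queries only
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        duration = record.get("duration_s")
        if duration is None:
            continue  # skip records without a request duration
        total += 1
        if duration > threshold_s:
            slow += 1
            analysis = record.get("query_analysis.duration") or 0.0
            unexplained.append(duration - analysis)
    return {
        "total": total,
        "slow": slow,
        "slow_fraction": slow / total if total else 0.0,
        "max_unexplained_s": max(unexplained, default=0.0),
    }
```

A `slow_fraction` near 0.1 together with a `max_unexplained_s` close to the full request duration would match the pattern described above: the time is being spent somewhere other than query analysis.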
Recommendation
Determine the cause of these slower queries, or adjust the threshold if queries are expected to take > 1s to complete.
Verification