
Product Planning Error Budget Investigation

Summary

Product Planning error budgets remain in the red. This issue investigates the primary contributors to budget spend. The current 7-day budget is 99.90%, against a target of 99.95%.
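To put the two percentages in perspective, here is a minimal sketch of the arithmetic. It assumes both figures are success ratios over the same 7-day window, so that the allowed failure share is 100% minus the target. The function name `budget_spend_ratio` is hypothetical and not part of any existing tooling.

```python
from decimal import Decimal


def budget_spend_ratio(current_pct: Decimal, target_pct: Decimal) -> Decimal:
    """Return how many times over the allowed failure share we currently are.

    Assumption: both values are success percentages over the same window.
    """
    allowed = Decimal(100) - target_pct   # failure share the target permits
    spent = Decimal(100) - current_pct    # failure share actually observed
    if allowed <= 0:
        raise ValueError("target must be below 100%")
    return spent / allowed


# Figures from the summary: current 99.90%, target 99.95%.
ratio = budget_spend_ratio(Decimal("99.90"), Decimal("99.95"))
print(f"Failure share is {ratio}x the allowance")
```

Under that assumption, a gap of 0.05 percentage points means the failure share is twice what the target allows.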

Contributing Factors

The top 5 contributing endpoints for Apdex-related issues are:

| Endpoint (json.meta.caller_id) | Request urgency | Target duration (s) | Count | Operations over specified threshold (apdex) |
|---|---|---|---|---|
| GraphqlController#execute | low | 5 | 355190 | 6137 |
| GET /api/:version/groups/:id/epics | low | 5 | 76559 | 2397 |
| Groups::EpicsController#show | default | 1 | 142447 | 2256 |
| Groups::EpicsController#index | default | 1 | 27199 | 1086 |
| GET /api/:version/groups/:id/epics/:eventable_id/resource_label_events | low | 5 | 5855 | 508 |
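The table above is ordered by the absolute number of operations over the threshold. A complementary view is the share of each endpoint's requests that exceed its target duration. The sketch below computes that share from the table figures, assuming the Count column is the total number of requests to the endpoint in the same period. That reading of the column is an assumption, and `over_threshold_rates` is a hypothetical helper.

```python
# (endpoint, total count, operations over the apdex threshold), copied from the table.
ROWS = [
    ("GraphqlController#execute", 355190, 6137),
    ("GET /api/:version/groups/:id/epics", 76559, 2397),
    ("Groups::EpicsController#show", 142447, 2256),
    ("Groups::EpicsController#index", 27199, 1086),
    ("GET /api/:version/groups/:id/epics/:eventable_id/resource_label_events", 5855, 508),
]


def over_threshold_rates(rows):
    """Return (endpoint, share over threshold) pairs, highest share first."""
    rates = [(name, slow / total) for name, total, slow in rows]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


for name, rate in over_threshold_rates(ROWS):
    print(f"{rate:6.2%}  {name}")
```

If the assumption holds, ranking by share can differ from ranking by absolute count, which may help decide whether an endpoint is slow in general or simply high-traffic.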

Investigation

Create a discussion for each endpoint being investigated, and add findings and next steps to this section.


GraphqlController#execute


GET /api/:version/groups/:id/epics


Groups::EpicsController#show

Findings

Proposal

  • Create a GraphQL endpoint for retrieving group-level counts and update the frontend to remove counts from the page load.
  • [Quick] Increase cache retention timings.

Groups::EpicsController#index

Findings

  • Most likely the same root cause as Groups::EpicsController#show: cache misses for group-level sidebar counts.
  • Incidence of slowness is higher than for other group-level endpoints, so there may be other factors (see #367868 (comment 1026555132)).

Proposal

  • Create a GraphQL endpoint for retrieving group-level counts and update the frontend to remove counts from the page load.
  • [Quick] Increase cache retention timings.
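As a rough illustration of why longer cache retention could reduce slow page loads, the sketch below simulates a time-to-live cache in front of an expensive count query. It is not the application's actual caching code: the `TTLCache` class, the cache key, the request pattern, and the TTL values are all hypothetical.

```python
class TTLCache:
    """Minimal time-to-live cache that records how often it misses."""

    def __init__(self, ttl_s: int):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (stored_at, value)
        self.misses = 0

    def fetch(self, key, now, compute):
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1]          # fresh entry: no expensive work
        self.misses += 1             # expired or absent: recompute
        value = compute()
        self._store[key] = (now, value)
        return value


def count_misses(ttl_s: int, request_times) -> int:
    """Replay page loads at the given times and count the cache misses."""
    cache = TTLCache(ttl_s)
    for now in request_times:
        # The lambda stands in for the expensive sidebar count query.
        cache.fetch("group:1:sidebar_counts", now, compute=lambda: 42)
    return cache.misses


# Hypothetical traffic: one page load per minute for an hour.
request_times = range(0, 3600, 60)
short_ttl_misses = count_misses(300, request_times)    # 5-minute retention
long_ttl_misses = count_misses(1800, request_times)    # 30-minute retention
print(short_ttl_misses, long_ttl_misses)
```

In this toy traffic pattern the longer retention recomputes the counts far less often. The trade-off is that sidebar counts can be stale for up to the retention period, which is one reason moving the counts out of the page load is the longer-term proposal.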

GET /api/:version/groups/:id/epics/:eventable_id/resource_label_events

Edited by John Hope