Gather Observability Requirements and Create Dashboard for SeatLink controller
Proposal
As a follow-up to "Immediately fail seat link requests when Zuora ..." (#7501 - closed), we want to gather the observability metric requirements for the SeatLink controller and create a dashboard based on those metrics. Also, see 1, 2, 3.
The objective of this issue is to answer the following questions regarding the observability and availability of SeatLink:
- What do we need to measure for observability of SeatLink performance?
- Is the current Prometheus dashboard enough, or do we need more?
List of possible metrics to measure:
- Response Time (duration)
- 5xx Error rate
- Zuora Apdex board
- ... (TBD)
We need to decide on any additional metrics required for observability.
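To make the candidate metrics above concrete, here is a minimal Python sketch of how a 5xx error rate, a p95 response time, and an Apdex score could be derived from raw request samples. It is illustrative only: `RequestSample`, the function names, and the 0.5 s threshold are hypothetical and not taken from the SeatLink codebase or any existing dashboard. It also assumes that 5xx responses count as "frustrated" for Apdex, which is a common convention but a choice we would need to confirm.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RequestSample:
    """One observed request (hypothetical shape, for illustration only)."""
    duration_s: float  # response time in seconds
    status: int        # HTTP status code


def error_rate_5xx(samples: Sequence[RequestSample]) -> float:
    """Fraction of requests that returned a 5xx status."""
    if not samples:
        raise ValueError("no samples")
    errors = sum(1 for s in samples if 500 <= s.status <= 599)
    return errors / len(samples)


def p95_duration(samples: Sequence[RequestSample]) -> float:
    """95th percentile of response time, using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    durations = sorted(s.duration_s for s in samples)
    rank = math.ceil(0.95 * len(durations))  # 1-based rank
    return durations[rank - 1]


def apdex(samples: Sequence[RequestSample], threshold_s: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total.

    satisfied:  duration <= T
    tolerating: T < duration <= 4T
    frustrated: everything else, including (by assumption) any 5xx response
    """
    if not samples:
        raise ValueError("no samples")
    satisfied = tolerating = 0
    for s in samples:
        if 500 <= s.status <= 599:
            continue  # assumption: server errors always count as frustrated
        if s.duration_s <= threshold_s:
            satisfied += 1
        elif s.duration_s <= 4 * threshold_s:
            tolerating += 1
    return (satisfied + tolerating / 2) / len(samples)


# Example input: four successful requests and one 503.
example = [
    RequestSample(0.2, 200),
    RequestSample(0.4, 200),
    RequestSample(1.0, 200),
    RequestSample(3.0, 200),
    RequestSample(0.1, 503),
]
```

In practice these values would more likely come from Prometheus queries than from application code; the sketch is only meant to pin down the definitions we would be agreeing on.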
Result
Studying the metrics over a period of time will help us determine the appropriate action to increase the reliability of SeatLink sync, for instance, whether scaling horizontally with a dedicated VM is the right approach.
Edited by Bishwa Hang Rai