Create monitoring for business-critical errors
Problem
Today we have no automated monitoring that tells us when users experience business-critical errors.
Below are two recent examples where the errors were first reported by users and relayed to the product/engineering teams via support:
- Expired trial users were unable to upgrade - issue was live in production for 3 months - issue
- Some .com users were unable to upgrade - issue was live for 24 hours - issue
Proposed solution
We should implement monitoring on the areas we define as business-critical so we can track their error rates. As a first version, just having access to this monitoring would be great. If possible, we should also set up automated alerts that fire when these errors occur or when the error rate increases by more than X% compared to the mean. If automated alerts add too much complexity, we should break them out into a follow-up issue.
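To make the alert condition concrete, here is a minimal sketch of the "increase over X% compared to the mean" check. It is illustrative only: the function names, the choice of earlier time windows as the baseline, and the rule that any error alerts when the baseline is zero are all assumptions, not an agreed design. The threshold is left as a parameter because X has not been decided.

```python
from statistics import mean
from typing import Sequence


def error_rate(errors: int, total: int) -> float:
    """Fraction of events that failed; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def should_alert(
    baseline_rates: Sequence[float],
    current_rate: float,
    threshold_pct: float,
) -> bool:
    """Return True when the current error rate warrants an alert.

    baseline_rates: error rates from earlier windows (hypothetical input).
    threshold_pct: the "X%" from the proposal, still to be chosen.
    """
    # Assumption: with no history, or a history of zero errors,
    # any error at all is worth alerting on ("when these errors occur").
    if not baseline_rates:
        return current_rate > 0
    baseline = mean(baseline_rates)
    if baseline == 0:
        return current_rate > 0

    # Relative increase of the current rate over the historical mean.
    increase_pct = (current_rate - baseline) / baseline * 100
    return increase_pct > threshold_pct
```

For example, with a steady baseline of 1% errors and a 50% threshold, a jump to 2% would trigger an alert while a move to 1.1% would not. In practice the same check would probably be expressed in whatever monitoring tool we adopt rather than in application code.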
Areas we should monitor for errors (open to additional items; this is only a first pass):
- Account creation
- Trial creation
- User clicks on "upgrade" CTAs
- User clicks on "trial" CTAs
- Billing page loads
- Each checkout page load
- Checkout payment processing
- Upgrade pages/checkout in portal
- Add seats pages/checkout in portal
- Manual renewal flow pages/checkout in portal
Edited by Sam Awezec