Spike: Assessment of integrations architecture
Problem
Given our recent discussion of the direction of the Customers Portal, @djparker pointed out that this would be an opportune time to step back and assess our current architecture for third-party integrations (Zuora, SFDC, GL.com, etc.). We already have an epic dedicated to data integrity problems originating from the integrations, but this issue is meant to be proactive rather than reactive.
Proposal
Conduct an honest assessment of these third-party integrations. Determine how they fit into the overall architecture, so we can decide what work is needed to move them to a more stable and maintainable state. This work should be done in collaboration with the Enterprise Applications Integrations Engineering team, which will be the DRI for these integrations going forward. It will combine short-term and medium-term projects. Some of these are already in flight as part of the data integrity epic and are also important to the future of the Customers Portal.
Highlights:
- We have little to no visibility into what is failing when Zuora callouts error out.
- We can't easily trace requests between systems to see which account mutations resulted on either side of the integration.
- We have no automated alerting when the integration goes down or starts to fail.
- We should design our subscription flows intentionally, so that we can start reducing the overall volume of manual support and billing tasks.
- We need an incident response team to handle outages and data issues in this space, as evidenced by recent outages that required engineering effort to resolve.
- In the future, as we design better B2B and B2C experiences, we'll need to architect or rearchitect the integrations so that the Customers Portal functions correctly and provides a seamless billing experience for customers.
Note: This work could be affected by decisions made in other initiatives around the Customers Portal. For example, if we choose to merge the Customers Portal into GL, we would do so iteratively, and that would be an ideal time to fix these architecture problems as we build out new pieces of the Portal in GL. More information can be found in the issue discussing these broader initiatives.
Result
A clearer understanding of the current integrations architecture and where it falls short, along with a plan for reaching a more stable and maintainable state.