Resolve `Email has already been taken` errors when creating Customer from Sold To Contact
Problem
As described in this thread (the second error specifically), we've seen the following error from time to time:
`ZuoraCallout::SyncResource::ContactWorker: Creation of customer from Sold To Contact failed`
This error used to occur more often because of a race condition between the contact update and order processed callouts, which were frequently handled at the same time. Since 2024-04-16, these callouts have been processed with offsets: order processed callouts are handled immediately, account update callouts are enqueued with a 30-second delay, and contact update callouts are enqueued with a 1-minute delay. This was introduced in https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/9284. The delay has helped, in that we aren't seeing nearly as many errors, but it's not a guarantee.
In this case, the timing was unlucky: the Contact Update callout was triggered at 04/22/2024 08:48:03 PDT for this Zuora Account, but the Order Processed callout was triggered at 04/22/2024 08:49:01 PDT for this Zuora Account (the subscription's invoice owner). The Order Processed callout was triggered about a minute after the Contact Update callout, which essentially negates the offset in enqueuing.
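To make the timing concrete, here is a small sketch of the arithmetic. It assumes each callout is enqueued at the moment it is triggered and that the delays are exactly the ones listed above. The constant and method names are hypothetical and are not taken from the application code.

```ruby
# Enqueue delays in seconds, as described above.
# ENQUEUE_DELAY_SECONDS and processing_time are hypothetical names for illustration.
ENQUEUE_DELAY_SECONDS = {
  order_processed: 0,
  account_update: 30,
  contact_update: 60
}.freeze

# Earliest time a callout would be processed, assuming it is
# enqueued at the moment it is triggered.
def processing_time(callout_type, triggered_at)
  triggered_at + ENQUEUE_DELAY_SECONDS.fetch(callout_type)
end

# The source timestamps are in PDT. Only the difference matters here,
# so both are built with Time.utc to keep the arithmetic zone-independent.
contact_triggered = Time.utc(2024, 4, 22, 8, 48, 3)
order_triggered   = Time.utc(2024, 4, 22, 8, 49, 1)

contact_processed = processing_time(:contact_update, contact_triggered)
order_processed   = processing_time(:order_processed, order_triggered)

gap_seconds = (contact_processed - order_processed).abs.round
puts "Processing times are #{gap_seconds} seconds apart"
```

Under these assumptions the two callouts end up being processed about two seconds apart, which is close enough for the race condition to reappear.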
For this type of error, I'm not sure increasing the delay in processing is a great option. It will never be a guarantee and increasing the delay may lead to a less optimal experience.
Proposal
We could explore a couple of options for avoiding this error:
- Rescue the `Email has already been taken` error when the Customer fails to save from the Sold To Contact in the `AccountWorker` or `ContactWorker`. This error means we already have the Customer in the DB. Instead of logging an error, we could log it at info level and not raise an error (avoiding the Sentry issue).
- Consider retrying the `ContactWorker` (or the `AccountWorker`) at least once when errors occur. The work done in these workers should be idempotent anyway. If we do this, we should only log an error after the final attempt is unsuccessful. We could log warnings for the other attempts.
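A rough sketch of how the two options could look, combined into one helper for brevity. This is plain Ruby rather than the actual worker code: `RecordInvalid`, `create_customer_with_retry`, and `MAX_ATTEMPTS` are hypothetical stand-ins, and the real workers would presumably rescue whatever validation error the Customer model raises.

```ruby
require "logger"

# Hypothetical stand-in for the validation error raised when a Customer fails to save.
class RecordInvalid < StandardError; end

DUPLICATE_EMAIL_MESSAGE = "Email has already been taken"
MAX_ATTEMPTS = 2 # hypothetical: one initial attempt plus one retry

# Runs the block that creates the Customer.
# Returns :created, :already_exists, or :failed.
def create_customer_with_retry(logger:, max_attempts: MAX_ATTEMPTS)
  attempt = 0
  begin
    attempt += 1
    yield
    :created
  rescue RecordInvalid => e
    # Option 1: a duplicate email means the Customer already exists,
    # so log at info level and do not raise.
    if e.message.include?(DUPLICATE_EMAIL_MESSAGE)
      logger.info("Customer already exists, skipping creation: #{e.message}")
      return :already_exists
    end

    # Option 2: retry other failures, warning on every attempt but the last.
    if attempt < max_attempts
      logger.warn("Attempt #{attempt} failed, retrying: #{e.message}")
      retry
    end

    # Only the final unsuccessful attempt is logged as an error.
    logger.error("Creation of customer failed after #{attempt} attempts: #{e.message}")
    :failed
  end
end

logger = Logger.new($stdout)
result = create_customer_with_retry(logger: logger) do
  raise RecordInvalid, "Validation failed: #{DUPLICATE_EMAIL_MESSAGE}"
end
puts result
```

Because the duplicate-email case returns before the retry check, it is never retried. Retrying only makes sense if the work in the block is idempotent, as noted above.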