Make it possible to handle both Mailgun's temporary and permanent failure hooks.

Problem

This is an infradev item because we had a week-long incident in which e-mail notifications were delayed. We think the large number of Mailgun failures (both temporary and permanent) might have quietly induced a large build-up of retries on Mailgun's side. Usually Mailgun adds a permanent suppression to avoid affecting our sending reputation, but in this case we saw a large number of temporary failures that should have caused us to stop sending notifications.

Proposal

Engineering breakdown

Things we'll need to do here:

  1. Redirect or alias /-/members/mailgun/permanent_failures to a more generic /-/mailgun/permanent_failures.
  2. Add /-/mailgun/temporary_failures. Alternatively, use a single endpoint for both if the payload has data that identifies whether a failure is temporary or permanent.
  3. For these new endpoints, detect "invite email" failures and call Members::Mailgun::ProcessWebhookService on those. Other failures will be handled in the next issue.
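
To make the single-endpoint option more concrete, here is a minimal Ruby sketch of how a payload could be classified. It assumes the webhook body carries an `event-data` object with `event`, `severity` ("temporary" or "permanent"), and `tags` fields, which should be checked against Mailgun's webhook documentation. The `classify_mailgun_failure` helper and the `invite_email` tag value are hypothetical names used for illustration only; this is not the existing implementation.

```ruby
require 'json'

# Hypothetical tag value; the tag actually attached to invite emails may differ.
INVITE_EMAIL_TAG = 'invite_email'

# Severities this sketch knows how to handle (assumed payload values).
HANDLED_SEVERITIES = %w[temporary permanent].freeze

# Classifies a parsed webhook payload (a Hash with string keys).
# Returns nil when the payload is not a failure event we recognise,
# otherwise a Hash with the severity and whether it is an invite email.
def classify_mailgun_failure(payload)
  event = payload['event-data']
  return nil unless event.is_a?(Hash) && event['event'] == 'failed'

  severity = event['severity']
  return nil unless HANDLED_SEVERITIES.include?(severity)

  {
    severity: severity.to_sym,
    invite_email: Array(event['tags']).include?(INVITE_EMAIL_TAG)
  }
end
```

With a helper along these lines, one endpoint could serve both hooks: when `invite_email` is true the request would be passed to Members::Mailgun::ProcessWebhookService, and anything else would be left for the follow-up issue.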
Edited by Gabe Weaver