Web Hooks: Handle cold-starts of receivers

Description

Some web-hook receivers suffer from cold-start delays (especially when using systems such as AWS Lambda).

This can lead to the following cycle of events:

  • An event is triggered
  • A web-hook request is sent
  • The receiving endpoint is asleep, and must start up
  • The request times out
  • The hook is suspended (backed-off) after this failure
  • During the suspension, the receiver goes back to sleep
  • (Start again at the top...)
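
The cycle above can be reproduced with a small simulation. This is only a sketch: the `Receiver` model, its 15-second cold start, the 5-minute idle reap, and the 10-second request timeout are all assumed values for illustration, not measurements of any real receiver or of our actual settings. Only the 10min and 1min back-off figures come from this issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Receiver:
    """Hypothetical receiver that is reaped after a period of inactivity."""
    cold_start_s: float          # assumed time needed to start from cold
    idle_reap_s: float           # assumed idle time before the container is reaped
    last_active_at: Optional[float] = None  # None means it has never started

    def handle(self, now: float, timeout_s: float) -> bool:
        """Return True if the request completes within the sender's timeout."""
        warm = (
            self.last_active_at is not None
            and now - self.last_active_at <= self.idle_reap_s
        )
        latency = 0.0 if warm else self.cold_start_s
        # The container finishes starting even if the sender has given up.
        self.last_active_at = now + latency
        return latency <= timeout_s


def simulate(receiver: Receiver, timeout_s: float, backoff_s: float,
             attempts: int) -> List[bool]:
    """Send `attempts` requests spaced `backoff_s` apart; record each outcome."""
    now = 0.0
    results = []
    for _ in range(attempts):
        results.append(receiver.handle(now, timeout_s))
        now += backoff_s
    return results


# 10min back-off: the receiver is reaped again before every retry.
always_cold = simulate(Receiver(15.0, 300.0), timeout_s=10.0,
                       backoff_s=600.0, attempts=4)
# 1min back-off: the first request pays for the cold start, later ones succeed.
warmed_up = simulate(Receiver(15.0, 300.0), timeout_s=10.0,
                     backoff_s=60.0, attempts=4)
```

Under these assumed numbers, every attempt fails with the 10min back-off, while with a 1min back-off only the first attempt fails.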

There are several things we can do here:

  • a shorter initial back-off. Our current back-off is quite aggressive (10min), which is probably long enough for some receiver containers to be reaped. Perhaps reduce this to 1min?
  • allow N failures before backing-off, to allow for cold-starts and avoid penalizing the first expensive request
  • longer timeouts (it would be nice to avoid this!)
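
As a rough sketch of how the first two options could combine, the following tracks consecutive failures per hook and only starts backing off once a grace threshold is exceeded. The class name `HookBackoff`, its fields, the threshold of 3, and the doubling schedule are assumptions made for illustration and do not describe the existing implementation. The 1min starting point and the 10min cap mirror the figures mentioned above.

```python
from dataclasses import dataclass


@dataclass
class HookBackoff:
    """Hypothetical per-hook back-off state (not the existing implementation)."""
    failure_threshold: int = 3       # assumed N: failures tolerated with no back-off
    initial_backoff_s: float = 60.0  # proposed 1min initial back-off
    max_backoff_s: float = 600.0     # current 10min back-off, used here as a cap
    consecutive_failures: int = 0
    disabled_until: float = 0.0

    def record_success(self) -> None:
        """A successful delivery clears the failure count and any suspension."""
        self.consecutive_failures = 0
        self.disabled_until = 0.0

    def record_failure(self, now: float) -> float:
        """Record a failed delivery and return the back-off applied, in seconds."""
        self.consecutive_failures += 1
        over = self.consecutive_failures - self.failure_threshold
        if over <= 0:
            # Still within the grace window: retry without suspending the hook.
            return 0.0
        # Assumed schedule: double the delay for each failure past the threshold.
        delay = min(self.initial_backoff_s * 2 ** (over - 1), self.max_backoff_s)
        self.disabled_until = now + delay
        return delay

    def can_send(self, now: float) -> bool:
        """True if the hook is not currently suspended."""
        return now >= self.disabled_until


hook = HookBackoff()
delays = [hook.record_failure(now=0.0) for _ in range(8)]
```

With these defaults the first three failures cause no suspension, which gives a cold receiver time to start, and a receiver that keeps failing still ends up at the 10min cap.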

Customer Impact

Edited by Grant Hickman