Web Hooks: Handle cold-starts of receivers
Description
Some web-hook receivers suffer from cold-start delays (especially when using systems such as AWS Lambda).
This can lead to the following cycle of events:
- An event is triggered
- A web-hook request is sent
- The receiving endpoint is asleep, and must start up
- The request times out
- The hook is suspended (backed-off) after this failure
- During the suspension, the receiver goes to sleep
- (Start again at the top...)
There are several things we can do here:
- shorter initial back-off. We currently have quite an aggressive back-off (10min), which is probably long enough for some receiver containers to be reaped while the hook is suspended. Perhaps reduce this to 1min?
- allow N failures before backing-off, to allow for cold-starts and avoid penalizing the first expensive request
- longer timeouts (it would be nice to avoid this!)
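The first two options could be combined along these lines. This is only a rough sketch to make the idea concrete, not the existing implementation: the class name `HookBackoff`, the threshold of 3 failures, and the doubling of the delay up to a 10min cap are all assumptions for illustration. Only the 1min initial back-off and the current 10min figure come from this issue.

```python
from dataclasses import dataclass


@dataclass
class HookBackoff:
    """Hypothetical per-hook back-off state (names and defaults are assumptions)."""

    failure_threshold: int = 3       # assumed N: failures tolerated before backing off
    initial_backoff_s: float = 60.0  # 1min, as suggested in this issue
    max_backoff_s: float = 600.0     # assumed cap, equal to today's 10min back-off
    recent_failures: int = 0
    disabled_until: float = 0.0      # timestamp (seconds) before which we do not send

    def can_send(self, now: float) -> bool:
        """True if the hook is not currently suspended."""
        return now >= self.disabled_until

    def record_success(self) -> None:
        """Any success clears the failure count and lifts the suspension."""
        self.recent_failures = 0
        self.disabled_until = 0.0

    def record_failure(self, now: float) -> None:
        """Tolerate the first N failures, then back off with a doubling delay."""
        self.recent_failures += 1
        excess = self.recent_failures - self.failure_threshold
        if excess <= 0:
            # Still within the tolerated budget (e.g. a cold-start timeout).
            return
        delay = min(self.initial_backoff_s * 2 ** (excess - 1), self.max_backoff_s)
        self.disabled_until = now + delay
```

With these assumed defaults, a receiver that times out once while waking up is not suspended at all, and a receiver that keeps failing is first suspended for 1min rather than 10min, which gives a warmed-up container a better chance of still being alive on the next attempt.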
Customer Impact
- We've had one report so far, impacting 1100 projects (Slack thread, internal, 90 days)
Edited by Grant Hickman