Error Handling
Following up on our office hours today. Not sure if we want this to be Target-only or not; your call, @aaronsteers.
Error handling, especially with SaaS-style targets, gets pretty interesting. Here are errors you'll hit at some point (the ones I can think of off the top of my head; there are tons more, everything you can imagine when you run this stuff at scale):
Connection issues
- For HTTP requests: 500 responses and timeouts in every way you can imagine ("Server Busy", "Internal Error", etc.). Hopefully your libraries have sane defaults for connection timeouts and read timeouts; targets will need to change these at times.
- Data issues: for HTTP you'll get response codes all over the place depending on the API, but generally something like 406, 403, 404, 400, etc. Examples: "User already exists", "Name is invalid (over char limit)", "Unknown Error occured", "Cannot disable user due to them having xyz permissions".
Each of these errors needs to be handled slightly differently. For some, a simple retry with exponential backoff fixes your problem.
Data issues are something you can't get away from, and for a lot of SaaS APIs (lots are not HTTP-based by the way, see Active Directory and more) you'll get data errors that are masked as things like 500 errors.
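To make the retry vs. no-retry split concrete, here's a rough sketch of what I mean. Everything in it is made up for illustration (`TransientError`, `DataError`, `classify`, `retry_with_backoff`, and the set of retryable status codes are all assumptions, not anything that exists in the SDK today), and a real target would need its own mapping since some APIs mask data errors as 500s.

```python
import random
import time

# Hypothetical split; real APIs vary, and some mask data errors as 500s.
RETRYABLE_STATUS = {500, 502, 503, 504}


class TransientError(Exception):
    """Connection-style failure that a retry may fix (timeout, 5xx)."""


class DataError(Exception):
    """Record-level failure that a retry will not fix (4xx, validation)."""


def classify(status_code):
    """Map an HTTP status code to an exception class, or None on success."""
    if status_code in RETRYABLE_STATUS:
        return TransientError
    if 400 <= status_code < 500:
        return DataError
    return None


def retry_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call send() until it succeeds, retrying only on TransientError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller decide
            # Delay doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so many workers don't retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

A `DataError` deliberately passes straight through without a retry, since sending "User already exists" five times won't change the answer.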
Functionality that's probably needed:
- Error handling strategy for "hard" or "soft" errors. One record failing out of 1000 should still output something to stderr / stdout, and the target process should return a non-zero exit code, but it's nowhere near as critical as all 1000 records failing, which would need an exit code of 1.
- Configuration so users of targets can change thresholds. Everyone has different use cases. Thresholds could be percentage-based or a hard-coded number of rows, e.g. >10 rows is a "hard" failure.
- Retry logic
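A minimal sketch of how the hard/soft thresholds could map to exit codes. The names here (`FailureThresholds`, `max_failed_rows`, `max_failed_fraction`, `exit_code`) are hypothetical, and using 2 for a "soft" failure is purely my assumption; the only things taken from above are that soft failures are non-zero and that total failure is 1.

```python
from dataclasses import dataclass
from typing import Optional

EXIT_OK = 0
EXIT_HARD_FAILURE = 1
EXIT_SOFT_FAILURE = 2  # assumption: any non-zero code other than 1 would do


@dataclass
class FailureThresholds:
    """Hypothetical user-facing settings; either limit may be left unset."""
    max_failed_rows: Optional[int] = None        # e.g. 10 -> more than 10 rows is "hard"
    max_failed_fraction: Optional[float] = None  # e.g. 0.05 -> more than 5% is "hard"


def exit_code(total, failed, thresholds):
    """Pick a process exit code from record counts and user thresholds."""
    if failed == 0:
        return EXIT_OK
    if failed >= total:
        return EXIT_HARD_FAILURE  # everything failed
    if thresholds.max_failed_rows is not None and failed > thresholds.max_failed_rows:
        return EXIT_HARD_FAILURE
    if (thresholds.max_failed_fraction is not None
            and failed / total > thresholds.max_failed_fraction):
        return EXIT_HARD_FAILURE
    return EXIT_SOFT_FAILURE
```

With `FailureThresholds(max_failed_rows=10)`, 1 failure out of 1000 comes back as a soft failure and 11 failures as a hard one. The target would pass the result to `sys.exit()` after logging the failed records.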
Some of this (maybe all?) could be handled by a dead letter queue of some sort.
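For the dead letter queue idea, something as simple as this might be enough to start with. It's only a sketch: `load_with_dead_letters` and `write_record` are hypothetical names, and writing JSON lines to a file-like object is just one possible "queue"; a real one could be a table, a bucket, or an actual message queue.

```python
import json


def load_with_dead_letters(records, write_record, dead_letter_file):
    """Write each record; divert failures to dead_letter_file as JSON lines.

    write_record is whatever callable sends one record to the target system
    (hypothetical here). Returns (succeeded, failed) counts.
    """
    succeeded = failed = 0
    for record in records:
        try:
            write_record(record)
            succeeded += 1
        except Exception as exc:  # broad on purpose: this is a sketch, not a policy
            failed += 1
            # Keep the record next to its error so it can be fixed and replayed.
            entry = {"record": record, "error": str(exc)}
            dead_letter_file.write(json.dumps(entry) + "\n")
    return succeeded, failed
```

The returned counts are what a threshold check would need to decide between a "hard" and a "soft" failure, and the dead letter file gives users something to inspect and re-run.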
Use cases that I know about today: