Handle full S3ActionCache and action cache failures in scheduler

Jeremiah Bonney requested to merge jbonney/action-cache-error-handling into master

Before raising this MR, consider whether the following are required, and complete if so:

  • Unit tests
  • Metrics - Adds error handling, no new metrics needed
  • Documentation update(s) - Adds error handling, no new docs needed

If not required, please explain in brief why not.

Description

This MR tightens up the behavior of the ActionCache and scheduler when things go wrong. For the S3ActionCache, if QuotaExceeded is returned when adding an entry to the ActionCache, that error is now returned to callers as a RESOURCE_EXHAUSTED gRPC error. In addition, the scheduler's interactions with a provided action cache are made more robust by catching any exceptions and logging an error when those interactions fail. This should allow executions to continue unhindered if the ActionCache fills up or is otherwise unavailable, while still letting us know that something very abnormal is happening.
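
For reviewers who want the shape of the change at a glance, here is a minimal, self-contained sketch of the two behaviors described above. It is not the code in this MR: `QuotaExceededError`, `ResourceExhaustedError`, `FullActionCache`, `Scheduler`, and `complete_job` are hypothetical stand-ins, and the real storage and gRPC layers are replaced with plain Python exceptions so the example runs on its own.

```python
import logging

logger = logging.getLogger("scheduler_sketch")


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the storage backend's QuotaExceeded failure."""


class ResourceExhaustedError(Exception):
    """Hypothetical error that a servicer would map to gRPC RESOURCE_EXHAUSTED."""


class FullActionCache:
    """Toy action cache whose backing store is always over quota (assumption)."""

    def _put_object(self, digest, result):
        # Simulates the backend rejecting the write because the bucket is full.
        raise QuotaExceededError("QuotaExceeded")

    def update_action_result(self, digest, result):
        try:
            self._put_object(digest, result)
        except QuotaExceededError as e:
            # Surface a full cache as a distinct, caller-visible error.
            raise ResourceExhaustedError("ActionCache is full") from e


class Scheduler:
    """Toy scheduler that treats the action cache as best-effort."""

    def __init__(self, action_cache=None):
        self._action_cache = action_cache
        self.completed = []

    def complete_job(self, digest, result):
        if self._action_cache is not None:
            try:
                self._action_cache.update_action_result(digest, result)
            except Exception:
                # Log with a traceback, but do not fail the execution.
                logger.exception("Failed to cache result for %s", digest)
        self.completed.append(digest)
        return result
```

With this shape, `Scheduler(FullActionCache()).complete_job(...)` still returns the result and records the job as completed, while a direct caller of `update_action_result` sees the resource-exhausted error.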

Edited by Jeremiah Bonney