What does this MR do?
Add a retry mechanism for all Docker commands that can return a 404 when sending requests to the Docker API. Use the retry package, so every time a 404 is returned from the API the call is retried with some backoff logic.
Why was this MR needed?
Sometimes, when the Docker daemon is under load or there are performance problems, the Runner may try to start/inspect a container that hasn't been created yet, which results in a job failure.
Are there points in the code the reviewer needs to double check?
Discussion for PoC
Docker will return an `objecNotFound` every time a `404` is returned from the API. There doesn't seem to be a way to make it retry automatically when a `404` error is returned in `wrapError` or anywhere similar, since each function call is different. To make it reusable, we need to wrap each call in a `run` function that just returns an error, and then check that error's type, which is what we are doing in this PoC.
What you should be looking at when evaluating the PoC
- Does it make sense, how we are using the retry mechanism?
- Is there a better way instead of having us create a struct to implement the retryable interface every time?
- Do you think it's clear that the user has to look at the struct fields to get the response?
- Do we all wish Go had generics?
- Would it be easier to implement a backoff mechanism just for the Docker library?
- Does it make sense to have `context` passed to `Run` instead of having it as a struct field?
- Should we implement this for every method we have inside of the client, or only for the ones where it makes sense?
What you should NOT be looking at when evaluating the PoC
- Code quality
- Any error checking
This is a simple PoC to get an idea of what we need to do. The
Does this MR meet the acceptance criteria?
Added tests for this feature/bug
In case of conflicts with `master` - branch was rebased
What are the relevant issue numbers?
reference #4450 (closed)