Sidekiq/ReactiveCachingWorker #NoMethodError
We're getting a couple of Sentry errors (mostly noise, since the worker retries and succeeds)
Specifically
https://new-sentry.gitlab.net/organizations/gitlab/issues/1948723/?project=3
Note: The link above might not show any recent occurrences; this is because Sidekiq sometimes creates new "issues" in Sentry, which can be misleading.
There's a conversation about how to fix these errors, which are caused by the "diff_head_pipeline" going missing (e.g. when there is a push right after the worker is enqueued).
@Max Fan briefly touching back on this. The problem is almost certainly that the diff head pipeline is nil by the time the reactive cache worker actually runs. The solution would be either to pass the pipeline ID to the worker or to handle this case more gracefully. I'm leaning towards just not calculating the cache (as opposed to passing the pipeline ID), because the cache would get invalidated right away next time anyway (i.e. nobody would see the value)
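The "just don't calculate the cache" option can be sketched as a guard clause in the cache calculation. This is a minimal illustration, not the real GitLab code: `MergeRequest` and `PipelineReportComparer` below are simplified stand-ins, and the `calculate_reactive_cache` hook only mirrors the shape of the real one.

```ruby
# Simplified stand-in for the real MergeRequest model.
MergeRequest = Struct.new(:diff_head_pipeline)

class PipelineReportComparer
  def initialize(merge_request)
    @merge_request = merge_request
  end

  # Mirrors the shape of a reactive-cache calculation hook:
  # returning nil means "nothing to cache"; a later poll with a
  # new pipeline would enqueue a fresh worker anyway.
  def calculate_reactive_cache
    pipeline = @merge_request.diff_head_pipeline
    # A newer push can remove the pipeline between enqueue and run:
    # skip the calculation instead of raising a NoMethodError on nil.
    return nil if pipeline.nil?

    { status: :parsed, pipeline: pipeline }
  end
end
```

With this shape, the race simply produces an empty result instead of a Sentry-visible exception: `PipelineReportComparer.new(MergeRequest.new(nil)).calculate_reactive_cache` returns `nil`.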
Max Fan 33 minutes ago
Yeah, it's a race condition. There's a check for the diff head pipeline before the worker is enqueued. Do you know how a diff head pipeline can be present when the worker is enqueued, but missing by the time it runs? The exception being raised is a good thing: Sidekiq retries when it encounters an exception, which is why there's no customer issue right now. It succeeds on the next run, which makes me think the diff head pipeline came back? (edited)
Hordur 24 minutes ago
If there's a new push in the meantime, the diff head pipeline becomes nil
Max Fan 18 minutes ago
okay yeah, so the window is the time between the push and the new pipeline creation. Because as soon as a new pipeline is created, that becomes the new diff_head_pipeline
I can make an issue about this, probably sev3
Hordur 18 minutes ago
it's probably because the head pipeline no longer matches the sha
this can, for example, occur if either the pipeline is created faster than the MR diff, or the other way around
I think the diff is always refreshed before the pipeline, so it's probably rather that the pipeline hasn't been refreshed yet
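The staleness condition described above can be sketched as a SHA comparison. This is an illustrative helper, not the actual GitLab check: `Pipeline` and `stale_head_pipeline?` are invented names for the sketch.

```ruby
# Simplified stand-in for the real Pipeline model.
Pipeline = Struct.new(:sha)

# The cached head pipeline is only usable while its SHA still matches
# the merge request's current diff head SHA; after a push, the MR diff
# refreshes first, so the old pipeline no longer matches.
def stale_head_pipeline?(head_pipeline, diff_head_sha)
  head_pipeline.nil? || head_pipeline.sha != diff_head_sha
end
```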
I think there are two paths forward:
- Add a special "optimistic retry" error that allows reactive caching to try later without polluting sentry
- Add a special "skip cache" error, that simply skips caching for now, no retries, no sentry tracking
an optimistic retry with a bit of backoff is probably the right thing in our case
The remaining question would be whether we then persist an empty result or not after all retries are exhausted. Not persisting would be functionally equivalent to what we do today (just less noise)
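The optimistic-retry-with-backoff idea might look roughly like the following. Everything here is hypothetical: `OptimisticRetryError`, `run_with_optimistic_retry`, and the constants are invented for the sketch, and a real implementation would live inside the reactive caching concern rather than as free-standing methods.

```ruby
# Hypothetical error class the caching layer would treat as
# "try again later" instead of reporting to Sentry.
class OptimisticRetryError < StandardError; end

MAX_ATTEMPTS = 3
BASE_DELAY = 10 # seconds

# Simple exponential backoff: 10s, 20s, 40s, ...
def next_delay(attempt)
  BASE_DELAY * (2**attempt)
end

def run_with_optimistic_retry(attempt, &calculation)
  calculation.call
rescue OptimisticRetryError
  if attempt < MAX_ATTEMPTS
    # Re-enqueue with backoff instead of raising, so Sentry stays quiet.
    { reschedule_in: next_delay(attempt) }
  else
    # Retries exhausted: persist nothing, which is functionally
    # equivalent to today's behaviour, just without the noise.
    nil
  end
end
```

For example, `run_with_optimistic_retry(0) { raise OptimisticRetryError }` reschedules with a 10-second delay, while an attempt past the limit returns `nil`.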
@Max Fan yes please create an issue
Max Fan 4 minutes ago
Or, since this reports comparer only cares about the latest pipeline, we can just return nil, because whatever pipeline was in question when this ran is no longer important.
Once the new pipeline finishes, the UI will do the same request again and trigger another worker, as the pipeline is different.
This would be functionally equivalent to what we have today. It should work, but I wanted to test it before any MRs