Improve systemic error detection by looking at the first backtrace line

Rémy Coutable requested to merge improve-systemic-error-detection into main

What does this MR do and why?

We now look at the first line of the error backtrace to group errors and better detect systemic errors, i.e. errors that happen systematically after a certain point, most probably because of a runner environment issue (e.g. resources are exhausted, PG doesn't have enough memory).

Note: There's a risk that legitimate but somewhat generic failure messages, e.g. Failure/Error: expect(job).to be_successful(timeout: 400), might be detected as systemic with this change. I'll open an MR to make the systemic error detection threshold customizable.
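
For illustration, the grouping described above could be sketched roughly as follows. This is a minimal sketch, not the code from this MR: the `Failure` struct, the method names, the sample backtraces, and the threshold of 3 are all hypothetical.

```ruby
# Hypothetical failure record: an error message plus its backtrace lines.
Failure = Struct.new(:message, :backtrace)

# Assumed threshold for illustration only; the real value is not stated in the MR.
SYSTEMIC_THRESHOLD = 3

# Group failures by the first line of their backtrace.
# Failures without a backtrace end up under the nil key.
def group_by_first_backtrace_line(failures)
  failures.group_by { |failure| Array(failure.backtrace).first }
end

# Keep only the groups large enough to be considered systemic.
def systemic_groups(failures, threshold: SYSTEMIC_THRESHOLD)
  group_by_first_backtrace_line(failures).select do |first_line, group|
    !first_line.nil? && group.size >= threshold
  end
end

# Made-up sample data: three failures share the same first backtrace line.
failures = [
  Failure.new("could not connect to server", ["lib/db.rb:10", "spec/a_spec.rb:5"]),
  Failure.new("connection timed out", ["lib/db.rb:10", "spec/b_spec.rb:9"]),
  Failure.new("out of memory", ["lib/db.rb:10", "spec/c_spec.rb:3"]),
  Failure.new("expected true, got false", ["spec/d_spec.rb:7"])
]

systemic = systemic_groups(failures)
puts systemic.keys.inspect # prints ["lib/db.rb:10"]
```

Grouping on the first backtrace line rather than on the message means the three differently worded failures above still land in one group. Passing a different `threshold:` is one way the detection could be made stricter for generic failures.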

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rémy Coutable