CI: Track pipeline failures based on "old" and "new" errors - e.g. unit tests, lint violations and more


Description

A very long time ago I used Atlassian Bamboo for CI. Bamboo is self-hosted, slow, hard to use, and hard to integrate with. However, it has one great feature that I miss in modern CIs like GitLab and Travis: it understands unit test results. By understanding I mean tracking every single unit test. For example, Dragos breaks a test named "IpPool should allow only range definition for bridge". GitLab CI should note this, track that test, and record that it was broken in pipeline 123. Then, a minute later, Wojtek pushes something else, and the build fails too. But it's not Wojtek who broke it. The CI would report that the build failed and note that it has been failing since pipeline 123 by Dragos because of the unit test "IpPool ...". This way, developers who push changes on top of a broken build can confirm that they haven't broken anything themselves and that the failure comes from someone else's commit. At the same time, if Wojtek's build breaks an additional test, for example "ABC", that particular failure would be "assigned" to Wojtek.

A natural component of tracking unit tests on an individual level is customizable notifications. That is, notify those who caused the initial failure - and not subsequent committers who happened to push their code on top of someone's failing code.
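
To make the attribution and notification idea concrete, here is a minimal Python sketch. It is not how Bamboo or GitLab implement anything; `FailureTracker`, `record`, and `authors_to_notify` are hypothetical names, and the sketch assumes pipelines are recorded in order and that a test is identified by its name alone.

```python
from dataclasses import dataclass, field


@dataclass
class FailureTracker:
    """Remembers which pipeline and author first broke each failing test."""

    # test name -> (pipeline id, author) of the pipeline that introduced the failure
    broken_since: dict = field(default_factory=dict)

    def record(self, pipeline_id, author, failed_tests):
        """Record one pipeline run and return (new, inherited) failures."""
        failed = set(failed_tests)
        # Tests that pass again are no longer attributed to anyone.
        for name in list(self.broken_since):
            if name not in failed:
                del self.broken_since[name]
        new, inherited = {}, {}
        for name in sorted(failed):
            if name in self.broken_since:
                inherited[name] = self.broken_since[name]
            else:
                self.broken_since[name] = (pipeline_id, author)
                new[name] = (pipeline_id, author)
        return new, inherited


def authors_to_notify(new_failures):
    """Only authors who introduced a failure in this pipeline get notified."""
    return sorted({author for _, author in new_failures.values()})


tracker = FailureTracker()
ip_pool = "IpPool should allow only range definition for bridge"

# Pipeline 123: Dragos breaks the IpPool test.
new, inherited = tracker.record(123, "Dragos", [ip_pool])
print(authors_to_notify(new))  # ['Dragos']

# Pipeline 124: Wojtek pushes on top of the broken build, breaking nothing new.
new, inherited = tracker.record(124, "Wojtek", [ip_pool])
print(authors_to_notify(new))  # []
print(inherited[ip_pool])      # (123, 'Dragos')

# Pipeline 125: Wojtek additionally breaks "ABC", which is assigned to him.
new, inherited = tracker.record(125, "Wojtek", [ip_pool, "ABC"])
print(new)                     # {'ABC': (125, 'Wojtek')}
```

In this sketch a test that is fixed and later broken again is attributed to whoever breaks it the second time; a real implementation might want to keep the full history instead.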

Some example screenshots from Google Images that show the concept of "new" and "existing" errors:

[Three screenshots omitted]

Proposal

GitLab CI should allow developers to define an output artifact, a YAML file, to be interpreted by the GitLab webapp. It would be a list of unit test results (success/failure/ignored) with extra freeform metadata. The GitLab webapp would analyze each output YAML file and store its history to differentiate between "new errors" (errors introduced in this build) and "old errors" (errors introduced in a previous build).

test_results:
- name: test name
  result: success
- name: another test
  result: failure
  metadata: # Freeform metadata to be displayed in GitLab UI but no interpretation would occur
    message: 'NameError: uninitialized constant ClassXd'
    backtrace: "backtrace here, or whatever one supplies...\nfrom (irb):1\nfrom /Users/nowaker/.rvm/rubies/ruby-2.4.3/bin/irb:11:in `<main>'"
- name: one more test
  result: failure
  metadata:
    message: '"blah" expected to equal "bleh"'
    backtrace: ...
    rspec_seed: 123
    rspec_command: 'rspec spec/bleh.rb:123'
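
As a rough illustration of how the webapp could tell "new errors" from "old errors", the Python sketch below compares two builds' results. The lists of dicts mirror the proposed `test_results` YAML after parsing; `failed_names` and `split_errors` are hypothetical helpers, and comparing only against the immediately preceding build is a simplifying assumption. The `metadata` field is carried along but never interpreted, as proposed above.

```python
VALID_RESULTS = {"success", "failure", "ignored"}


def failed_names(test_results):
    """Return the names of failed tests, validating each entry's result field."""
    names = set()
    for entry in test_results:
        if entry["result"] not in VALID_RESULTS:
            raise ValueError(
                f"unknown result {entry['result']!r} for {entry['name']!r}"
            )
        if entry["result"] == "failure":
            names.add(entry["name"])
    return names


def split_errors(previous_results, current_results):
    """Split the current failures into (new errors, old errors)."""
    before = failed_names(previous_results)
    now = failed_names(current_results)
    return sorted(now - before), sorted(now & before)


# Results of the previous build: only "another test" was failing.
previous = [
    {"name": "test name", "result": "success"},
    {"name": "another test", "result": "failure",
     "metadata": {"message": "NameError: uninitialized constant ClassXd"}},
    {"name": "one more test", "result": "success"},
]
# Results of the current build: "one more test" now fails as well.
current = [
    {"name": "test name", "result": "success"},
    {"name": "another test", "result": "failure",
     "metadata": {"message": "NameError: uninitialized constant ClassXd"}},
    {"name": "one more test", "result": "failure",
     "metadata": {"message": '"blah" expected to equal "bleh"', "rspec_seed": 123}},
]

new_errors, old_errors = split_errors(previous, current)
print(new_errors)  # ['one more test']
print(old_errors)  # ['another test']
```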

The example use case is centered on interpreting unit tests because they are easy to track. But I don't think we should limit it to unit tests only. For example, rather than calling this feature "Test Failure Tracking", we could name it "Job Failure Causes/Errors". The artifact YAML would contain a list of all cause names: usually unit test names, but not necessarily. It could be anything. For example:

causes:
- name: 'Spec: it applies Texas sales tax when billing address is in Texas'
  metadata: # Freeform metadata to be displayed in GitLab UI but no interpretation would occur
    message: 'NameError: uninitialized constant ClassXd'
    backtrace: "backtrace here, or whatever one supplies...\nfrom (irb):1\nfrom /Users/nowaker/.rvm/rubies/ruby-2.4.3/bin/irb:11:in `<main>'"
- name: 'Docker: failure in Dockerfile on step 15: RUN MAKEOPTS="-j5" carton install --deployment'
  metadata:
    last_okay_image: abcd4321
    step: 15    
    output: whatever

I think this would be a GitLab EE feature. This is similar in concept to Browser Performance Testing, where a performance.json artifact is expected and the GitLab UI is in charge of presenting the results.

This is kind of related to https://gitlab.com/gitlab-org/gitlab-ce/issues/28106 but the goal is totally different. Tracking unit tests isn't metrics-oriented. It's an MR-oriented thing. Code reviewers can immediately see whether a failed build was caused by the new code or whether it "inherited" the failure from someone else.

I'd be glad for a review, @markpundsack. We (EE customers: DreamHost, LLC) would love to have it... one day. :-)

Links / references

Edited Aug 29, 2025 by 🤖 GitLab Bot 🤖