Ci::Build#group_name incorrectly parses job names ending in digits

What is the current behavior?

We rightly use Ci::Build#group_name in TestSuite (good looks @morefice), but there's a bug in the implementation of group name, for build groups that end in a digit:

  def group_name
    # 'rspec:linux: 1/10' => 'rspec:linux'
    common_name = name.to_s.gsub(%r{\d+[\s:\/\\]+\d+\s*}, '')

Here, we match one or more digit, followed by one or more separator , :, /, or \, followed by one or more digit, followed by any amount of whitespace.

rspec unit pg9 1/10

the string 9 1 is matched and subbed with an empty string, and we continue (#gsub) looking for matches on

rspec unit pg/10

which has no match against our pattern anymore, because there's no digit in front of the divider. This is what winds up in our test summary.

The good news is that the jobs that exhibit this problem all have the same denominator leftover in the group_name, so they are aggregating correctly in the test summary. They just have a slightly uglier name.

In a recent pipeline, the following configuration:

.rspec-unit-parallel:
  parallel: 20

rspec unit pg11:
  extends:
    - .rspec-base-pg11
    - .rails:rules:ee-and-foss-unit
    - .rspec-unit-parallel

created the following jobs:

which displays the incorrectly formatted group_name, the same display used by our Test Report Summary:

It's not just us mangling our own X/Y suffix either; we actually removed "11" from the end of the job name. It's written that way to specify that the build is run using PostgreSQL 11, and we truncate that information.

What is the expected behavior?

We should expect that none of the parallelization details show up on the end of the job name. The pattern we use appears to be intended to work that way, but doesn't only doesn't because it overlaps with another pattern that's subbed first.

Small Proposal

Off the top of my head, we could set the middle separator followed by digit as a match group, and allow it to be matched repeatedly, while maintaining that the whole pattern begins with a digit and ends in a digit optionally followed by any amount of whitespace. i.e.

%r{\d+[\s:\/\\]+\d+\s*}

becoming

%r{\d+([\s:\/\\]+\d+)+\s*}

I don't know that it's the best option, but it's an acute fix for our own job group names.

Large Proposal

We could stop relying on a magic regex, because we don't really know how any given CI project is going to name its jobs, and subbing on any pattern at all makes assumptions that won't necessarily hold true. We should consider figuring out how to remember the original job names from before we parallelized and renamed them.

Associated Risks

Changing group name could affect a wide array of jobs across GitLab installations, and if users are using group name on builds that have similar patterns over the API for example, this could potentially break something. I have not done any investigation into the scope of this risk.

Original Issue writeup

Summary

When tests in a single suite are parallelized, each parallel job presents as it's own suite in the merge request widget. This creates a a ton of noise in the widget that provides no added value.

Steps to reproduce

Run a full pipeline on the gitlab-org/gitlab project with a failing RSpec test and expand the merge request widget.

Example Project

!25256 (merged)

What is the current bug behavior?

Multiple parallel RSpec jobs are listed as separate suites named rspec/1, rspec/2, rspec/3, ...

This is especially confusing with our own job names like rspec unit pg9 and rspec unit pg/24

What is the expected correct behavior?

A single test suite summary should be gathered into a single widget row regardless of parallelization. Parallelization is an implementation detail that the spec summary shouldn't be concerned with at all.

Clicking on a particular error link in the summary should still connect to the underlying /# job.

Relevant logs and/or screenshots

Output of checks

This bug happens on GitLab.com

Possible fixes

We could detect parallelized jobs and detach the /# suffixes from the suites when we gather the artifacts from jobs.
We could defer on this until #37725 (closed) is closed and we're no longer fetching artifact data directly from the artifacts themselves. It's also possible that fixing TestReports and TestSuite to aggregate parallelized job reports will be a step on the way to PipelineMetrics, but I don't know yet.

Rollout via Feature Flag

#322080 (closed)

Edited Feb 18, 2021 by drew stachon