pytest-cov recommended regex is not a valid regex under many regex engines, and can therefore cause gitlab-ci schema validation to fail
Hi there! `check-jsonschema` maintainer here. I have a user report of an issue that traces back to the GitLab docs. I think the appropriate fix here is just a docs adjustment.
The docs have examples of recommended regexes for the `coverage` field for test reporting:
https://docs.gitlab.com/ee/ci/testing/code_coverage.html#test-coverage-examples
The one given for pytest-cov is not valid under many regex engines:
/(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/
The issue is the use of the `(?i)` qualifier to make the expression case-insensitive.
Some engines only accept that qualifier at the very beginning of the pattern (and in the field value it comes after the leading `/`). Others don't support inline flags at all.
For example, this pattern won't work in Python or JavaScript.
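A minimal Python sketch of the failure, assuming the validator compiles the field value verbatim (leading `/` included) with the standard `re` module. On Python 3.11 and later, a global inline flag that isn't at position 0 is a hard `re.error`. On older versions it only emits a `DeprecationWarning`, so the sketch promotes that warning to an error to get the same result everywhere. `compiles` is a hypothetical helper name.

```python
import re
import warnings


def compiles(pattern: str) -> bool:
    """Return True if Python's `re` accepts the pattern without errors or deprecation warnings."""
    with warnings.catch_warnings():
        # Python < 3.11 only warns about a misplaced global flag; treat that as a failure too.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pattern)
        except (re.error, DeprecationWarning):
            return False
    return True


# The pytest-cov example from the docs, slashes and all, as a schema validator sees it.
docs_pattern = r"/(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/"

print(compiles(docs_pattern))        # False: (?i) sits after the "/", not at position 0
print(compiles(docs_pattern[1:-1]))  # True: with the delimiters stripped, (?i) is at position 0
```

Stripping the slashes only rescues Python; JavaScript's `RegExp` rejects the bare `(?i)` group either way.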
Normally this might not matter to a user, since GitLab runs the regex, not the user. But here's where it becomes relevant that I'm seeing this through the lens of a JSON Schema checker. I'll explain:
The gitlab-ci schema requires that the regexes fit the pattern `/.+/` and are `"format": "regex"`.
When a user's `.gitlab-ci.yaml` is validated against the schema, that field is checked against the pattern (an easy pass) and then against the format. To check `"format": "regex"`, the field needs to be compiled as a regex, and this is where things can fail.
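The two checks can be sketched in Python roughly as follows. This is not `check-jsonschema`'s actual code: `check_coverage_field` is a hypothetical helper, it assumes the schema's `pattern` is `/.+/` applied with JSON Schema's unanchored (search) semantics, and it uses Python's `re` as a stand-in for whichever engine a given validator uses for `"format": "regex"`.

```python
import re
import warnings
from typing import List

# Hypothetical stand-in for the relevant part of the gitlab-ci schema.
COVERAGE_PATTERN = r"/.+/"


def check_coverage_field(value: str) -> List[str]:
    """Return validation errors for a `coverage` value (an empty list means valid)."""
    errors = []
    # Step 1, "pattern": JSON Schema patterns are not anchored, so use search().
    if re.search(COVERAGE_PATTERN, value) is None:
        errors.append("does not match pattern /.+/")
    # Step 2, "format": "regex": the whole value, slashes included, must compile.
    with warnings.catch_warnings():
        # Older Pythons only warn about a misplaced (?i); count that as a failure too.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(value)
        except (re.error, DeprecationWarning) as exc:
            errors.append(f"is not a valid regex: {exc}")
    return errors


docs_value = r"/(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/"
print(check_coverage_field(docs_value))  # one "is not a valid regex: ..." error
```

The pattern step passes for any slash-delimited value; only the compile step distinguishes the docs example from a portable regex.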
JSON Schema itself specifies that "regex means ECMA Script RegExp", and recommends that schema authors use a restricted subset which is compatible with most languages' regex engines.
That said, the `coverage` field in GitLab CI is one step removed from JSON Schema, and the recommended contents in the GitLab docs are another step removed!
Should the docs be bound by this constraint? Well, it depends. If they aren't, there are other remedies, but I do think a docs change is the best solution.
Roughly, the options I see are:
- adjust the regexes in the docs to be valid in more engines (JavaScript compatibility preferable)
- adjust the gitlab-ci schema to remove `"format": "regex"` (I can follow up on this if it's what the GitLab team prefers)
- tell users who are validating `.gitlab-ci.yaml` against the schema to disable the `"regex"` format or disable all `"format"` validation
- do nothing
I believe that the following, slightly less pretty, regex would suffice to make this checkable in most engines:
/[Tt][Oo][Tt][Aa][Ll].*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/
I've checked it with Python and JS, and it seems fine.
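A quick Python check of the character-class version, as a sketch: it compiles the full value (slashes included) the way a schema validator would, then strips the slashes, assuming GitLab treats them as delimiters, and searches some made-up summary lines in the style of coverage.py's terminal report. `extract_coverage` is a hypothetical helper, and the sample lines are invented for illustration.

```python
import re

# Character-class regex as it would appear in .gitlab-ci.yml.
proposed = r"/[Tt][Oo][Tt][Aa][Ll].*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/"

# 1. A "format": "regex" check compiles the whole value; no inline flag, so no error.
re.compile(proposed)

# 2. Assumption: GitLab strips the /.../ delimiters before matching job output.
body = proposed[1:-1]


def extract_coverage(line: str):
    """Return the captured percentage, or None if the line doesn't match."""
    match = re.search(body, line)
    return match.group(1) if match else None


# Made-up summary lines, not real job output.
samples = [
    "TOTAL                              356     21    94%",
    "Total                              120      0   100%",
    "TOTAL                               80     10  87.50%",
]

for line in samples:
    print(extract_coverage(line))  # 94%, then 100%, then 87.50%
```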
Another option would be to use `(Total|TOTAL|total)`. (I don't know what spellings realistically exist for this field in the output; current versions of `coverage` output `TOTAL`.)
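A small Python comparison of the two case-handling approaches, offered as a sketch. One editorial choice here: the sketch writes the alternation as a non-capturing group, `(?:Total|TOTAL|total)`, so the percentage stays capture group 1 as in the original pattern. `matches` is a hypothetical helper, and the sample lines are invented.

```python
import re

# Alternation variant; (?:...) keeps the percentage as capture group 1.
alternation = r"/(?:Total|TOTAL|total).*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/"
# Character-class variant, for comparison.
char_classes = r"/[Tt][Oo][Tt][Aa][Ll].*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/"

# Both compile as full slash-delimited values, so both pass a "format": "regex" check in Python.
re.compile(alternation)
re.compile(char_classes)


def matches(delimited: str, line: str) -> bool:
    """Strip the /.../ delimiters and search the line."""
    return re.search(delimited[1:-1], line) is not None


for line in ["TOTAL      80   10    88%", "ToTaL      80   10    88%"]:
    print(matches(alternation, line), matches(char_classes, line))
# True True
# False True
```

The alternation is stricter (only the three listed spellings), while the character classes accept any mix of cases, which is closest to what `(?i)` did.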
A cursory glance at the other regexes suggests that they are fine. Only this one has the odd case-insensitivity qualifier.