Because our Secure jobs pass even when vulnerabilities are found, users may reference the pipeline view, see a green check, and assume they are secure when they are in fact not.
We have received feedback multiple times that the current behavior is not intuitive. We also have heard the exact opposite feedback that we should continue passing jobs and that users should be required to reference the pipeline security tab.
User experience goal
Behavior more closely aligned with user expectations - that vulnerability findings should equal failing pipelines
Proposal
Conduct user research to (in)validate what end-users' perceptions are.
After researching, and if the findings show that users are confused by our current approach, change the default behavior of security scanners to fail when vulnerability findings are found, while keeping allow_failure: true.
This use of allow_failure allows us to draw additional visibility to the results of the security job while not blocking the pipeline, which is one of our UX goals in Secure.
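As a rough sketch of what that default could look like (a hypothetical job definition, not the actual GitLab analyzer template), the job script would exit non-zero on findings while allow_failure: true keeps the pipeline from failing:

```yaml
# Hypothetical sketch only - the real analyzer templates differ.
# The job fails when findings exist, but allow_failure: true means the
# pipeline shows a warning instead of failing outright.
sast:
  stage: test
  script:
    - /analyzer run   # assumed to exit non-zero when findings exist
  allow_failure: true
  artifacts:
    reports:
      sast: gl-sast-report.json
```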
This was primarily based on the security paradigm (which is no longer in the handbook apparently).
Though, with the "allow_failure" option we use, it could still be possible to stay compliant with the "don't block the pipeline" approach while returning a non-zero exit code when vulnerabilities are found. This makes sense to some people, since scanners usually exit non-zero when they find vulnerabilities, and our analyzers often "catch" that and return 0 instead.
There are several workflows to evaluate to make sure we're making the right decision (e.g. when the scanner crashes, when the analysis can't be done, etc.).
I'd recommend sticking to the existing convention and opening an issue to re-evaluate it. Inconsistent behavior between the tools can be misleading for users.
Another note: if we are returning allow_failure regularly, it will also obscure any actual execution failures. Something like a broken security scanner won't be obvious to users unless they dig into the job; they will just assume the failure is tied to some found vulnerabilities.
We've trained users to react to a failed job as "something broke, I need to address it so the pipeline can pass." The user will have to view every failed job to assess if there was an error in the scan or if there was a new finding.
so something like a broken security scanner won't be obvious to users unless they dig into the job, they will just assume the failure is tied to some found vulnerabilities.
Failing a security job is not an appropriate security control to stop high-risk vulnerabilities from entering the branch. That's where security approvers, and eventually, security gates come into play. I also worry that this will slow down development in feature branches and cause more harm than good. We also run the risk of over signaling the user and generating alert fatigue, which could have negative side effects.
Another point of hesitation: users should be triaging the results in the UI, not on the job page. The job output is not equipped for interaction, so this flow will confuse users if they act on the failed job and go to the job page. They'll encounter a dead end and have to go back to the MR after determining that a new finding triggered the failure, where they will see the same finding information as well as the callout in the widget that new findings were detected.
A few questions and considerations:
If we fail a job, how would the user go about getting it into a passing state?
Would we fail a job on any new finding or only on findings of a certain severity threshold? We could detect a lot of vulnerabilities, and I worry that the user would have to fix or dismiss all of them to proceed with their task of merging the branch.
Are we comfortable going against the grain with this behavior? To my knowledge, we only fail jobs when errors occur?
Apologies if this feels like a lot of push back. I believe this solution (failing pipelines) is stemming from a general lack of security controls in the MR. Right now there is no way to prevent findings from making their way into the default branch aside from security approvers and internal processes. I think it would be best to consider this alongside other security gating features and determine which one will provide the users with the greatest benefit and experience, irrespective of how quickly we can implement it.
To better envision this change I'd recommend writing down all possible workflows (vulns found, error during the scan, error during build/preparation of the project, nothing detected, etc.) and how we want to handle them (fail job, fail pipeline, report errors/warnings in the UI, etc.).
I think (and hope) I've captured all potential error states we account for today in the MR redesign. Note, this didn't account for failing pipelines at the time it was designed so those cases aren't represented, but it might be useful to see how we intend on handling error cases in the new design.
If the scan completes with findings, I think the job is a success and as such it should pass, but I also think we should work with UX to validate our patterns once we pick them. If users want to "block" code with findings, they can/should do this using the security MR approvals mechanism, IMO.
Note: perhaps in the secure configuration area and/or merge request widget we may need to enhance the UI to express this to users ("A successful Secure job means the scan was able to complete. It may or may not have resulted in findings. If you wish to block MRs from progressing when there are Secure findings, use the MR approvals feature.").
Additional idea / note: are there more options in CI than failing the pipeline on job failure? i.e. can we return a specific exit code that means "complete, with findings," and allow users to key a pipeline failure to that specific code?
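For reference, GitLab CI does have a mechanism along these lines: allow_failure accepts an exit_codes key (added around GitLab 13.8, if I recall correctly), so a job could reserve a dedicated exit code for "completed, with findings." A sketch, assuming we defined exit code 4 for that purpose (the code itself is an arbitrary choice here):

```yaml
# Sketch: exit code 4 is an assumed convention for "scan completed,
# findings present"; any other non-zero exit still fails the job.
sast:
  stage: test
  script:
    - /analyzer run   # assumed to exit 4 on findings, 0 on clean, 1 on crash
  allow_failure:
    exit_codes: 4
```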
this is the scenario I'm most concerned about with this change. If a warning is expected for findings, any exceptions within our security scanners or breaking changes will be hidden from the user. There have been proposals to improve this visibility (example) but no progress has been made so far.
If a scan cannot be completed or errors out, I think we should fail it, but default to "allowed to fail," in which case it shows up as an exclamation mark in the pipeline, but the pipeline can continue as though it passed.
Lucas Charles changed the description
Lucas Charles changed title from "Consider failing secure jobs when vulnerabilities are found" to "Consider failing secure jobs when vulnerability findings are found"
Lucas Charles changed the description
Sam Kerr changed the description
@tlavi here is the issue I was talking about that we want to do some user testing on - both to validate that the problem either does or does not exist as well as to run some potential solutions by our users.
I updated the issue description to go out and get feedback from end-users. I think we've discussed this issue a number of times, but the discussion was with internal teams, rather than directly with end-users. We can use their responses to help us make the best changes (or make no change at all).
My thought is that this will come down to whether users actively look at the security tab by default, whether they look at the pipeline tab, see a wall of green, and say they're good, or whether they don't look at the pipeline tab at all and instead primarily interact with our MR widget for their results.
Is the meaning of a failed job a global, consistent definition across all jobs, or something more contextual that can be interpreted differently from one job to another?
If the former, this would help shape our own decision here. If the latter, it doesn't help us much.
@gonzoyumo good point & another piece of data we should get. My understanding is that it's intended to be fairly similar at a global level, with green being all good, yellow as "ran but errors/warnings", and red as "total failure."
We also have heard the exact opposite feedback that we should continue passing jobs and that users should be required to reference the pipeline security tab.
To my understanding, this issue discusses both whether the problem exists (i.e. is the current situation unintuitive) as well as whether to go with the specific solution of failing the job. As for the first part, my usability study from last year indicates that the problem exists (or at least existed at the time). And so while I think it could be of some benefit to repeat that test (to reflect UI changes and include more external participants), I would recommend that the next study we conduct would primarily focus on exploring the proposed solution. Another option is to validate the proposed solution against a control group doing the same scenario with the current implementation.
Next steps
I'd suggest at this stage to open a new research issue in the UX Research repo and use the 'Solution Validation' issue template. Happy to jump on a call with you and Camellia to further flesh this out!
@stkerr Just touching base to ask whether you're still interested in research for this topic? If so, what milestone will you be aiming for?
I'm asking because I was asked by our research coordinators to let them know what recruiting jobs they will be asked to carry out in the upcoming milestones.
Adding my own experience from managing vulnerability life cycles in a previous job.
The pipeline would fail, but for the vast majority of cases the immediate action was to create an issue, allow-list the vulnerabilities, and merge the changes. In fact, in over 2 years I don't remember an author ever fixing security findings in the same MR.
It's probably worth exploring whether the expectation for the job (and pipeline) to fail can be replaced by mandatory security approvals.
@stkerr I think I'm a bit confused by the issue description and we might have 2 different things to look at:
the exit code of the analyzer when it finds vulnerabilities, which determines whether the job fails or not
the allow_failure option, which determines whether the pipeline fails when the job fails
The current implementation is:
exit successfully when vulnerabilities are found
allow_failure: true so that if anything breaks during security analysis, we do not block the pipeline (could be anything like setup issue, network error, etc.)
This makes the job look successful in the pipeline view when vulnerabilities are found, and show a warning when anything goes wrong, but without ever failing the pipeline.
My understanding is that here we're evaluating:
exit successfully when execution is fine and no vulnerabilities are found; exit with an error when vulnerabilities are found, making the job itself fail.
allow_failure: false, so that if anything breaks or vulnerabilities are found, we fail the whole pipeline and block further processing (I think this also means the next stages won't be executed; the pipeline will halt there)
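To make the two models concrete, here is a minimal side-by-side sketch (hypothetical job names and flags, not real analyzer options):

```yaml
# Current model: the analyzer exits 0 even with findings;
# allow_failure: true only matters when the scan itself breaks
# (setup issue, network error, etc.).
sast_current:
  script:
    - /analyzer run        # exits 0 whether or not findings exist
  allow_failure: true

# Evaluated model: the analyzer exits non-zero on findings, and
# allow_failure: false means findings (or any error) halt the
# pipeline before later stages run.
sast_evaluated:
  script:
    - /analyzer run --fail-on-findings   # hypothetical flag
  allow_failure: false
```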
exit successfully when execution is fine and no vulnerabilities are found; exit with an error when vulnerabilities are found, making the job itself fail.
Correct on this part - a SAST job, for example, would fail when it found vulnerabilities, while a fuzz testing job would fail when it found crashes. Different ways of reporting essentially the same outcome.
allow_failure: false, so that if anything breaks or vulnerabilities are found, we fail the whole pipeline and block further processing (I think this also means the next stages won't be executed; the pipeline will halt there)
This isn't what was intended, no - I don't think we're ready to start considering failing pipelines as a result of the security results. That was one of the aspects I was a huge fan of from the old Security Paradigm. If we were to introduce hard failures, that immediately makes security a blocker, rather than a helper for orgs.
Does that help? I'll try to clarify more in the description.
Correct on this part - a SAST job, for example, would fail when it found vulnerabilities, while a fuzz testing job would fail when it found crashes. Different ways of reporting essentially the same outcome.
@stkerr that sounds a bit odd to me but I have a natural bias for consistency
@theoretick Looks like this is very similar or the same as @fjdiaz's: #216176 (closed) I've linked the two but maybe worth closing one so there's a single place for discussion.
@matt_wilson thanks! I always prefer to go with the older issue as it's more likely to have subscribers and participation, but given this one has active assignees for %13.5 milestone and a recent slew of activity, I'll close out the other.
@bmiller1 Could you explain this comment in this issue in more detail?
Apparently, customers are begging us NOT to implement this
Does "this" mean:
providing an option users can control to fail jobs when vulnerabilities are found, or
failing jobs by default when vulnerabilities are found?
I want to understand whether users think we shouldn't allow the "fail jobs when vulnerabilities found" possibility at all, or whether they think we shouldn't enable it by default but it's fine to have the option available.
Many of our customers come from Jenkins where builds are brittle and developers struggle to line up all the details required to make them work. They come to GitLab to get away from that user experience. Triggering a security approval for the MR is an optimal solution because it accomplishes the intended outcome (you cannot deliver code with vulnerabilities) but you are not blocked from getting your work done.
A build should not fail when a vulnerability is found. Why?
There may be more issues with the build which will not be exposed if the build stops immediately.
The developer may be focusing on some urgent code content and plans to deal with the vulnerability before delivering the code - failure will prevent that from occurring.
The developer may be testing workarounds for the vulnerability, and if they are unable to complete the build and deploy to a review application or Docker image, it could become a blocker to actually resolving the vulnerability.
It is a DevSecOps anti-pattern and considered harmful.
Consider any build breaking that isn’t build failure or test case failure to be a process problem that will ruin your developers trust and collaboration with other teams
My favorite: Sam Kerr’s comment “I don’t think we’re ready to start considering failing pipelines as a result of the security results.”
To clarify, we're talking about the status of the individual jobs, not the whole pipeline. I still agree with your (and my) point 6 about not failing the whole pipeline.
As an example, here are the three possible job states we could have:
1. The job passes (green check).
2. The job fails but is allowed to fail (shown as a warning; the pipeline still passes).
3. The job fails and is not allowed to fail (the pipeline fails).
Today, some scanners use state 1, some scanners use state 2, and none use state 3 (since it would fail the pipeline). We're trying to gauge user expectations: if a scanner completed and found vulnerabilities, would they expect state 1 or state 2?
A Starter customer is interested in this feature (internal ticket). A Premium customer as well, at least when "high or critical vulnerabilities" are found (internal ticket).