Because our Secure jobs pass even when vulnerabilities are found, users may reference the pipeline view, see a green check, and assume they are secure when they are in fact not.
We have received feedback multiple times that the current behavior is not intuitive. We also have heard the exact opposite feedback that we should continue passing jobs and that users should be required to reference the pipeline security tab.
User experience goal
Behavior more closely aligned with user expectations - that vulnerability findings should equal failing pipelines
Proposal
Conduct user research to (in)validate what end-users' perceptions are.
After researching, and if the findings are that users are confused by our current approach, change the default behavior of security scanners to fail when vulnerability findings are found, but keep allow_failure: true.
This use of allow_failure allows us to draw additional visibility to the results of the security job while not blocking the pipeline, which is one of our UX goals in Secure.
This was primarily based on the security paradigm (which is no longer in the handbook apparently).
Though, with the “allow_failure” option we use, it could still be possible to stay compliant with the “don’t block the pipeline” approach while returning a non-zero exit code when vulnerabilities are found. This makes sense to some people, as scanners usually exit with non-zero when finding vulns, and often our analyzers “catch” that and return 0 instead.
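For concreteness, here is a minimal sketch of what that could look like for a single scanning job; the job name, analyzer command, and report path are illustrative rather than the actual template contents:

```yaml
# Hypothetical sketch: the analyzer's non-zero "findings" exit code is allowed
# to propagate instead of being caught and converted to 0, while allow_failure
# keeps the pipeline unblocked.
sast:
  stage: test
  script:
    - /analyzer run            # assumed to exit non-zero when findings exist
  allow_failure: true          # job shows a warning (!) but the pipeline continues
  artifacts:
    reports:
      sast: gl-sast-report.json
```

Under this convention the job turns orange on findings instead of green, which is the additional visibility described above.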
There are several workflows to evaluate to make sure we’re taking the right decision (e.g. when scanner crashes, when analysis can’t be done, etc…).
I’d recommend sticking to the existing convention and opening an issue to re-evaluate it. Having inconsistent behavior between the tools can be misleading for users.
Another note: if we are returning allow_failure regularly, it will also obscure any actual execution failures, so something like a broken security scanner won't be obvious to users unless they dig into the job; they will just assume the failure is tied to some found vulnerabilities.
We've trained users to react to a failed job as "something broke, I need to address it so the pipeline can pass." The user will have to view every failed job to assess if there was an error in the scan or if there was a new finding.
so something like a broken security scanner won't be obvious to users unless they dig into the job; they will just assume the failure is tied to some found vulnerabilities.
Failing a security job is not an appropriate security control to stop high-risk vulnerabilities from entering the branch. That's where security approvers, and eventually, security gates come into play. I also worry that this will slow down development in feature branches and cause more harm than good. We also run the risk of over signaling the user and generating alert fatigue, which could have negative side effects.
Another point of hesitation: users should be triaging the results in the UI, not in the job page. The job output is not equipped for interaction, so this flow will confuse users if they act on the failed job and go to the job page. They'll encounter a dead-end and have to go back to the MR after determining a new finding triggered the failure, where they will see the same finding information as well as the callout in the widget that new findings were detected.
A few questions and considerations:
If we fail a job, how would the user go about getting it into a passing state?
Would we fail a job on any new finding or only on findings of a certain severity threshold? We could detect a lot of vulnerabilities, and I worry that the user would have to fix or dismiss all of them to proceed with their task of merging the branch.
Are we comfortable going against the grain with this behavior? To my knowledge, we only fail jobs when errors occur?
Apologies if this feels like a lot of push back. I believe this solution (failing pipelines) is stemming from a general lack of security controls in the MR. Right now there is no way to prevent findings from making their way into the default branch aside from security approvers and internal processes. I think it would be best to consider this alongside other security gating features and determine which one will provide the users with the greatest benefit and experience, irrespective of how quickly we can implement it.
To better envision this change I'd recommend writing down all possible workflows (vulns found, error during the scan, error during build/preparation of the project, nothing detected, etc.) and how we want to handle them (fail job, fail pipeline, report errors/warnings in the UI, etc.).
I think (and hope) I've captured all potential error states we account for today in the MR redesign. Note, this didn't account for failing pipelines at the time it was designed so those cases aren't represented, but it might be useful to see how we intend on handling error cases in the new design.
If there are findings, I think the job is a success and as such it should pass, but I also think we should work with UX to validate our patterns once we pick them. If users want to "block" code with findings, they can/should do this using the security MR approvals mechanism IMO.
Note: perhaps in the secure configuration area and/or merge request widget we may need to enhance the UI to express this to users ("A successful Secure job means the scan was able to complete. It may or may not have resulted in findings. If you wish to block MRs from progressing when there are Secure findings, use the MR approvals feature.").
Additional idea / note: are there more options in CI than failing the pipeline on job failure? i.e. can we return a specific exit code which means "complete, with findings", and allow users to key a pipeline failure to that specific code?
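For what it's worth, GitLab CI's allow_failure:exit_codes keyword (which comes up later in this thread) supports exactly this kind of keying; a hedged sketch, where the convention that exit code 2 means "completed, with findings" is purely illustrative:

```yaml
# Illustrative convention: 0 = clean, 2 = completed with findings,
# anything else = a real failure.
secret_detection:
  stage: test
  script:
    - /analyzer run
  allow_failure:
    exit_codes:
      - 2   # only the "findings" exit code fails softly (orange warning);
            # a genuine scanner crash would fail the job outright
```

A project that wanted findings to hard-fail could simply drop that exit code from the list.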
this is the scenario I'm most concerned about with this change. If a warning is expected for findings, any exceptions within our security scanners or breaking changes will be hidden from the user. There have been proposals to improve this visibility (example) but no progress has been made so far.
If a scan cannot be completed / errors out, I think we should fail it, but default to "allowed to fail", in which case it shows up as an exclamation mark in the pipeline, but the pipeline can continue as though it passed.
Lucas Charles changed the description
Lucas Charles changed title from Consider failing secure jobs when vulnerabilities are found to Consider failing secure jobs when vulnerability findings are found
Lucas Charles changed the description
Sam Kerr changed the description
@tlavi here is the issue I was talking about that we want to do some user testing on - both to validate that the problem either does or does not exist as well as to run some potential solutions by our users.
I updated the issue description to go out and get feedback from end-users. I think we've discussed this issue a number of times, but the discussion was with internal teams, rather than directly with end-users. We can use their responses to help us make the best changes (or make no change at all).
My thought is that this will come down to whether or not users actively look at the security tab by default, or whether they look at the pipeline tab, see a wall of green, and say they're good. Or whether they don't even look at the pipeline tab at all and instead primarily interact with our MR widget for their results.
Is the meaning of a failed job a global and consistent definition across all jobs, or something more contextual that can be interpreted differently from one job to another?
If the former, this would help to shape our own decision here. If the latter, then it doesn't help 😄
@gonzoyumo good point & another piece of data we should get. My understanding is that it's intended to be fairly similar at a global level, with green being all good, yellow as "ran but errors/warnings", and red as "total failure."
We also have heard the exact opposite feedback that we should continue passing jobs and that users should be required to reference the pipeline security tab.
To my understanding, this issue discusses both whether the problem exists (i.e. is the current situation unintuitive) as well as whether to go with the specific solution of failing the job. As for the first part, my usability study from last year indicates that the problem exists (or at least existed at the time). And so while I think it could be of some benefit to repeat that test (to reflect UI changes and include more external participants), I would recommend that the next study we conduct would primarily focus on exploring the proposed solution. Another option is to validate the proposed solution against a control group doing the same scenario with the current implementation.
Next steps
I'd suggest at this stage to open a new research issue in the UX Research repo and use the 'Solution Validation' issue template. Happy to jump on a call with you and Camellia to further flesh this out!
@stkerr Just touching base to ask whether you're still interested in research for this topic? If so, what milestone will you be aiming for?
I'm asking because I was asked by our research coordinators to let them know what recruiting jobs they will be asked to carry out in the upcoming milestones.
Adding my own experience from managing vulnerability life cycles in a previous job.
The pipeline would fail, but the immediate action for the absolute majority of cases was to create an issue, allow-list the vulnerabilities, and merge the changes. In fact, in over 2 years I don't remember an author ever fixing security findings in the same MR.
It's probably worth exploring whether the expectation for the job (and pipeline) to fail can be replaced by mandatory security approvals.
@stkerr I think I'm a bit confused by the issue description and we might have 2 different things to look at:
the exit code of the analyzer when it finds vulnerabilities, which determines whether the job fails or not
the allow_failure option, which determines whether the pipeline fails or not when the job fails
The current implementation is:
exit successfully when vulnerabilities are found
allow_failure: true so that if anything breaks during security analysis, we do not block the pipeline (could be anything like setup issue, network error, etc.)
This makes the job look ✅ in the pipeline view when vulnerabilities are found and ⚠ when anything goes wrong, without failing the pipeline.
My understanding is that here we're evaluating:
exit successfully when execution is fine and no vulnerabilities are found; exit with an error when vulnerabilities are found, making the job itself fail.
allow_failure: false, so that if anything breaks or vulnerabilities are found, we fail the whole pipeline and block further processing (I think this also means next stages won't be executed; the pipeline will halt there). The current and evaluated configurations are sketched below.
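To make the comparison concrete, here is a rough side-by-side sketch under those assumptions; the job names and the analyzer command are illustrative:

```yaml
# Current behavior (roughly): the analyzer itself returns 0 when it finds
# vulnerabilities, so the job is green; allow_failure only matters when
# something actually breaks (setup issue, network error, ...).
dependency_scanning_current:
  script:
    - /analyzer run            # exits 0 on findings, non-zero only on real errors
  allow_failure: true

# Evaluated alternative: the analyzer exits non-zero on findings and, with
# allow_failure: false, that job failure fails the whole pipeline and halts
# later stages.
dependency_scanning_evaluated:
  script:
    - /analyzer run            # exits non-zero when findings exist
  allow_failure: false
```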
exit successfully when execution is fine and no vulnerabilities are found; exit with an error when vulnerabilities are found, making the job itself fail.
Correct on this part - a SAST job for example would be ✅ when it found vulnerabilities while a fuzz testing job would be ⚠ when it found vulnerabilities. Different ways of reporting essentially the same outcomes.
allow_failure: false, so that if anything breaks or vulnerabilities are found, we fail the whole pipeline and block further processing (I think this also means next stages won't be executed; the pipeline will halt there)
This isn't what was intended, no - I don't think we're ready to start considering failing pipelines as a result of the security results. That was one of the aspects I was a huge fan of from the old Security Paradigm. If we were to introduce hard failures, that immediately makes security a blocker, rather than a helper for orgs.
Does that help? I'll try to clarify more in the description.
Correct on this part - a SAST job for example would be ✅ when it found vulnerabilities while a fuzz testing job would be ⚠ when it found vulnerabilities. Different ways of reporting essentially the same outcomes.
@stkerr that sounds a bit odd to me but I have a natural bias for consistency 😜
@theoretick Looks like this is very similar or the same as @fjdiaz's: #216176 (closed) I've linked the two but maybe worth closing one so there's a single place for discussion.
@matt_wilson thanks! I always prefer to go with the older issue as it's more likely to have subscribers and participation, but given this one has active assignees for %13.5 milestone and a recent slew of activity, I'll close out the other.
@bmiller1 Could you explain this comment from this issue in more detail:
Apparently, customers are begging us NOT to implement this
"this" means
Provide an option which user can control to fail jobs when vulnerabilities found
by default fail jobs when vulnerabilities found
I want to understand user think we shouldn't allow this "fail jobs when vulnerabilities found" possibility at all or user think we shouldn't enable this feature by default, but it is ok to have this option available.
Many of our customers come from Jenkins where builds are brittle and developers struggle to line up all the details required to make them work. They come to GitLab to get away from that user experience. Triggering a security approval for the MR is an optimal solution because it accomplishes the intended outcome (you cannot deliver code with vulnerabilities) but you are not blocked from getting your work done.
A build should not fail when a vulnerability is found. Why?
There may be more issues with the build which will not be exposed if the build stops immediately.
The developer may be focusing on some urgent code content and plans to deal with the vulnerability before delivering the code - failure will prevent that from occurring.
The developer may be testing workarounds for the vulnerability, and if they are unable to complete the build and deploy to a review application or Docker image, it could become a blocker to actually resolving the vulnerability.
It is a DevSecOps anti-pattern and considered harmful.
Consider any build breakage that isn't a build failure or test case failure to be a process problem that will ruin your developers' trust and collaboration with other teams.
My favorite: Sam Kerr’s comment “I don’t think we’re ready to start considering failing pipelines as a result of the security results.” 😄
To clarify, we're talking about the status of the individual jobs, not the whole pipeline. I still agree with your (and my) point #6 about not failing the whole pipeline 🙂
As an example, here are the three possible job states we could have
Today, some scanners use #1, some scanners use #2, and none use #3 (since it would fail the pipeline). We're trying to get the user expectation about whether, if a scanner completed and found vulnerabilities, they would expect #1 or #2.
A Starter customer is interested in this feature (internal ticket). A Premium customer as well, at least when "high or critical vulnerabilities" are found (internal ticket).
@stkerr To refresh my mind, we are talking about the workflow below in the image I attached. The confusing part is at the pipeline overview. It is not related to the MR widget or other pages, right?
The main problem we are solving here is "For those who come to the pipeline overview and see a lot of green marks, they might think that security scans find no vulnerability"
The main problem we are solving here is "For those who come to the pipeline overview and see a lot of green marks, they might think that security scans find no vulnerability"
Correct!
There are a few examples that show the possible states that the user may see.
@stkerr I am thinking we can do an unmoderated test: show users the screens below to see which one they perceive as the best way to inform them that a vuln was found in a job. As a follow-up question: do they prefer to fail the job or not, and what should the default option be?
All the screens here have the same scenario: there are 6 security jobs in a pipeline, all of them run successfully, and 4 of them found vulns.
we show check marks (current)
we show an exclamation mark but do not fail the job
we show the failed icon and fail the job
What do you think?
If we think this is a good approach, I will create a new research issue and add more details
@cam.x I like the idea of unmoderated testing! That should allow us to get a broad range of users as well as be less time-intensive for coordinating participant interviews.
@stkerr Study launched with the Usertesting.com panel. If we don't have enough users at the end of this week, I will create a new one next week to share on LinkedIn or other channels to recruit users to finish the test.
@tlavi This is somewhere between a survey and a test. If it were a pure survey where the user can't talk, I would do 60 as we generally do. Since the user is talking, I feel that from their talk I can decide the confidence level and then decide whether I need the remaining 40 tests or not. Another reason is that if we do run another 40, I will change the order of the screens they see to be more balanced.
I have reviewed the results (please see below). For the question in this issue, I got something like 17 or 18 out of 18 consistency, which I think shows enough confidence; let me know if you think otherwise and I can launch the remaining 40 tests.
Hi @cam.x - Great job on diving into the world of unmoderated testing! You'll quickly see how game-changing it can be (ex: getting insights back almost instantly). There are a couple of things I wanted to point out with this particular study:
Sample size - Typically, for studies in UserTesting, a sample size of 5-8 is ideal. Why? UserTesting is perfect for qualitative studies, where sample sizes are small, yet insights are extremely rich. We budgeted for these kinds of sample sizes being used across the UX team. A study that uses 20 (or 60!) will use up our units much faster than planned. If you feel you need a larger sample size, you should work with @tlavi to determine the best way to do that (ex: a larger-N survey, supplemented with an N=5-8 UserTesting study for that rich qual).
Understanding the why - UserTesting is absolutely ideal for understanding the 'WHY' behind something. If you were to launch a similar study in UserTesting, I'd avoid asking yes/no questions to start. Instead, I'd ask them to explain what they're seeing in the screenshot. I'd hope that they'd pick up on the element you want them to, just naturally. After that, I'd ask about the element, but not in a yes/no manner. I'd then incorporate a 'why they thought that' type question. The 'why' question is perhaps the most important question of all, since it's actionable.
I'm happy to go deeper into some examples if that helps. I'm a huge fan of UserTesting and have been using them for years. I've learned that it does require some adjustment compared to traditional moderated testing.
Thank you for your explanation. I totally agree that if it is qualitative, the sample size is 5-8. I didn't know that we have units... I don't want to use them up, for sure. Sorry about that.
After that, I'd ask about the element, but not in a yes/no manner. I'd then incorporate a 'why they thought that' type question. The 'why' question is perhaps the most important question of all since it's actionable.
Got it. I was thinking usertesting.com is somewhere between qualitative and quantitative, and this was a first try to get a feeling for it. Now, with this result and your explanation, it totally makes sense, and I will remember to always add the question "why".
For unmoderated testing, it somehow feels like something between qualitative and quantitative. For this study, we had some participants who just clicked through and didn't talk; some felt like they didn't read the page I presented (no scrolling, etc.) and finished the task in 2 minutes. For those, if I don't think they qualify as qualitative participants, should I just run an additional test with more people after the first round with the 5-9 sample group? I will consult @tlavi in those cases for sure! (Somehow, using usertesting.com didn't trigger me to go through the old moderated test process (create separate issues, recruiting issues, etc.); it felt more like a quick win I could just do. 😅)
For this study, we had some participants who just clicked through and didn't talk; some felt like they didn't read the page I presented (no scrolling, etc.) and finished the task in 2 minutes. For those, if I don't think they qualify as qualitative participants, should I just run an additional test with more people after the first round with the 5-9 sample group?
Great question, @cam.x! If participants are finishing up in about 2 mins, that is probably the result of the way the question is being asked. For example:
You asked this question: "As a person who looking for vulnerabilities/bugs/faults in committed code, you see the following screen. Do you think there is any bug found?"
this is a yes/no question, which participants will gladly answer quickly. You can reshape it to yield more data for yourself!
Instead, you may want to ask it this way: "As a person who looking for vulnerabilities/bugs/faults in committed code, you see the following screen. Talk me through what you're seeing."
there isn't a pass/fail with this task - but it does force the participant to look carefully at the screen and explain what they're seeing. It's also a bonus if they discover the area you're interested in on their own!
Then, you can ask: "Now, I'd like to draw your attention to this area, highlighted in blue. Can you talk through what these items are indicating to you?"
through this approach, you're not leading them into the 'bug found' mindset; they'll have to figure that out for themselves - just like in real life.
Finally, to get your metric, you can then ask your yes/no question: "Do you think there is any bug found?"
by putting this last, you now allowed yourself to get as much info as possible from your participant before pointing them to that area.
So, should you test more people? If you have the data you need, then probably not. I'm fairly confident that people answered the questions without rushing through the study and didn't just click whatever.
Oh - and I noticed you asked this question: "If you choose "other" in previous question, please explain here what do you want the tool to do when security jobs have found vulnerabilities; otherwise feel free to type "nothing" to skip this question"
I'd strongly suggest reshaping this. This is a great opportunity to expand the 'why did you select that' question to apply towards any answer they selected - not just 'other'.
As I mentioned earlier, creating fruitful unmoderated studies takes a little practice. Always happy to talk through it with you! (I like to geek out on this topic 🤓 )
Camellia X Yang changed title from Consider failing secure jobs when vulnerability findings are found to 🎨 Design/Research: Consider failing secure jobs when vulnerability findings are found
2 people out of 20 didn't scroll the picture properly and only quickly answered the question; in the project I added a note that those two are not qualified. Since I didn't figure out how to delete a user in usertesting.com (maybe I can't), I put the data into a spreadsheet; please see the analytics there.
Q1: How do people understand the green checkmark in the context of pipeline view
16 out of 18 thought there is no vulnerability
1 out of 18 answered that there are vulns, with doubt about what "vuln" means, but wanted to change his answer after seeing the next screen
1 out of 18 answered "not sure" without giving details
Q2: How do people understand the orange exclamation mark in the context of a pipeline view.
18 out of 18 think there are vulns found in those jobs
Q3: How do people understand the red error mark in the context of a pipeline view.
14 out of 18 think there are vulns found in those jobs; the most common answer was "The exclamation mark is a warning, here is definitely something happened" (1 user answered no vulns, but spoke as if he thinks there are)
3 out of 18 answered not sure, they either think the job was not running or some other code error instead of vulns found
1 out of 18 think there are no vulns, the user thinks the job was not carried on
Q4: What do users think should happen when there are vulnerabilities found? Should the job pass or fail?
There is no clear winner here: most people think it shouldn't pass, then pass with a warning, and fewer think it should pass normally.
Suggested next step:
In my opinion, the top 3 questions give a clear answer to the problem in this issue. We should use the exclamation icon, which indicates to the user that something was found; otherwise, they will just assume everything is fine. The green check is pretty clear that it means good.
Whether we pass the job or not should be the user's choice; the "Security Gate" feature should enable the user to make that choice. If we are asking what the default for the security gate feature should be, I think we should have more participants, like 60, to get more confident results. But this is out of the scope of this issue, as discussed before, here.
@stkerr I would suggest having another implementation issue: in the pipeline page, show a warning icon when a security job found something, and a tooltip over the icon to explain what happened, like the example below. What do you think?
@cam.x I think there's a typo in this bullet from Q3:
21 out of 18 think there are no vulns, the user thinks the job was not carried on
Is it 2 out of 18, 1 out of 18, or really 21 out of 18 (seems difficult since there is a total of 18 😉)
Also, I wonder if the solution is indeed to let the user choose what they want Jobs to do, or if we should consider leaving Jobs alone and come up with another solution that is clearer and doesn't rely on a setting. Technically the Job passed; it ran all of the tests successfully. We want to communicate this in case some tests didn't get run successfully, while also communicating that the successful job's test (i.e. a scan) found vulnerabilities.
All in all, the research was quite helpful; our assumption has been validated: users are confused by our Job status icons when attempting to combine their meaning with a Job's test results. Nice job @cam.x!
Wow! 👏 Amazing 👏 work @cam.x I agree with all of your suggestions! I really like using the (!) to alert users that vulns were found as that icon means "Passed with warnings" which makes sense in this case.
As for failing jobs, I think you are getting closer to the core of the problem with this study. The security job is the mechanism that allows us to detect vulnerabilities in the pipeline, but it shouldn't be the controlling mechanism for blocking the MR (i.e. Security Gates). Those two things should operate independently of one another. We don't want to block the pipeline from merging by failing the job, we want to block the entire MR from merging by disallowing the users from merging it. We need the compiled results of all the security jobs to make this determination in the MR and we need passing jobs to create the report to compile with the other scanners.
If we want to disallow the MR from merging, then Security Gates are our best tool to do so. Users want to configure conditions that, when met, will disallow the MR until the vulnerability is removed. Those conditions can be: severity, CWE type, CVE, or eventually OWASP Top 10 type. Failing the job because a vuln was found isn't practical; there are too many, and often they are not severe enough to stop a merge. Setting these conditions for every security job in the pipeline isn't practical either; it's too much to manage, especially when App Sec would be making these determinations and they are often outnumbered 20:1 or 50:1 in some organizations.
All that said, great work on the study @cam.x still lots to understand but we are closer than we were!
Document this in Dovetail (I'm not seeing it there, perhaps I missed it?)
Sounds like you have an actionable insight here. Could you please document it as a separate issue and label it accordingly, as per the handbook?
Please create a research issue in the research repo for this study and label it as Solution Validation? This is required so that your study will be counted towards this departmental KPI. The research issue's description could be extremely minimal (it can simply refer to this issue), and you can close it immediately.
I think we are finally in a position (IMO) to try and write a proposed default for all of Secure (maybe also Protect) to follow, with specific recommended "if you don't like that" second choices.
My personal preference, based on all of the above, but I would love to talk it out:
do not fail the pipeline by default for failures or warnings
job pass (green) if it runs and there are no new findings
job fail w/ warnings (!) if it runs and finds new findings [i.e. allowed to fail is the default for secure jobs and we can give a SPECIFIC error code meaning the "error" is "new findings"]
job fail w/ warnings (!) if it runs and finds technical concerns but is still able to run [i.e. we pass some kind of error code or warning that says "warning: you are using Python 2, stop that!" or something else]
job fail (x) if it cannot run
Recommended customer action
we recommend customers enable MR approvals to catch new findings and prevent them from merging, instead of failing the pipeline, as that would prevent other jobs (which could be useful, like quality, code coverage, etc.) from running
we recommend customers consider keeping "do not fail the pipeline on a failed job" if the project is low risk and has scheduled pipelines; if it does not fit that criteria, they should consider failing the pipeline on job failure
Optional customer actions
we inform customers they may choose to fail a pipeline on the "new findings" error code if desired, although we recommend MR approvals; to do that, change X to Y in the pipeline config
failing pipeline on fail
currently I have resisted failing the job because, by default, I cannot tell whether it failed because of errors or because of findings, but we can use the new codes for this, I believe
The one catch to work through, at least for me, is whether at that moment we know new vs. old findings - which I think we do, but I want someone to confirm my understanding that we know this when the job is done, and it's not a post-process that takes extra time after the job runs.
If I am misunderstanding the new error options, please let me know!
Thank you for all the ideas, I feel inspired again! For the user, the single word "Fail" could mean too many things; the reason is not always related to security jobs. What is clear is that the code is not merged. So besides communicating that the code is not merged, we also want to explicitly say why. So I updated the chart.
Note: when I say "security gate enabled" in the chart, I mean that the user chooses to block the pipeline
Note 2: When there is an error, I thought the error might be more important than the security gate block (for the developer, the error is something that needs to be fixed first), so the error is a bigger icon. I am fine swapping them, like this, if we have reasons to think the security block should be more important than the error
Note 3: icons can be fine-tuned if we like the concept. Currently those are just rough illustrations
Note 4: I am not discussing default behaviour here; if we want the default to be "Not merged", we can do that. The focus here is more on the communication part: what icon/message communicates what
@cam.x I think you should connect with the designers in ~"devops::release" or devopsverify, whoever does the most work on the pipelines page and widgets. They will be able to help you ideate on a solution that will not conflict with existing designs or upcoming plans.
Have you considered what will trigger the blocked pipeline yet?
I think you should connect with the designers in ~"devops::release" or devopsverify, whoever does the most work on the pipelines page and widgets. They will be able to help you ideate on a solution that will not conflict with existing designs or upcoming plans.
Yup, yup! definitely will go over with them!
Have you considered what will trigger the blocked pipeline yet?
A security gate will trigger it, which means the user consciously chooses to block the pipeline when vulns are found, and they set up the criteria like you mentioned; they can decide severity level, CVE, etc.
@NicoleSchwartz was suggesting that we should fail a job (I rephrase here as block the job) when NEW vulns are found, and in settings "allowed to fail" is the default for secure jobs. I think that could also trigger it.
Basically, I am making up a new word (or packaging) around the old concept: fail because of a security risk (either the security gate feature is enabled or allowed to fail is set), so the pipeline won't be merged automatically.
We can create a separate issue, if there is no existing one, to talk about the security gate topics: default behaviours, discoverability, and the points @NicoleSchwartz made about Recommended/optional customer actions. By the way, when I say security gate, it equals security approvals. I thought they are the same thing; let me know if they are different. :)
@cam.x my concern with the updated chart above: security gates can/will be circumvented if the pipeline does not complete. A pipeline can finish successfully but then not be merged; those are two distinct things, and IMO security gates cannot / should not be reflected in the Job or Pipeline status.
was suggesting that we should fail a job (I rephrase here as block the job) when NEW vulns are found, and in settings "allowed to fail" is the default for secure jobs. I think that could also trigger it.
We need to, IMO, use the same precise terminology the pipeline team has given us to avoid confusion:
we set the pipeline to allow_failure by default for secure jobs that fail with a specific error code.
In other words, the pipeline passes despite a job failing.
Basically, I am making up a new word (or packaging) around the old concept: fail because of a security risk (either the security gate feature is enabled or allowed to fail is set), so the pipeline won't be merged automatically.
That is a separate concept, in which case we need 4 layers of data instead of 3
job with success or failure and error codes
pipeline success (including success with jobs failing) or failure
merge request approval (a separate item that is a workflow rule, NOT a status of a job or pipeline, that happens after the pipeline and before the merge)
automatic merging (the new item you mentioned) being disabled or not; it seems you wish to add extra rules to stop it?
I don't think you need a separate mechanism to stop automatic merging. Why?
if one uses security gates, that will prevent a merge from occuring
if one fails the pipeline b/c they chose to fail pipeline on job failure, auto merge will not occur
So I am standing firm with my assessment as of now; again, willing to change my mind, of course.
Preferred
allow failure on all, and enable MR approvals
| Job Status | Job allow_failure and Error Code | Pipeline Setting | Pipeline Status | Note |
| --- | --- | --- | --- | --- |
| Pass ✅ | n/a | allow_failure | pass ✅ | no new vulns |
| Fail ❌ | allow_failure:exit_codes: 2 (red X) | allow_failure | pass ⚠ | new vulns, would be stopped from merging with MR approvals |
| Fail ❌ | allow_failure:exit_codes: 1 (red X) | allow_failure | pass ⚠ | default to pass IMO |
note: IMO we would need to work with the pipeline team and other PM teams if there is a specific pattern, other than what we know of already, for the symbols and how to get them; we may need to work with them for "fail w/ specific error code" to get its own symbol
Suggestions to customers, if it works better for them (see the sketch after this list):
remove allow_failure on exit_code 1 and "fail" the pipeline on job failure (MR approval irrelevant)
and/or remove allow_failure on exit_code 2 for when there are new vulns (MR approvals would never be triggered) and set the pipeline to fail on job failure
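A hedged sketch of what such a customer override could look like in a project's .gitlab-ci.yml, assuming the hypothetical convention above where exit code 2 means "new findings" and exit code 1 means a technical warning:

```yaml
include:
  - template: Security/SAST.gitlab-ci.yml

# Override the template's job: keep soft-failing on technical warnings (1),
# but stop allowing the "new findings" code (2) to fail softly, so new
# findings now fail the job and, with default settings, the pipeline.
sast:
  allow_failure:
    exit_codes:
      - 1
```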
To answer my above query, it looks like the pipeline team has a recommended design pattern right now:
When allow_failure is set to true and the job fails, the job shows an orange warning in the UI. However, the logical flow of the pipeline considers the job a success/passed, and is not blocked.
Assuming all other jobs are successful, the job’s stage and its pipeline show the same orange warning. However, the associated commit is marked as “passed”, without warnings.
so I'm hard on board with Cam's finding that we MUST change all of Secure to fail the jobs with a specific error code so as to use the orange warning asap, but I feel like tweaking icons past that should be coordinated with the pipeline team?
@NicoleSchwartz Thank you so much for the detailed explanations. It made me realise what I proposed is not just a packaging change; it involves more underlying tech/structure changes.
I have created 2 more issues to further discuss some other questions brought up during the thread here:
For this issue, I agree with you: we should focus on changing all of Secure to fail the jobs with a specific error code so as to use the orange warning asap under the current feature/tech structures. Two things I see we can do:
1. An additional change in the doc description: https://docs.gitlab.com/ee/ci/yaml/#allow_failureexit_codes
2. Update job/pipeline icons with a tooltip. Based on your previous chart, I updated the icons and added a situation where new vulns are found but allow_failure: false. I think in this case we should warn the user as well; they might not have found out about the allow_failure feature, but they still want to know whether vulns were found or not.
| Job Status | Job allow_failure and Error Code | Job Setting | Pipeline Status | Note |
| --- | --- | --- | --- | --- |
| (icon) | n/a | allow_failure: false or allow_failure: true | (icon) | no new vulns |
| (icon) | n/a | allow_failure: false | (icon) | new vulns found |
| (icon) | allow_failure:exit_codes: 2 (red X) | allow_failure: true | (icon) | new vulns, would be stopped from merging with MR approvals |
Have you considered what will trigger the blocked pipeline yet?
A security gate will trigger it, which means the user consciously chooses to block the pipeline when vulns are found, and they set up the criteria like you mentioned; they can decide severity level, CVE, etc.
@andyvolpe No, this issue is limited to updating the icon to reflect the right status with the exit code. I am even considering creating a new issue and closing this one completely, since this one is too overloaded. (todo tomorrow)
Another thing you asked in our meeting was about user flow. I talked with CI/CD before: people who use the pipeline overview page are not security-focused, their persona has no overlap with ours, and in their research they barely heard anyone mention security. So I don't suspect this icon change will affect any workflows. The update is supposed to clarify the communication in case people look at those pages and get confused. The normal flow for devs should still be the MR page; for security people, it should still be the Dashboard.
In the new issue, if we want to introduce new icon statuses like what I proposed in the previous comment, with a blocked status, then we need to talk with CI/CD again and think through the heuristics again.
@cam.x thank you for clarifying! that's very helpful.
Another thing you asked in our meeting was about user flow. I talked with CI/CD before: people who use the pipeline overview page are not security-focused, their persona has no overlap with ours, and in their research they barely heard anyone mention security.
I might be a little confused here with this new information. If security professionals are not typically going to the pipeline page, then would we want to run the study again with a different audience? If the Persona: Security Analyst is not familiar with our pipeline statuses and meanings, then they would be confused if they were shown a pipeline page and asked if there were vulnerabilities present, or at least that is my hypothesis. Validating this problem with the primary pipeline user might be a safe approach, as they may be most impacted by this change.
So I don't suspect this icon change will affect any workflows. The update is supposed to clarify the communication in case people look at those pages and get confused. The normal flow for devs should still be the MR page; for security people, it should still be the Dashboard.
Sorry if my original question wasn't articulated well during our meeting. When I mentioned workflow, I meant any user's workflow, not just the ones we design for in Secure. As you stated, this change would have a low impact on the Security user, but it might have a high impact on Persona: DevOps Engineer or Persona: Release Manager. Any solution in the new design issues should take this into account IMO.
@andyvolpe Thx for the discussion, some thoughts I have
If the Persona: Security Analyst is not familiar with our pipeline statuses and meanings, then they would be confused if they were shown a pipeline page and asked if there were vulnerabilities present, or at least that is my hypothesis.
I don't think devs/security specialists are unfamiliar with the pipeline page, but I don't think it is their main task or part of their necessary workflow. They understand the concept of the pipeline, its statuses, and what each status means. But their task is not to come and check all the pipeline jobs as a release manager does. From our studies, we usually see devs start from the MR page for security-related jobs. Security analysts' starting point could be the dashboard or MR pages. So I would sum it up like this: the pipeline page is not a necessary workflow page for them, but it is something they may run into or check when needed. With this condition, I think we still need to fix the confusing parts on this page even if they are not a major blocker for them accomplishing their job/task. And we don't need to worry that changing the job status icon will change the user's workflow. Hope this explains it better now :)
As you stated, this change would have a low impact on the Security user, but it might have a high impact on Persona: DevOps Engineer or Persona: Release Manager. Any solution in the new design issues should take this into account IMO.
I tried to move this issue forward by narrowing down the scope; otherwise, I am afraid we will keep discussing and even more potential problems will be added to this issue. So after talking with @stkerr, we think that for now we should focus on and fix the problem for the "Security" personas, like Security Analysts and Devs, first, at least for this issue and for the fuzzing group's prioritization.
I agree that for the other personas, Persona: DevOps Engineer or Persona: Release Manager, we should dig in more. So I created a new issue: #300544. I am not sure this will have a high impact on them, though, since we are NOT going to introduce a new status, just use the existing ones ("fail with exit code" or "pass with warning"). But there is a possibility that, with these changes, those two personas might start paying attention to security-related topics. So let's keep an eye on it and work on it in the new issue? Plus, I will give the CI/CD team a heads-up via the issue "Update secure job status with corresponding exit code with correct icon" after we have a clear action plan. Does this make sense to you?
I think we still need to fix the confusing parts on this page even if they are not a major blocker for them accomplishing their job/task. And we don't need to worry that changing the job status icon will change the user's workflow. Hope this explains it better now :)
It does make sense but I disagree about not having to worry about the user's workflow. We need to consider both the signal and the action, or we run the risk of confusing our users or worse, creating an unnecessary feature.
A few questions to consider: If we signal there is a security flaw present in the pipeline by using a pipeline status icon, then what do we expect the user to do next when they are on the pipeline page? What are the steps they would take and how are we guiding them through their process?
Since we are introducing the concept that security flaws can trigger failures or warnings in the pipeline, then we need to keep the primary users in mind. They won't be used to seeing these statuses and will have to change their behavior when investigating the problem. My original worry about this feature is that we will fatigue users to the warning and failure status when they are decoupled from a functional error the tool encountered outside of its normal operating method. Every failure will be investigated, and the primary user's job is to make sure the right people are aware of the problems and address them as needed to get the pipeline in a passing state. We should be considerate of when and where these interactions between users are happening and how to best facilitate them in the application.
All of this is to say that this may seem like a simple icon and status change but it's much bigger than that when considering a cohesive end-to-end experience.
@andyvolpe I strongly agree that we should "worry about the user's workflow"; my point was more that the icon is signaling the wrong message, and we shouldn't stop correcting ourselves.
If we signal there is a security flaw present in the pipeline by using a pipeline status icon, then what do we expect the user to do next when they are on the pipeline page? What are the steps they would take and how are we guiding them through their process?
From my observation in the CMS test I did before, they went to the security dashboard for details. Again, I think we should worry about the flow and improve on it with an iterative process. Thank you for putting together the comment: #300135 (comment 501590540). Let's move the discussion over there.
Document this in Dovetail (I'm not seeing it there, perhaps I missed it?)
I didn't create one; I thought usertesting.com was replacing Dovetail for unmoderated studies. Shall I download all the videos and move them there, or should I just create insights there? I also have the clips and notes in usertesting.com; I thought it was pretty convenient to keep them there.
Sounds like you have an actionable insight here. Could you please document it as a separate issue and label it accordingly, as per the handbook?
Actually, this issue itself is the main actionable insight. We will take action on this issue :D. I added the UX insight label!
Please create a research issue in the research repo for this study and label it as Solution Validation? This is required so that your study will be counted towards this departmental KPI. The research issue's description could be extremely minimal (it can simply refer to this issue), and you can close it immediately.
I am closing this issue since it grew too fast and diverged a lot. I have created a new issue to act directly on the problem spotted by the research: