Currently, we parse CI job artifacts and store reports only after the pipeline is completed; we do this even if the pipeline has failed.
This has some disadvantages:
We have to deal with all reports in one place, which makes the code harder to optimize.
It prevents us from showing users findings earlier in the CI cycle.
It prevents us from showing users findings from finished CI jobs if one of the builds fails.
Taking all of this into account, we can split the parsing and do it earlier, after each CI job finishes. This is also not a perfect solution, since we would have to check whether the build has a security artifact every time (and there can be a lot of builds even in one pipeline).
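As a rough sketch of that split, assuming a hypothetical build-finished hook and predicate (none of these names are GitLab's actual API):

```ruby
# Hypothetical sketch: enqueue report parsing as soon as a single CI job
# finishes, instead of waiting for the whole pipeline. BuildFinishedWorker,
# has_security_reports? and ParseSecurityReportsWorker are illustrative names.
class BuildFinishedWorker
  include Sidekiq::Worker

  def perform(build_id)
    build = Ci::Build.find_by(id: build_id)
    return unless build

    # This check has to run for every finished build, which is the drawback
    # mentioned above: most builds carry no security artifact at all.
    return unless build.has_security_reports?

    ParseSecurityReportsWorker.perform_async(build.id)
  end
end
```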
Note that if a pipeline does not complete, the SAST report is not picked up. DAST, however, will be picked up regardless of whether the pipeline completes.
> DAST, however, will be picked up regardless of whether the pipeline completes.
I kindly disagree. The logic to parse reported vulnerabilities and store them in the DB (default branch only) so they show up on the dashboards is common to all report types. The same logic applies to the pipeline and MR views.
To me, the behavior you've encountered with DAST probably comes from other factors.
@tmccaslin https://gitlab.com/gitlab-com/account-management/pre-sales/ssctechnologies/-/issues/5#note_300627155 is exactly the issue this change is supposed to fix. If we parse reports at the end of the job rather than at the end of the pipeline, we'll get results for the successful jobs inside a failing pipeline. So if we have a pipeline with SAST and DAST jobs where SAST succeeds but DAST fails, the vulnerabilities found by the SAST job will still be visible on the Security Dashboard.
I think the suggestion to parse security reports after a job is run is very sensible. Why wait until the end of a pipeline to update vulnerabilities on the Security Dashboard? Security::Scans are created at the end of each job.
I wonder how this even works now. If there is a manual step in the pipeline, is the pipeline ever considered "complete"?
The main concern is the aggregation/dedupe logic, since multiple jobs can run for the same report type and there was no clean way to know, e.g., "are all jobs for SAST done?" (see the sketch after this comment).
Also, when implementing more complex tracking logic like #7586 (closed), processing jobs independently might cause edge cases.
These are off the top of my head and may need to be re-evaluated though.
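To make that concern concrete, a safe per-job aggregation would need a completeness check along these lines (a sketch only; `report_type_of` and the overall shape are assumptions, not existing helpers):

```ruby
# Hypothetical sketch: before deduping/aggregating SAST results, we would
# need to know that no other SAST job in the pipeline is still running.
# `report_type_of` is an assumed helper mapping a build to a report type.
def all_jobs_done_for_report_type?(pipeline, report_type)
  jobs = pipeline.builds.select { |b| report_type_of(b) == report_type }
  jobs.any? && jobs.all?(&:complete?)
end

# Only aggregate once every SAST job has finished:
# aggregate_sast!(pipeline) if all_jobs_done_for_report_type?(pipeline, :sast)
```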
> I wonder how this even works now. If there is a manual step in the pipeline, is the pipeline ever considered "complete"?

> The main concern is the aggregation/dedupe logic, since multiple jobs can run for the same report type
Cheers for sharing, I understand now. Do you think it would work if we converted the aggregation logic to be a view concern? We can always recreate an aggregation from individual components, but we can't always recreate individual components from an aggregation.
The worker that stores vulnerabilities in the DB is triggered when entering any of the completed statuses... so a pipeline that never reaches one of those statuses won't store vulns in the DB.
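For context, such a hook might look roughly like the simplified sketch below; the class shape, worker name, and status list are assumptions rather than the actual GitLab source:

```ruby
# Simplified sketch of a pipeline state machine whose storing worker only
# fires on entering a completed status. A pipeline blocked on a manual job
# never transitions into one of these states, so nothing is stored.
class Pipeline < ApplicationRecord
  COMPLETED_STATUSES = %i[success failed canceled skipped].freeze

  state_machine :status do
    after_transition any => COMPLETED_STATUSES do |pipeline|
      StoreSecurityReportsWorker.perform_async(pipeline.id)
    end
  end
end
```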
Well I guess the Security Dashboard on DAST is broken then! Seriously though, we should fix this. There are legitimate reasons to do this.
> Cheers for sharing, I understand now. Do you think it would work if we converted the aggregation logic to be a view concern? We can always recreate an aggregation from individual components, but we can't always recreate individual components from an aggregation.
This partial parsing approach could only be done for visualizing the current state (e.g. the pipeline view), but it creates problems if we process reports further, e.g. to store vulnerabilities in the DB or to compare against another branch in an MR view.
The reason is that we compare what we've found with what we had before, treating any differences as fixed vulnerabilities (or new ones)... which is not really true, since they could come from another still-running report. This could be partially improved by being more granular in the processing, e.g. doing it per report type, but that again requires knowing whether all jobs expected to produce a given report type artifact have completed. So far it has been much easier to just wait for the pipeline to finish. But AFAIR we made recent progress in this area, so there is hope :)
Storing everything and deduping/aggregating at query time was already discussed with @fcatteau, but currently the unicity logic at the DB level prevents doing that at query time. One reason I'm against it is that it complicates any reading task, e.g. counting vulns, and that's before even considering standalone vulnerability objects. Also, tracking logic like #7586 (closed) can't be achieved at reading time, so to me we need to do all that processing when parsing/storing. This is an interesting topic, though, and worth having people dig into it.
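To illustrate the kind of DB-level unicity being referred to, here is a hypothetical migration sketch; the table and column names are illustrative, not GitLab's actual schema:

```ruby
# Hypothetical sketch: with a unique index like this, you cannot naively
# insert every finding from every job and dedupe later at query time;
# duplicates across jobs would violate the constraint at insert time.
class AddUniqueFindingIndex < ActiveRecord::Migration[6.0]
  def change
    add_index :vulnerability_occurrences,
              %i[project_id location_fingerprint],
              unique: true,
              name: 'index_unique_finding_per_project'
  end
end
```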
> Well I guess the Security Dashboard on DAST is broken then! Seriously though, we should fix this. There are legitimate reasons to do this.
This is actually debatable. Why consider vulns reported on a failing build that would not ship anyway? There are many possible interpretations of what a failing pipeline means. It could also cause issues, as mentioned above, if for whatever reason a failing build made a security scan report 0 vulnerabilities: we would consider all existing vulns fixed, whereas it was just a detection issue caused by the failing build. That said, we're currently in an odd situation when one of the security scans fails: since it's configured with allow_failure: true, it won't make the pipeline fail, so we're probably considering previous vulns as fixed here too.
Fair enough, I see that there are good reasons not to aggregate on view. I agree that it's worth people digging into this, because I think what we have right now is broken.
As an example, this pipeline in DAST never completed, but did not fail. We just didn't want to release. We should still see vulnerabilities in our Security Dashboard.
If I'm doing this, then I would bet that our customers are doing it too. As an example, having manual jobs in CI is a useful way to build an audited approval model (e.g. #15819 (closed)). Just because a job in a pipeline wasn't approved before the next run on master takes place doesn't mean that vulnerabilities should be hidden on the Security Dashboard.
I appreciate that this is a hard problem to solve.
Actually, if someone is using manual jobs to halt the pipeline until a particular person approves, maybe seeing new vulnerabilities would be a prerequisite to their approval.
If a pipeline completes all jobs except a final manually triggered job that has not yet been triggered, then the pipeline has a status of success, and Security Reports get parsed.
If the manual job is then triggered and finishes successfully, the pipeline transitions from success to success, and Security Reports get parsed again.
Parsing reports multiple times is something to be wary of, and while it's worth optimizing, it is not as bad as what I thought was happening (i.e. not parsing at all).
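One hedged way to keep that repeated success-to-success parse cheap would be an idempotency guard, as in the sketch below; `Security::Scan` is mentioned earlier in this thread, while `ParseSecurityReportsWorker` and the exact lookup are assumptions:

```ruby
# Sketch: skip builds whose reports were already ingested, so the second
# success -> success transition becomes a near no-op.
def parse_security_reports(pipeline)
  pipeline.builds.each do |build|
    next unless build.complete?
    next if Security::Scan.exists?(build_id: build.id) # already parsed once

    ParseSecurityReportsWorker.perform_async(build.id)
  end
end
```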
I propose that we break this up into a few smaller pieces:
- update the license scanning merge request approval status at the end of the license scanning job (see the sketch at the end of this comment), and
- update the other security reports at the end of their respective jobs.
I think it's safe to say that license scanning is currently not coupled to the other reports, so it is safe to move forward on. The other security reports can be investigated separately to see whether it's practical to move forward on them or not. I admittedly have less knowledge of the coupling between the other security reports.
By splitting the work into more focused pieces, we can identify blockers that are hard to predict before starting the work, and we can make progress on the reports that are safer to update today. WDYT?
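Under those assumptions, the first piece could look roughly like this sketch (the predicate and worker names are illustrative, not existing GitLab code):

```ruby
# Hypothetical sketch of piece 1: react to license scanning jobs as soon as
# they finish, independently of the other security reports.
def on_build_finished(build)
  return unless build.has_license_scanning_report? # assumed predicate

  # Only the license compliance approval status is touched here; the other
  # report types keep their current end-of-pipeline flow for now.
  SyncLicenseApprovalRulesWorker.perform_async(build.pipeline_id)
end
```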