Add scan.messages to the JSON security reports generated by SAST, Container Scanning, and Dependency Scanning analyzers.
Depending on how the analyzer project is implemented, the change has to be made either in the command package of the common library or in the analyzer project itself. Currently the following analyzer projects don't use the command package: nodejs-scan, secrets, gemnasium, and klar.
Implementation plan
Define what needs to be logged in the report
Support one project that does NOT use command.Run (klar)
Update the common library. Add a new logger that analyzer projects can use to add log messages to the report, and make this logger persist log messages in memory. Add struct types and helper functions analyzer projects can use to dump the collected log messages to the report.
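To make the last step more concrete, here is a minimal sketch of what such an in-memory collector could look like. All names (`reportlog`, `Collector`, the `Message` fields) are hypothetical, not existing common-library APIs:

```go
package reportlog

import "sync"

// Message is a log entry destined for the scan.messages field of the report.
// The field names and JSON tags are illustrative only.
type Message struct {
	Level string `json:"level"`
	Value string `json:"value"`
}

// Collector persists log messages in memory so that an analyzer can dump
// them into the JSON report once the scan completes.
type Collector struct {
	mu       sync.Mutex
	messages []Message
}

// Add records a message at the given level.
func (c *Collector) Add(level, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.messages = append(c.messages, Message{Level: level, Value: value})
}

// Messages returns a copy of the collected messages, ready to be attached
// to the report before it is encoded to JSON.
func (c *Collector) Messages() []Message {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]Message(nil), c.messages...)
}
```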
@caneldem I need this implementation plan to be reviewed first. There are technical details it doesn't address, and this might impact the weight of the issue. I'll keep you posted.
Update the logger defined in common so that it collects messages, using the newly introduced issue.Message Go struct. These messages are ready to be encoded to JSON, as part of the JSON report.
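For illustration, the JSON encoding could end up looking roughly like the sketch below. The `Message` fields and the simplified `Report` shape are assumptions for the example, not the actual issue.Message definition or report schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message stands in for the issue.Message struct mentioned above; the field
// names and JSON tags here are assumptions, not the actual definition.
type Message struct {
	Level string `json:"level"`
	Value string `json:"value"`
}

// Report is a heavily simplified stand-in for the security report: only the
// scan.messages part relevant to this issue is shown.
type Report struct {
	Scan struct {
		Messages []Message `json:"messages"`
	} `json:"scan"`
}

func main() {
	var report Report
	report.Scan.Messages = []Message{
		{Level: "warning", Value: "unsupported dependency file 'some.lock', skipping"},
	}

	// The collected messages end up under scan.messages in the JSON report.
	out, _ := json.MarshalIndent(report, "", "  ")
	fmt.Println(string(out))
}
```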
Alternatively, we could use a specific logger for what needs to be collected in the security report (using logrus or not). That would be more flexible and would give us full control over what ends up in the report, but at the cost of adding extra code, and possibly duplicating what's done for the default logger. Also, I believe this is error prone, and we're likely to forget about pushing log messages to the report. WDYT? cc @theoretick @cam_swords
We can maintain multiple instances of logrus for different log outputs
Logrus does state that it's pluggable, which may mean it's extensible enough to allow us to do what we need.
@theoretick and @zrice likely also have other ideas on how we can solve this with logrus. That being said, if we determine another library is better, then I'm open to switching.
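As an illustration of that pluggability, a logrus hook could keep a copy of selected entries in memory for later inclusion in scan.messages. This is only a sketch; `reportHook` is not existing analyzer code, and the level cut-off is an assumption:

```go
package main

import (
	"fmt"

	"github.com/sirupsen/logrus"
)

// reportHook keeps a copy of selected log entries in memory so they can
// later be dumped into scan.messages.
type reportHook struct {
	entries []*logrus.Entry
}

// Levels limits the hook to warnings and above, to keep the report small.
func (h *reportHook) Levels() []logrus.Level {
	return []logrus.Level{logrus.WarnLevel, logrus.ErrorLevel, logrus.FatalLevel}
}

// Fire is called by logrus for every entry matching Levels.
func (h *reportHook) Fire(entry *logrus.Entry) error {
	h.entries = append(h.entries, entry)
	return nil
}

func main() {
	// A dedicated logger instance; other instances can keep writing only to
	// the job log, so outputs stay independent.
	log := logrus.New()
	log.AddHook(&reportHook{})

	log.Warn("Allowlist file with path 'clair-whitelist.yml' does not exist, skipping")
	log.Info("This entry goes to the job log only, not to the report")

	hook := log.Hooks[logrus.WarnLevel][0].(*reportHook)
	fmt.Printf("%d entries collected for the report\n", len(hook.entries))
}
```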
Do you know what kind of information you want to log? I think we should be careful about what we choose to log, or else the Secure reports will get rather large rather quickly.
This was a technical decision to include messages in our reports (i.e. driven by us). Do you know if the Threat Insights team knows about the feature? It might be worth having UX think about what they want to do with the information (if anything).
the analyzer settings, as info and debug messages, until they possibly end up in structured JSON fields
what's being skipped or not supported, possibly resulting in false negatives
and error messages causing the scan to fail (a rough mapping to logger calls is sketched below)
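As a rough illustration, these three categories could map onto logger calls as follows. The level cut-offs are a suggestion for discussion, not an agreed-upon spec, and the paths and messages are made up:

```go
package main

import (
	"errors"

	"github.com/sirupsen/logrus"
)

func main() {
	log := logrus.New()
	log.SetLevel(logrus.DebugLevel)

	// Analyzer settings: info/debug messages for the job log, probably not
	// for the report.
	log.Debugf("using CA bundle %s", "/etc/ssl/certs/ca-certificates.crt")

	// Skipped or unsupported input, a potential source of false negatives:
	// a good candidate for a warning in scan.messages.
	log.Warnf("unsupported dependency file %s, skipping", "some.lock")

	// Errors causing the scan to fail.
	log.Errorf("scan failed: %v", errors.New("connection refused"))
}
```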
And yes, it would make sense to use a different logger, though I'm a bit concerned about redundancies and maintenance. In the end, it all depends on what we agree we should log, which should be discussed at the stage level, for consistency. cc @gonzoyumo
@gonzoyumo Before adding .scan.messages[] to SAST, CS, and DS reports (what this issue is about), we first need to agree on what needs to be logged, and how these logs are going to be used. Do we need a new issue for that? How should we proceed? cc @cam_swords
By the way, the documentation would be updated as part of this other issue, especially the Secure scanner integration guide.
I agree, and this is why I prefer having needs defined by product/UX first. If we don't know what to put there, why add these fields to the schema in the first place?
Also, we already have the job log output, so we should avoid redundancy and limit the usage of this property to what actually adds value to the UX, which to me is essentially about reporting warnings and errors in the UI.
If we don't know what to put there, why add these fields to the schema in the first place?
Exactly. I guess we should have discussed this long ago, when working on #37123 (closed) (the addition to the JSON schema).
For the background story, this very issue was created when splitting up #202053 (closed) into three issues. #202053 (closed) was initially about supporting all the fields added to the JSON schema in #37123 (closed), but I narrowed its scope (it's now about supporting scanner and report_type) and created two more issues: #217790 (closed) (supporting start_time and end_time) and #217791 (closed) (supporting messages, this very issue).
@gonzoyumo AFAIR the initial intent behind this new .scan.messages field is to report what has been skipped during the scan, possibly resulting in false negatives. This includes unsupported dependency files. See #11635
Should any of the following log messages be captured in scan.messages?
```
[WARN] [klar] [2020-08-05T12:41:34+10:00] ▶ Allowlist file with path 'clair-whitelist.yml' does not exist, skipping
[WARN] [klar] [2020-08-05T12:41:34+10:00] ▶ Allowlist file with path 'vulnerability-allowlist.yml' does not exist, skipping
[INFO] [klar] [2020-08-05T12:41:34+10:00] ▶ DOCKER_USER and DOCKER_PASSWORD environment variables have not been configured. Defaulting to DOCKER_USER=$CI_REGISTRY_USER and DOCKER_PASSWORD=$CI_REGISTRY_PASSWORD
[WARN] [klar] [2020-08-05T12:41:34+10:00] ▶ Not starting Clair server process.
[INFO] [klar] [2020-08-05T12:41:34+10:00] ▶ Scanning container from registry 'alpine:3.7.3' for vulnerabilities with severity level 'Unknown' or higher with klar '' and clair ''
[INFO] [klar] [2020-08-05T12:41:37+10:00] ▶ Found Dockerfile with path: 'Dockerfile', using for remediations.
[WARN] [klar] [2020-08-05T12:41:37+10:00] ▶ Image [alpine:3.7.3] contains 1 total vulnerabilities
[WARN] [klar] [2020-08-05T12:41:37+10:00] ▶ Image [alpine:3.7.3] contains 1 remediations
[ERRO] [klar] [2020-08-05T12:41:37+10:00] ▶ Image [alpine:3.7.3] contains 1 unapproved vulnerabilities
```
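For illustration, if we follow the earlier suggestion to limit the property to warnings and errors, selecting from these entries could look like the sketch below. `keepForReport` and the cut-off are hypothetical, not something this thread has agreed on:

```go
package main

import (
	"fmt"
	"strings"
)

// Message is the hypothetical report message used in the sketches above.
type Message struct {
	Level string
	Value string
}

// keepForReport keeps only warnings and errors, to avoid duplicating the
// whole job log in the report.
func keepForReport(level string) bool {
	return level == "WARN" || level == "ERRO"
}

func main() {
	// A few of the klar log entries quoted above, reduced to level and text.
	logLines := []struct{ level, text string }{
		{"WARN", "Allowlist file with path 'clair-whitelist.yml' does not exist, skipping"},
		{"INFO", "Found Dockerfile with path: 'Dockerfile', using for remediations."},
		{"ERRO", "Image [alpine:3.7.3] contains 1 unapproved vulnerabilities"},
	}

	var messages []Message
	for _, l := range logLines {
		if keepForReport(l.level) {
			messages = append(messages, Message{Level: strings.ToLower(l.level), Value: l.text})
		}
	}

	// Only the warning and the error would end up in scan.messages.
	for _, m := range messages {
		fmt.Printf("%s: %s\n", m.Level, m.Value)
	}
}
```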
@adamcohen Could you have a look at the implementation plan before I create the issues? I suggest we start off with the analyzers that don't depend on command.Run, like klar. This way we can have two phases, and each one weighs 5. There's no distinction between the analyzers maintained by group::composition analysis and the ones maintained by group::static analysis. WDYT? cc @gonzoyumo
@fcatteau The implementation plan looks good, although I think we should follow the same approach as with "Add scanner, report_type to SAST, CS, DS reports" and initially create MRs in both klar and bundler-audit at the same time, because that will give us a good idea of how the additions to the common package will work with both projects.
@adamcohen I agree but creating MRs in both klar (doesn't depend on command.Run) and bundler-audit (depends on command.Run) makes the first iteration bigger, and the implementation plan less balanced. I'd say we can experiment with command integration when working on the first phase, but shouldn't commit to it. WDYT?
@gonzoyumo As discussed earlier today, this issue is blocked until we have specs for what analyzers should log in the security reports they generate. Beyond that, we need to discuss what the need is for these log messages in the reports. cc @NicoleSchwartz
Marking the issue as blocked, and un-assigning myself.
As discussed in today's CA weekly meeting, I'm pushing this to the %Backlog until we find real needs for it.
@kmann @andyvolpe do you envision any near-term feature that will need this information?
I think this could be useful to report warnings or error messages from the scan to the user, but so far we haven't seen any product requirement about that.
@gonzoyumo I don't see a need for this at the moment, at least not in the Vulnerability Management components. I haven't heard any requests to expose warning or error messages from the scanners in the vulnerability information. IMHO, if there is a problem with the scan itself, that should appear elsewhere.
I believe this field was introduced into the schema in anticipation of the need for third-party scanners to convey information about a scan that was not run on a GitLab Runner (i.e. when there is no job log). Detectify/DAST is an example of such a case.
I'm comfortable with the GitLab products not using these fields until there is a hard requirement.
@cam_swords Ah OK, that's helpful. In that case, this may be something we'll need to look at displaying in the vulnerability details (and perhaps other places). Don't see a problem with our own scanners not using it. I don't want to force information to be put into the schema by our scanners unless there is a compelling need for them to all do so.