Show static status (enabled/disabled) on configuration page

Problem to solve

On the Security & Compliance -> Configuration page, it seems like the scanners/security controls are enabled once the .gitlab-ci.yml file has been added to the project, and stay that way unless it is removed. However, the enabled/not enabled status actually depends on the latest pipeline run on the default branch, and it can switch from enabled to not enabled if that pipeline fails.

While this is mentioned in the description text at the top of the page (1), it was surprising to many of us, and thus might be confusing to users as well.

(1)

The status of the table below only applies to the default branch and is based on the latest pipeline. Once you've enabled a scan for the default branch, any subsequent feature branch you create will include the scan.

Intended users

User experience goal

Reduce confusion as to whether a scanner/security control has been enabled (or disabled), i.e. whether the .gitlab-ci.yml file has been added to the project or not
Display the results of the latest pipeline run on this page

Proposal

The Enabled status on the Configuration page should be static if the .yml file is present, or ADO is enabled, and should not be tied to the latest pipeline run on the default branch.
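
A minimal sketch of the proposed rule, for discussion only. The ProjectConfig type and its field names are hypothetical and do not correspond to actual GitLab models; the point is that the status is derived from project configuration alone and never consults a pipeline.

```python
from dataclasses import dataclass


@dataclass
class ProjectConfig:
    # Hypothetical fields, for illustration only.
    has_ci_file: bool   # the .yml file is present in the project
    ado_enabled: bool   # ADO is enabled for the project


def static_status(config: ProjectConfig) -> str:
    """Return the proposed static status, ignoring pipeline results entirely."""
    if config.has_ci_file or config.ado_enabled:
        return "Enabled"
    return "Not enabled"
```

With this rule, a failed or partial pipeline cannot flip the status; only removing the file (and disabling ADO) would.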

Permissions and Security

Gold/Ultimate customers for now, but possibly Core too after we implement the design result in #241377 (closed)

Documentation

TBD

What does success look like, and how can we measure that?

TBD

Links / references

Comments from the Static Analysis engineers in Slack:

I think we don't know that it has been correctly enabled until it runs (via the pipeline). It may be configured to run, but not properly. Or put another way, I don't know how we would determine that it has been enabled (and correctly) otherwise. The fact that it is so dynamic is definitely a double-edged sword.

We decide whether SAST is enabled/disabled based on whether the SAST report (gl-sast-report.json) has been created by the latest pipeline. Maybe this is the simplest way to figure out the status of the analyzer. I am not sure why the configuration page has been implemented in this way.
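
The check described in the comment above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the Pipeline type and function name are hypothetical, and only the report file name comes from the comment.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

SAST_REPORT = "gl-sast-report.json"


@dataclass
class Pipeline:
    # Hypothetical stand-in for the latest default-branch pipeline.
    status: str                                      # e.g. "success" or "failed"
    artifacts: Set[str] = field(default_factory=set)


def sast_status_from_latest_pipeline(latest: Optional[Pipeline]) -> str:
    """Report "Enabled" only if the latest pipeline produced the SAST report."""
    if latest is not None and SAST_REPORT in latest.artifacts:
        return "Enabled"
    # Covers no pipeline at all, a failed SAST job, and a pipeline that never ran SAST.
    return "Not enabled"
```

Note that the pipeline status is not consulted: what matters in this sketch is whether the report exists, which is why a run that never produces it shows up as Not enabled.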

I also don’t know why it’s implemented this way, however I think it was implemented this way because it’s really hard for us to know if our features are actually enabled in a project. We’ve enabled them to be turned on in so many different ways that it’s easier to know if they’ve reported results than if it was turned on somewhere/anywhere. I hate myself for saying this, but that’s not going to stop me… This is not a problem unique to us - it’s a company-wide problem, even in our analytics. We’re using usage (results) as a proxy for turned on/configured. So the converse also holds true - we don’t know if/when folks turned off our features, which would explain the reliance on the last pipeline run.

Yes, my understanding is that inspecting the last pipeline is significantly cheaper than parsing CI files to figure out if the job is enabled or not. I think maybe @theoretick was involved in a discussion about this before? Note also that it's not only when a pipeline failed that you might see all the Security jobs as Not enabled; we sometimes run partial pipelines when subsets of files change. For instance, if only docs are changed, only docs jobs are run, meaning that the last pipeline doesn't have anything to say about Security jobs. This has come up repeatedly, but I can't for the life of me find a previous discussion or issue about this, although I know both exist.

Yep, it's expensive

Partial pipelines are a real problem. Just because there's a valid CI file with a SAST job, it doesn't mean it runs. If it's completely valid, that doesn't mean our scanner worked (we could fail to build the application, we may not support that language, the rules can vary wildly on whether the job will ever actually execute)... and what if the job fails, is SAST still "enabled"? So that leaves us with "look for a recent gl-sast-report.json in a pipeline". What's recent? If they merge code once a month, does a month-old report point to security scans being "enabled"?
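
To make the "what's recent?" question concrete, here is a rough sketch that looks back through earlier pipelines instead of only the latest one. The PipelineRun type, the function name, and the max_age parameter are all hypothetical; choosing max_age is exactly the open question raised above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set

REPORT_NAME = "gl-sast-report.json"


@dataclass
class PipelineRun:
    # Hypothetical record of one pipeline on the default branch.
    finished_at: datetime
    artifacts: Set[str] = field(default_factory=set)


def last_scan_time(
    runs: Iterable[PipelineRun], now: datetime, max_age: timedelta
) -> Optional[datetime]:
    """Return when the newest run inside the window produced the report, else None."""
    cutoff = now - max_age
    candidates = [
        run.finished_at
        for run in runs
        if REPORT_NAME in run.artifacts and run.finished_at >= cutoff
    ]
    return max(candidates, default=None)
```

A docs-only pipeline is skipped rather than treated as "Not enabled", but a project that merges once a month still falls out of a short window, so the result depends entirely on the chosen max_age.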

Attempting not to open a whole side discussion: This makes me really think we should consider a “security scans history page” that might just be a filtered CI jobs page with some extra metadata. We’ve been trying to cram last run details on the security dashboard but there are many other things this would enable like simple audit/compliance runs as well as clearer status and potentially more straightforward telemetry.

Since we have a scan model now, we can pull from those directly rather than from CI builds, which is a much more expensive lookup too.

Edited Oct 29, 2020 by Becka Lippert