
Design: How non-pipeline Secret Detection scans can work.

This work will follow on from https://gitlab.com/gitlab-org/gitlab/-/issues/425993

Overview

This is what comes after pre-receive secret detection. It is early to be thinking about this, but we wanted a head start because the user experience will significantly impact engineering decisions. We want to understand the ideal UX to help concretize some of the backend system design constraints we'd be under.

Things to consider

Enable/Disable the feature

  1. How and where users will be able to enable or disable the feature
  2. Need to deconflict with policies. We could use policies for this type of thing, which I'm not opposed to, but I find the policy model very restrictive (its authorization model is strict; e.g., you have to be an Owner to attach a policy to a project)
  3. We are thinking there would be something like a checkbox that Maintainers can enable, and a policy could then force that checkbox to be checked
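The checkbox-plus-policy idea above could resolve like the following sketch. This is purely illustrative (the class and field names are hypothetical, not existing GitLab code); the key property is that a policy can only force the feature on, never off:

```python
# Hypothetical sketch of resolving the effective enablement state from a
# Maintainer-controlled checkbox and a policy that can force it on.
from dataclasses import dataclass


@dataclass
class SecretDetectionSetting:
    project_enabled: bool        # the checkbox a Maintainer can toggle
    policy_forces_enabled: bool  # a policy can lock the checkbox to "on"

    @property
    def effective_enabled(self) -> bool:
        # A policy can only force the feature on, never off.
        return self.project_enabled or self.policy_forces_enabled

    @property
    def locked_by_policy(self) -> bool:
        # The UI would render the checkbox as disabled (checked and
        # un-uncheckable) when a policy forces it.
        return self.policy_forces_enabled


setting = SecretDetectionSetting(project_enabled=False, policy_forces_enabled=True)
print(setting.effective_enabled)  # True: the policy forces scanning on
print(setting.locked_by_policy)   # True: Maintainers cannot uncheck it
```

One design consequence: because the policy only wins in the "on" direction, a Maintainer disabling the checkbox has no effect while a forcing policy is attached, which matches the policy model's stricter authorization.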

Rules & Customizations

We should also have some ideas about how rules and other customizations will be handled in the product. This would eventually carry over to SAST as well, so it is important to get the concept right.

  1. We manage the default ruleset and keep it up-to-date
  2. We allow a limited set of customizations (disable rules or add custom patterns) that are done through a UI
  3. There is no longer a TOML file for configuring secret detection patterns
  4. There would be cascading defaults, starting from our hardcoded defaults and then allowing customization at each level in order:
    1. instance/organization level
    2. then group
    3. then project
    4. …if all of those levels are actually necessary. Important point: the types of secrets an organization has do not vary per project.
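The cascade described above could be resolved roughly as follows. This is a hedged sketch with invented names and example patterns, assuming each level may disable inherited rules or add custom patterns, with lower levels applied last:

```python
# Illustrative cascading-ruleset resolution: hardcoded defaults, then
# instance/org, group, and project overrides applied in order.
# Rule IDs and regex patterns are made up for the example.
from dataclasses import dataclass, field


@dataclass
class RulesetOverride:
    disabled_rule_ids: set = field(default_factory=set)
    custom_patterns: dict = field(default_factory=dict)  # rule id -> regex


# Stand-in for the GitLab-managed, always-up-to-date default ruleset.
DEFAULT_RULES = {
    "gitlab_pat": r"glpat-[0-9a-zA-Z_\-]{20}",
    "aws_access_key": r"AKIA[0-9A-Z]{16}",
}


def effective_ruleset(*levels: RulesetOverride) -> dict:
    """Apply overrides in cascade order: instance/org, then group, then project."""
    rules = dict(DEFAULT_RULES)
    for level in levels:
        for rule_id in level.disabled_rule_ids:
            rules.pop(rule_id, None)   # a level may switch off inherited rules
        rules.update(level.custom_patterns)  # ...or add its own patterns
    return rules


org = RulesetOverride(custom_patterns={"internal_token": r"acme-[0-9]{12}"})
group = RulesetOverride(disabled_rule_ids={"aws_access_key"})
project = RulesetOverride()  # inherits everything

print(sorted(effective_ruleset(org, group, project)))
# ['gitlab_pat', 'internal_token']
```

Note how the "secrets don't vary per project" point shows up here: in this example the project level contributes nothing, and the model still works if the project tier is dropped entirely.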

Secret Detection results should lead to incidents

Another big concept: secret detection results should lead to incidents. They are categorically different from vulnerabilities in code. Take GitLab for example: leaked creds go to SIRT for remediation, while CVEs and CWEs go to AppSec. Our product design does not contemplate or enable this distinction in any way.

So far we are planning to integrate Secret Detection results into the Vuln Report the same way it is done today. But I have it in the back of my head that we should just bite the bullet and build our own data model that actually fits our problem domain, rather than shoehorning secret findings into the Vuln Report and its associated assumptions about how vulns work.

The downside of building our own data model is that we have to take on the work to manage that data model, build APIs for it, and integrate with or replace Security Policies and any reporting surfaces (MR widget, pipeline report, vuln report, etc.).
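To make the trade-off concrete, a dedicated data model might look something like this sketch. Every field and status here is speculative, chosen only to show how an incident-oriented lifecycle (detect, triage to the responding team, revoke, resolve) differs from vulnerability severity triage:

```python
# Speculative sketch of a secret-leak-specific model, as an alternative to
# reusing the vulnerability model. Nothing here reflects an actual schema.
from dataclasses import dataclass
from enum import Enum


class LeakStatus(Enum):
    DETECTED = "detected"  # scanner found the secret
    TRIAGED = "triaged"    # routed to the responding team (e.g., SIRT)
    REVOKED = "revoked"    # credential rotated or invalidated
    RESOLVED = "resolved"  # incident closed


@dataclass
class SecretLeak:
    rule_id: str        # which detection pattern matched
    blob_path: str      # file containing the secret
    commit_sha: str     # commit that introduced it
    status: LeakStatus  # incident-style lifecycle, not a vuln severity


leak = SecretLeak(
    rule_id="gitlab_pat",
    blob_path="config/settings.yml",
    commit_sha="deadbeef",
    status=LeakStatus.DETECTED,
)
print(leak.status.value)  # detected
```

The point of the sketch is that "revoke the credential" is a first-class state here, whereas a vulnerability model has no natural place for it; that is the kind of assumption mismatch the paragraph above is worried about.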