Engineering discovery: Consider splitting Security Product analyzers between build and scan functionality
Problem to solve
Our many analyzers currently do more than just analyze users' code. Because projects often need to be built and/or have their dependencies pulled in first, we must run commands such as `yarn install` or `mvn compile` to ensure all dependencies and build targets exist prior to running a security scan. This is largely an automatic task, yet in many cases a user wants to disable the build step or specify a special build configuration. See these examples:
- Python project dependency check failed: https://gitlab.com/gitlab-org/gitlab-ee/issues/6713
- Retire.js analyzer needs node_modules directory: https://gitlab.com/gitlab-org/gitlab-ee/issues/9291
There are certain workarounds that have been supported, such as leveraging existing build jobs for SAST and Dependency Scanning, and using `$SETUP_CMD` to bypass package manager auto-detection, but they are distinct mechanisms and we are not providing a uniform experience to handle this.
We should explore isolating the build stage of our analyzers and providing a common interface for skipping or customizing it.
Intended users
- Persona: Software developer
- Persona: DevOps Engineer
Further details
Pros
- Build step is made more explicit: users can more easily opt out or substitute their own build step
- Build and Scan steps are documented by design - this is currently implicit, which is confusing and non-intuitive
- Eliminate duplicate efforts across GitLab stage groups - we are duplicating logic for auto-builds between ~"devops:secure" and ~"devops:configure" (Auto DevOps) when we should not have crossed efforts. Builds are not necessarily a ~Secure responsibility
- (Potentially) Functionality remains hidden behind scanners - no change to primary APIs but substantial flexibility
Cons
- Breaking change to existing interfaces
Proposal
This issue is considered an Engineering discovery issue. No feature deliverable is expected; the expected output is an actionable issue for the next iteration.
What does success look like, and how can we measure that?
- Users have an explicit understanding of how our scanners both build and scan projects
- Users can easily override the build stage for their framework/language using the same method across all scanners
Results
We want to find a way to not depend on Workspaces for now, as we don't know exactly when they're going to be released (the target has slipped from %12.5 to %12.8 as of today), and even then, we'd need to figure out a way to inject our tools.
So instead, we tried to use what we have on the shelf to come up with a working solution as soon as possible.
While we already discussed this before, I think we missed a few pieces that came together today. The idea is to reverse the way we're running our analyzers: currently, the analyzer image is built by GitLab, and the project is "injected" into this container during the job run. While this works in most cases, it brings the issues we know about, especially when it comes to specific environments. We knew it would be better to inject our tool into the user's build environment instead, but couldn't find a clean way to do so.
This setup could solve everything. Let's take the example of `gemnasium-python` in `dependency_scanning`. While this analyzer is currently called by the `dependency_scanning` job, it will soon have its own job definition, once we get rid of the Docker-in-Docker (DinD) requirement.
- The template (similar to the `dependency_scanning` template) would be updated to something like this (simplified example):

  ```yaml
  gemnasium-python:
    [...]
    script:
      - [install the analyzer if not present]
      - [run the analyzer as usual]
  ```
- If the user wants to have prerequisites, they can use this in their `.gitlab-ci.yml`:

  ```yaml
  include:
    template: Security/DepScan/Gemnasium-Python.gitlab-ci.yml

  gemnasium-python:
    before_script:
      - apt-get install postgresql-dev
  ```
- Complex setups and scenarios can be imagined from there:

  ```yaml
  install_dep:
    image: my_python_image
    script:
      - pip install --target=./pip -r requirements.txt
    artifacts:
      paths:
        - ./pip

  include:
    template: Security/DepScan/Gemnasium-Python.gitlab-ci.yml

  gemnasium-python:
    image: my_python_image
    dependencies:
      - install_dep # to retrieve `./pip`
  ```
In this example, the analyzer will be installed in `my_python_image` (remember the `[install the analyzer if not present]` line of the new template's `script`). The packages can even be transferred from a previous job through artifacts. It's again possible to use `before_script` to customize on top of `my_python_image`.
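To make the `[install the analyzer if not present]` line more concrete, here is a minimal shell sketch of what that step could look like. It is only an illustration under assumptions: the `gemnasium-python` binary name and the idea of delivering the analyzer through `apt`/`apk` are not settled (that is exactly the open question listed under Outstanding work below).

```shell
#!/bin/sh
# Hypothetical sketch of the "[install the analyzer if not present]" step.
# The analyzer binary name and the apt/apk delivery mechanism are
# assumptions, not a final design.

detect_pkg_manager() {
  # Report which supported package manager the user-provided image offers.
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt"
  elif command -v apk >/dev/null 2>&1; then
    echo "apk"
  else
    echo "none"
  fi
}

install_analyzer() {
  # Skip the install entirely when the analyzer is already in the image.
  if command -v gemnasium-python >/dev/null 2>&1; then
    echo "analyzer already present"
    return 0
  fi
  case "$(detect_pkg_manager)" in
    apt)  echo "would install analyzer via apt-get" ;;
    apk)  echo "would install analyzer via apk" ;;
    none) echo "no supported package manager found" >&2; return 1 ;;
  esac
}

install_analyzer || true  # tolerate images without a supported manager
```

Running this as the first `script:` line keeps the default behavior (our own images already contain the analyzer) while allowing arbitrary user images, which is the backward-compatibility property claimed below.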
This solution:
- is backward compatible (by default, we continue to provide the same analyzer images)
- works without any modification of the CI or the runner; we only rely on templates
- is flexible enough to support all types of projects
- is simple to use
- can't work before we get rid of DinD, unless we heavily modify SAST and DS, which I don't think is worth it
- requires images with a minimal shell (won't work with `scratch` images)
Also, it goes beyond the existing `SETUP_CMD` we've introduced in some analyzers, while being way more flexible. Ultimately, `SETUP_CMD` should be deprecated.
Outstanding work
- Identify how to `[install the analyzer if not present]`. I suggest using either `apt` or `apk` for now; that should cover more than 90% of the cases. Windows is problematic, as usual. `curl` might not be present in user images (it's really unlikely), so we need something more portable. We could ship the packages with GitLab itself, but how do we update them? We'll need to create an issue to discuss this while promoting this issue to an epic (or create an epic from scratch).
- Refactor the analyzers to make the build optional (we should detect whether we really NEED to build)
- Target a real-world project
- Deprecate `SETUP_CMD`
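The "detect if we really NEED to build" item could start from a simple heuristic like the sketch below. The marker files are assumptions drawn from the `yarn install` / `mvn compile` cases mentioned in this issue, not a settled design; a real implementation would cover every package manager the analyzers support.

```shell
#!/bin/sh
# Hypothetical heuristic for making the build step optional: only build
# when a project declares dependencies that are not materialized yet.
# Marker file names are assumptions, not a final design.

needs_build() {
  dir="${1:-.}"
  # Node projects: `yarn install` is only needed while node_modules is absent.
  if [ -f "$dir/package.json" ] && [ ! -d "$dir/node_modules" ]; then
    return 0
  fi
  # Maven projects: `mvn compile` is only needed while target/classes is absent.
  if [ -f "$dir/pom.xml" ] && [ ! -d "$dir/target/classes" ]; then
    return 0
  fi
  return 1  # nothing to do; go straight to the scan
}
```

This would also cover the workaround of reusing an existing build job: if the user's earlier job already produced `node_modules` or `target/classes` as artifacts, the analyzer's build step becomes a no-op.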