Engineering discovery: Consider splitting Security Product analyzers between build and scan functionality
Problem to solve
Our many analyzers currently do more than just analyze users' code. Because projects often need to be built and/or have their dependencies pulled in first, we must run commands such as `yarn install` or `mvn compile` to ensure all dependencies and build targets exist prior to running a security scan. This is largely an automatic task, yet in many cases a user wants to disable the build step or specify a special build configuration. See these examples:
- Python project dependency check failed: https://gitlab.com/gitlab-org/gitlab-ee/issues/6713
- Retire.js analyzer needs node_modules directory: https://gitlab.com/gitlab-org/gitlab-ee/issues/9291
There are certain workarounds that have been supported, such as leveraging existing build jobs for SAST and Dependency Scanning, and using `$SETUP_CMD` to bypass package manager auto-detection, but they are distinct mechanisms and we are not providing a uniform experience to handle this.
We should explore isolating the build stage of our analyzers and providing a common interface for skipping or customizing it.
Intended users
- Persona: Software developer
- Persona: DevOps Engineer
Further details
Pros
- Build step is made more explicit: users can more easily opt out or substitute their own build step
- Build and Scan steps are documented by design - this is currently implicit, which is confusing and non-intuitive
- Eliminate duplicate efforts across GitLab stage groups - we are duplicating logic for auto-builds between ~"devops:secure" and ~"devops:configure" (Auto DevOps) when we should not have crossed efforts. Builds are not necessarily a ~Secure responsibility
- (Potentially) Functionality remains hidden behind scanners - no change to primary APIs but substantial flexibility
Cons
- Breaking change to existing interfaces
Proposal
This issue is considered an Engineering discovery issue. No feature deliverable is expected; the expected output is an actionable issue for the next iteration.
What does success look like, and how can we measure that?
- Users have an explicit understanding of how our scanners both build and scan projects
- Users can easily override the build stage for their framework/language using the same method across all scanners
Results
We want to find a way to not depend on Workspaces for now, as we don't know exactly when they're going to be released (the target has slipped from %12.5 to %12.8 as of today), and even then, we'd need to figure out a way to inject our tools.
So instead, we tried to use what we have on the shelf to come up with a working solution as soon as possible.
While we already discussed this before, I think we missed a few pieces that came together today. The idea is to reverse the way we're running our analyzers: currently, the analyzer image is built by GitLab, and the project is "injected" into this container during the job run. While this works in most cases, it brings the issues we know about, especially when it comes to specific environments. We knew it would be better to inject our tool into the user's build environment instead, but couldn't find a clean way to do so.
This setup could solve everything. Let's take the example of `gemnasium-python` in `dependency_scanning`. While this analyzer is currently called by the `dependency_scanning` job, it will soon have its own job definition, once we get rid of the Docker-in-Docker (DinD) requirement.
- The template (similar to the `dependency_scanning` template) would be updated to something like this (simplified example):

  ```yaml
  gemnasium-python:
    [...]
    script:
      - [install the analyzer if not present]
      - [run the analyzer as usual]
  ```
- If the user wants to have prerequisites, they can use this in their `.gitlab-ci.yml`:

  ```yaml
  include:
    template: Security/DepScan/Gemnasium-Python.gitlab-ci.yml

  gemnasium-python:
    before_script:
      - apt-get install postgresql-dev
  ```
- Complex setups and scenarios can be imagined from there:

  ```yaml
  install_dep:
    image: my_python_image
    script:
      - pip install --target=./pip -r requirements.txt
    artifacts:
      paths:
        - ./pip

  include:
    template: Security/DepScan/Gemnasium-Python.gitlab-ci.yml

  gemnasium-python:
    image: my_python_image
    dependencies:
      - install_dep # to retrieve `./pip`
  ```
In this example, the analyzer will be installed in `my_python_image` (remember the `[install the analyzer if not present]` line of the new template's `script`). The packages can even be transferred from a previous job through artifacts. It's again possible to use `before_script` to customize on top of `my_python_image`.
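To make the `[install the analyzer if not present]` line more concrete, here is a minimal shell sketch of what that step could look like. It is only an illustration under assumptions: the `gemnasium-python` binary name and the idea of delivering the analyzer through `apt`/`apk` are not settled (that is exactly the open question listed under Outstanding work below).

```shell
#!/bin/sh
# Hypothetical sketch of the "[install the analyzer if not present]" step.
# The analyzer binary name and the apt/apk delivery mechanism are
# assumptions, not a final design.

detect_pkg_manager() {
  # Report which supported package manager the user-provided image offers.
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt"
  elif command -v apk >/dev/null 2>&1; then
    echo "apk"
  else
    echo "none"
  fi
}

install_analyzer() {
  # Skip the install entirely when the analyzer is already in the image.
  if command -v gemnasium-python >/dev/null 2>&1; then
    echo "analyzer already present"
    return 0
  fi
  case "$(detect_pkg_manager)" in
    apt)  echo "would install analyzer via apt-get" ;;
    apk)  echo "would install analyzer via apk" ;;
    none) echo "no supported package manager found" >&2; return 1 ;;
  esac
}

install_analyzer || true  # tolerate images without a supported manager
```

Running this as the first `script:` line keeps the default behavior (our own images already contain the analyzer) while allowing arbitrary user images, which is the backward-compatibility property claimed below.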
This solution:
- is backward compatible (by default, we continue to provide the same analyzer images)
- works without any modification of the CI or the runner; we only rely on templates
- is flexible enough to support all types of projects
- is simple to use
- can't work before we get rid of DinD, unless we heavily modify SAST and DS, which I don't think is worth it
- requires images with a minimal shell (won't work with `scratch` images)
Also, it goes beyond the existing `SETUP_CMD` we've introduced in some analyzers, while being way more flexible. Ultimately, `SETUP_CMD` should be deprecated.
Outstanding work
- Identify how to `[install the analyzer if not present]`. I suggest using either `apt` or `apk` for now; that should cover more than 90% of the cases. Windows is problematic, as usual. `curl` might not be present in user images (it's really unlikely), so we need something more portable. We could ship the packages with GitLab itself, but how do we update them? We'll need to create an issue to discuss this while promoting this issue to an epic (or create an epic from scratch).
- Refactor the analyzers to make the build optional (we should detect whether we really NEED to build)
- Target a real-world project
- Deprecate `SETUP_CMD`
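The "detect if we really NEED to build" item could start from a simple heuristic like the sketch below. The marker files are assumptions drawn from the `yarn install` / `mvn compile` cases mentioned in this issue, not a settled design; a real implementation would cover every package manager the analyzers support.

```shell
#!/bin/sh
# Hypothetical heuristic for making the build step optional: only build
# when a project declares dependencies that are not materialized yet.
# Marker file names are assumptions, not a final design.

needs_build() {
  dir="${1:-.}"
  # Node projects: `yarn install` is only needed while node_modules is absent.
  if [ -f "$dir/package.json" ] && [ ! -d "$dir/node_modules" ]; then
    return 0
  fi
  # Maven projects: `mvn compile` is only needed while target/classes is absent.
  if [ -f "$dir/pom.xml" ] && [ ! -d "$dir/target/classes" ]; then
    return 0
  fi
  return 1  # nothing to do; go straight to the scan
}
```

This would also cover the workaround of reusing an existing build job: if the user's earlier job already produced `node_modules` or `target/classes` as artifacts, the analyzer's build step becomes a no-op.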