Simplify Splitting SAST Jobs by Language

Description

To improve the usability and performance of gitlab-advanced-sast scans, provide a mechanism that automatically splits scan jobs by language. This would remove the need for customers to split scans of large monorepos manually and would reduce scan times.

Use Case

The customer is using gitlab-advanced-sast to scan a large monorepo containing multiple languages, including Python, with approximately 8,000 source files and growing.

  1. Initial Challenge:
    • The total runtime for their SAST scan was 95 minutes.
    • They manually split scans by language, which brought the Python scan down to 55 minutes. This is still too long for their workflow.
  2. Current Pain Points:
    • Splitting scans requires manual configuration, which is error-prone and complex.
    • The existing SAST_EXCLUDED_PATHS variable uses glob patterns, making granular exclusions challenging.
    • The customer cannot further divide Python scans efficiently due to limitations in file exclusion patterns.
  3. Ideal Solution:
    • An automatic or simple flag-based method to split jobs by language.
    • Improved exclusion pattern support to enable finer-grained control over scans.
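
To illustrate the manual work described above, here is a minimal sketch of how a per-language value for SAST_EXCLUDED_PATHS could be derived today. The extension map, the function name, and the `**/*.ext` pattern form are assumptions made for illustration; the languages and glob syntax the analyzer actually accepts should be checked against the SAST documentation.

```python
# Hypothetical extension map, for illustration only. The set of languages
# the analyzer supports may differ.
LANGUAGE_EXTENSIONS = {
    "python": [".py"],
    "javascript": [".js", ".jsx"],
    "go": [".go"],
}


def excluded_paths_for(language, extensions=LANGUAGE_EXTENSIONS):
    """Build a comma-separated exclusion list that leaves only `language` in scope."""
    patterns = []
    for other, exts in sorted(extensions.items()):
        if other == language:
            continue  # keep the target language's files in the scan
        # Assumed glob form: match the extension at any directory depth.
        patterns.extend(f"**/*{ext}" for ext in exts)
    return ",".join(patterns)


if __name__ == "__main__":
    for lang in sorted(LANGUAGE_EXTENSIONS):
        print(f"{lang}: SAST_EXCLUDED_PATHS={excluded_paths_for(lang)}")
```

Every language added to the repository changes the exclusion list of every other job, which is one reason the manual approach is error-prone.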

Tasks

  1. Investigate the feasibility of splitting jobs by language, either as default behavior or behind a configuration flag.
  2. Develop the feature using one of two approaches:
    • Automatically detect the languages present in the repository and split jobs accordingly.
    • Introduce a flag (e.g., SAST_SPLIT_BY_LANGUAGE) that enables the split.
  3. Ensure compatibility with existing variables like SAST_EXCLUDED_PATHS.
  4. Test the feature with repositories containing multiple languages and large monorepos.
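
The detection and splitting step in task 2 could be sketched roughly as follows. This is not the analyzer's implementation: the extension table, the function names, the job naming scheme, and the `split_by_language` parameter (standing in for the proposed SAST_SPLIT_BY_LANGUAGE flag) are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to language.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".go": "go",
    ".java": "java",
}


def detect_languages(paths):
    """Group file paths by language, ignoring files with unknown extensions."""
    groups = defaultdict(list)
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(suffix)
        if language:
            groups[language].append(path)
    return dict(groups)


def plan_jobs(paths, split_by_language=True):
    """Return one job per detected language, or a single job when splitting is off."""
    groups = detect_languages(paths)
    if not split_by_language:
        total = sum(len(files) for files in groups.values())
        return [{"name": "sast", "languages": sorted(groups), "file_count": total}]
    return [
        {"name": f"sast-{lang}", "languages": [lang], "file_count": len(files)}
        for lang, files in sorted(groups.items())
    ]
```

Keeping the single-job path when the flag is off is one way to avoid changing behavior for existing configurations.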

Acceptance Criteria

  • Jobs are split by language either automatically or by setting a single flag.
  • Configuration options are documented with clear examples.
  • The feature improves scan performance and user experience for multi-language repositories.
  • No breaking changes to current configurations.

Edited by Christian Nnachi