What artifacts will be scanned for secrets in job artifacts?

Goal

  1. Determine a list of high-value report and file types to scan for secrets in job artifacts. This includes things like environment variables, logs, IaC, and test files.
  2. The final deliverable will be a table that we can use for documentation. It could look something like this:

| Report Type | Supported File Names | Supported File Types | Size Limit |
|---|---|---|---|

Please reconcile the artifact report types with the supported files that we want to scan in secret detection for job artifacts.

  • If a file already appears as an AST vulnerability report in the pipeline, it doesn't need to be included; for example, secret_detection artifacts don't need to be scanned.
  • If a report type is unlikely to include secret information, we don't scan it for secrets either; for example, accessibility.
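
The two exclusion rules above can be expressed as a small predicate. This is only a sketch: the set names and the function name are hypothetical, and the set members are illustrative picks from the report types named in this issue, not a final decision.

```python
# Illustrative membership only; deciding the real lists is the goal of this issue.
# Report types that already surface as AST vulnerability reports in the pipeline.
AST_REPORT_TYPES = {"sast", "secret_detection", "dependency_scanning", "container_scanning", "dast"}
# Report types that are unlikely to carry secret information.
UNLIKELY_TO_CONTAIN_SECRETS = {"accessibility"}


def should_scan_report(report_type: str) -> bool:
    """Return False when either exclusion rule applies to the report type."""
    if report_type in AST_REPORT_TYPES:
        return False
    if report_type in UNLIKELY_TO_CONTAIN_SECRETS:
        return False
    return True
```

With these sets, `should_scan_report("dotenv")` is true, while `should_scan_report("accessibility")` is false.
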

List of supported files for Experiment

Source code files

Most artifacts are generated as an archive (.zip) file that can include source code files among other types of files. Therefore, it makes sense to support the most popular programming languages\[^1\], since a repository is essentially composed of source code files.

| Description | File Extension | Notes |
|---|---|---|
| Python | .py, .pyx | The .pyx file format is used in Cython (a variant of Python compiled to C). Other Python-related files include .pyd, .pyo, and .pyc, but those are compiled into bytecode, so they can be skipped since we don't plan to scan binary files at this point. |
| JavaScript | .js, .jsx, .cjs, .mjs | The .cjs and .mjs file formats are used for CommonJS modules and ES modules, respectively. The .jsx file format is often used for React-based files. |
| Java | .java | - |
| C | .c, .h | Header file .h is also considered a source file. |
| C++ | .cpp, .hpp | Header file .hpp is also considered a source file. |
| C# | .cs | - |
| PHP | .php | - |
| TypeScript | .ts, .tsx | The .tsx file format is similar to .jsx but for TypeScript. |
| Ruby | .rb, .erb | The .erb file format is used for Embedded Ruby. |
| Swift | .swift | - |
| Go | .go | - |
| Rust | .rs | - |
| Kotlin | .kt, .kts | The .kts file format is used for Kotlin Scripting. |

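
Since most artifacts arrive as a .zip archive, one way to pick candidate files is to match archive member names against a suffix allowlist. The sketch below assumes Python's standard `zipfile` module; `SCANNABLE_SUFFIXES` and `scannable_entries` are hypothetical names, and the tuple holds only a small subset of the extensions listed in this issue. Matching with `str.endswith` rather than splitting off the last extension keeps compound suffixes such as .junit.xml and .up.sql working.

```python
import zipfile

# Hypothetical allowlist: a small subset of the extensions listed in this issue.
SCANNABLE_SUFFIXES = (".py", ".js", ".rb", ".go", ".env", ".log", ".junit.xml", ".up.sql")


def scannable_entries(archive: zipfile.ZipFile) -> list[str]:
    """Return the member names whose suffix is on the allowlist."""
    names = []
    for info in archive.infolist():
        if info.is_dir():
            continue  # directories have no content to scan
        # str.endswith accepts a tuple, so compound suffixes are matched as a whole.
        if info.filename.lower().endswith(SCANNABLE_SUFFIXES):
            names.append(info.filename)
    return names
```

A file named `.env` at the archive root also matches, because the whole name ends with the `.env` suffix.
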
Documentation files

These files are almost always text-based\[^2\], so they can be easily scanned without issues.

| Description | File Extension | Notes |
|---|---|---|
| AsciiDoc | .adoc, .asciidoc | - |
| Text | .txt, .rtf | - |
| Markdown | .markdown, .md, .mdx | - |
| HTML | .htm, .html, .xhtml | - |
| UML Diagrams | .uml | - |
| API Blueprint | .apib | - |
| Changelog Files | .changelog | - |
| Dependency Documentation | .deps | - |
| Graphviz | .dot, .gv | - |
| Java Documentation Files | .javadoc | - |
| JavaScript Documentation Files | .jsdoc | - |
| Mermaid Diagrams | .mermaid, .mmd | - |
| RESTful API Modeling Language | .raml | - |
| Ruby Documentation Files | .rdoc | - |
| reStructuredText Files | .rst | - |

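
The "almost always text-based" assumption can be checked cheaply before a file is handed to the scanner. The following is a heuristic sketch, not a description of how the scanner actually decides: `looks_like_text` and its `sample_size` parameter are hypothetical, and treating "no NUL bytes and valid UTF-8" as "text" is an assumption.

```python
import codecs


def looks_like_text(data: bytes, sample_size: int = 8192) -> bool:
    """Heuristic: the leading sample has no NUL byte and is valid UTF-8."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return False  # NUL bytes are typical of binary formats
    # An incremental decoder tolerates a multi-byte character cut off at the sample boundary.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Files in other text encodings (for example UTF-16) would be rejected by this check, so it errs on the side of skipping.
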
Security Scan Results

Most of the security scanning tools format their reports into one of the following file formats.

| Description | File Extension | Notes |
|---|---|---|
| Static Analysis Results Interchange Format | .sarif | - |
| JSON | .json | - |
| XML | .xml | - |
| CSV | .csv | - |

Test Reports

Test reports are usually generated by code coverage tools. Some other file formats are produced when tools run tests and output the results into separate files.

| Description | File Extension | Notes |
|---|---|---|
| Code Coverage | .coverage, .lcov, .gcov, .clover, .xml | - |
| Test Results | .junit, .junit.xml, .nunit, .xunit | - |

Log Files

| Description | File Extension | Notes |
|---|---|---|
| Generic Log Files | .log, .info, .debug | - |
| Build Process Logs | .build | - |
| Standard Output Logs | .stdout, .out | - |
| Error Logs | .stderr, .err | - |
| Detailed Trace Logs | .trace | - |

Database Migration Scripts

These kinds of migration scripts or generated schemas could be included in the artifacts of a CI/CD job as well.

| Description | File Extension | Notes |
|---|---|---|
| SQL | .sql, .up.sql, .down.sql, .schema, .ddl, .pgsql, .psql, .mssql, .mysql, .mariadb | - |
| Database Markup Language | .dbml | - |

Static Assets

Static assets can be any type of text-based file compressed into a job artifact. This excludes images and other types of media.

| Description | File Extension | Notes |
|---|---|---|
| Scalable Vector Graphics | .svg | Since .svg is essentially an XML file, it can be scanned. |
| Stylesheets | .css, .scss, .sass, .less, .stylus, .css.map, .min.css | - |
| JavaScript Files | .js.map, .min.js, .mjs, .cjs | - |

Other Generic Files

| Description | File Extension | Notes |
|---|---|---|
| Metadata Files | .meta, .buildinfo, .pom | - |
| Build Manifest Files | .manifest | - |
| Checksum Files | .checksum | - |
| Dependency Lock Files | .lock, .lockfile | - |
| Configuration Files | .properties, .env, .toml, .yaml, .yml, .json, .ini, .conf, .config, .htaccess | - |
| Ignore Files | .gitignore, .dockerignore | - |
| Version Files | .version | - |
| Signature and Verification Files | .asc, .sig, .pub, .crt, .pem, .pgp, .sbom, .spdx | - |

GitLab Artifact Reports

In addition to the list above, we should also scan the following text-based artifact reports:

| Description | File Extension | Notes |
|---|---|---|
| sast | .json | Security Report. EE-only. |
| secret_detection | .json | Security Report. EE-only. |
| dependency_scanning | .json | Security Report. EE-only. |
| container_scanning | .json | Security Report. EE-only. |
| cluster_image_scanning | .json | Security Report. EE-only. |
| dast | .json | Security Report. EE-only. |
| license_scanning | .json | License Scanning Report. EE-only. |
| accessibility | .json | Accessibility Report. |
| codequality | .json | Code Quality Report. EE-only. |
| performance | .json | Performance Report. EE-only until %13.2. |
| browser_performance | .json | Browser Performance Report. EE-only. |
| load_performance | .json | Load Performance Report. EE-only. |
| terraform | .json | Terraform/OpenTofu Plan File. EE-only. |
| requirements | .json | Project Requirements File. Deprecated-soon. EE-only. |
| requirements_v2 | .json | Project Requirements File. |
| coverage_fuzzing | .json | Security Report. EE-only. |
| api_fuzzing | .json | Security Report. EE-only. |

Report Types

| # | Report Type | Description | Max file size | Median Size (MB) | Include in scanning |
|---|---|---|---|---|---|
| 1 | accessibility | Reports on the accessibility impact of changes introduced in merge requests. | 100 MB | | No |
| 2 | annotations | Attached to a job to add a link to the job output page. | 100 MB | | No |
| 3 | api_fuzzing | The api_fuzzing report collects [API Fuzzing bugs](https://docs.gitlab.com/user/application_security/api_fuzzing/) as artifacts. | 100 MB | | No |
| 4 | archive | | 100 MB | | |
| 5 | browser_performance | The browser_performance report collects Browser Performance Testing metrics as an artifact. This artifact is a JSON file output by the Sitespeed plugin. | 100 MB | | No |
| 6 | cluster_image_scanning | | 100 MB | | |
| 7 | cobertura (coverage_report) | View test coverage results in merge requests, line-by-line coverage in file diffs, and overall metrics. | 100 MB | | Under consideration |
| 8 | code_quality | The codequality report collects [code quality issues](https://docs.gitlab.com/ci/testing/code_quality/). | 100 MB | | |
| 9 | container_scanning | Report collects Container Scanning vulnerabilities. | 100 MB | | No |
| 10 | coverage_fuzzing | Report collects coverage fuzzing bugs. | 100 MB | | No |
| 11 | cyclonedx | This report is a Software Bill of Materials describing the components of a project following the [CycloneDX](https://cyclonedx.org/docs/1.4) protocol format. | 5 MB | | No |
| 12 | dast | The dast report collects DAST vulnerabilities. | 100 MB | | No |
| 13 | dependency_scanning | The dependency_scanning report collects Dependency Scanning vulnerabilities. | 100 MB | | No |
| 14 | dotenv | The dotenv report collects a set of environment variables as artifacts. | 100 MB | | Yes |
| 15 | jacoco | | 100 MB | | |
| 16 | junit | The junit report collects JUnit report format XML files. This is a collection of unit test reports. | 100 MB | | Yes |
| 17 | license_scanning | | 100 MB | | |
| 18 | load_performance | The load_performance report collects Load Performance Testing metrics. | 100 MB | | No |
| 19 | lsif | | 200 MB | | |
| 20 | metadata | | 100 MB | | |
| 21 | metrics | You can configure your job to use custom Metrics Reports, and GitLab displays a report on the merge request so that it’s easier and faster to identify changes without having to check the entire log. | 100 MB | | |
| 22 | metrics_referee | | 100 MB | | |
| 23 | performance | | 100 MB | | |
| 24 | repository_xray (deprecated) | The repository_xray report collects information about your repository for use by GitLab Duo Code Suggestions. | 100 MB | | No |
| 25 | requirements | | 100 MB | | |
| 26 | requirements_v2 | | 100 MB | | |
| 27 | sast | The sast report collects SAST vulnerabilities. | 100 MB | | No |
| 28 | secret_detection | The secret_detection report collects detected secrets. | 100 MB | | Yes |
| 29 | terraform | The terraform report obtains an OpenTofu tfplan.json file. | 5 MB | | Yes |
| 30 | trace | | 100 MB | | |

  1. archive - shows up when a job uploads at least one artifact
  2. metadata - shows up when a job uploads at least one artifact. metadata has information about the entries in the artifact archive
  3. trace - always shows up for every job, with some delay
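
The "Max file size" column of the Report Types table can be read as a default with a few overrides. The sketch below encodes it that way; the constant and function names are hypothetical, and interpreting "MB" as 1024 × 1024 bytes is an assumption (the table does not say whether MB or MiB is meant).

```python
MB = 1024 * 1024  # assumption: the table may instead mean 1,000,000 bytes

# Every report type in the table is capped at 100 MB unless listed below.
DEFAULT_MAX_BYTES = 100 * MB
MAX_BYTES_OVERRIDES = {
    "cyclonedx": 5 * MB,
    "terraform": 5 * MB,
    "lsif": 200 * MB,
}


def within_size_limit(report_type: str, size_bytes: int) -> bool:
    """Return True when the artifact size is within the limit for its report type."""
    limit = MAX_BYTES_OVERRIDES.get(report_type, DEFAULT_MAX_BYTES)
    return size_bytes <= limit
```

For example, a 6 MB terraform report exceeds its 5 MB limit, while a 150 MB lsif artifact is still within its 200 MB limit.
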
Edited by Alana Bellucci