Transition Plan - James Liu

What

This issue outlines the open work and domain knowledge that should be handed over as I transition out of the Static Analysis group.

Responsibilities

Responsibility	Backup	Notes
Secret Detection false positive testing	@vbhat161	Vishwa and I have been working very closely on this initiative, so we should both be more or less on the same page. The implementation is mostly done and ready for regular use. Start a discussion around future enhancements to the FP testing, as well as what to do with Yardstick.
Investigate high-FP rate Secret Detection rules	-	This list was compiled from a test run of the FP testing tool above. There are a few patterns with a very high FP rate that we should look at improving soon.
Secret detection continuous scanning	-	So far, planning of the general direction for an MVC and a PoC scanning engine have both been completed. Lucas has also created a draft MR that hooks into the PostReceiveService. Next steps are outlined here: gitlab-org/gitlab#413274 (comment 1498129993)
Technical Discovery: Streamlining secret revocation vendor integrations	-	We should work with the SecAuto team to facilitate handover of the SRS repo and associated infrastructure. This will make it easier for Static Analysis to implement additional vendor integrations. Add notes to the issue based on your previous experience with adding Postman, GCP. Done: https://gitlab.com/gitlab-org/gitlab/-/issues/395523#note_1520162913
Technical Discovery: E2E testing for Static Analysis analysers	-	Many of our problems stem from report processing and rendering in the vulnerability management UI. End-to-end testing may catch some of these bugs before customers are impacted. See, for example, gitlab-org/gitlab#358073 (closed) and gitlab-org/gitlab#408944 (comment 1471377199).
Add automatic response for leaked Segment API tokens	-	This could be a candidate for the next auto revocation integration.
Improve the Semgrep rule-testing CI config

Domain Knowledge

MobSF

https://gitlab.com/gitlab-org/security-products/analyzers/mobsf
One of the trickier analysers because we do a lot of upfront work to massage subdirectories of the target project to be compatible with the directory structure expected by MobSF.
Most analysers implement a Match function which gets called by the shared command package to determine if a target repo can be scanned. In MobSF we do this only as a first pass; we then perform more involved matching in the createScanJobs function.
- This is because you can have multiple Android modules in a single repo, or a mix of Android and iOS projects, or even binaries. Everything must be accommodated and scanned separately because MobSF is a snowflake.
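The two-pass idea can be sketched roughly as below. This is a Python illustration, not the analyser's actual code: the function names `quick_match` and `create_scan_jobs`, the marker file names, and the job shape are all assumptions made for the example. The first pass only answers "is there anything to scan?", while the second enumerates every Android module, iOS project, and binary as its own job.

```python
import os

# Illustrative markers only (assumptions); the real analyser's rules may differ.
ANDROID_MARKER = "AndroidManifest.xml"
IOS_SUFFIX = ".xcodeproj"
BINARY_SUFFIXES = (".apk", ".ipa")


def classify(dirpath, dirnames, filenames):
    """Return (kind, path) pairs for scannable items directly inside dirpath."""
    found = []
    if ANDROID_MARKER in filenames:
        found.append(("android", dirpath))
    for d in dirnames:
        if d.endswith(IOS_SUFFIX):
            found.append(("ios", os.path.join(dirpath, d)))
    for f in filenames:
        if f.endswith(BINARY_SUFFIXES):
            found.append(("binary", os.path.join(dirpath, f)))
    return found


def quick_match(root):
    """First pass: stop at the first directory that contains anything scannable."""
    return any(classify(*entry) for entry in os.walk(root))


def create_scan_jobs(root):
    """Second pass: collect one job per module, project, or binary in the repo."""
    jobs = []
    for entry in os.walk(root):
        jobs.extend(classify(*entry))
    return sorted(jobs)
```

A repo with two Android modules and one `.apk` would pass `quick_match` once, but yield three separate jobs from `create_scan_jobs`.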
Make sure you read and understand the entire flow of the analyze function.
In the MobSF container, we spawn a separate Python server which hosts the MobSF scanning tool. All interactions with the scanner happen over HTTP requests within the same container.
- Because of how we’ve configured the Docker entrypoint, MobSF doesn’t work correctly in k8s unless this workaround is applied: gitlab-org/gitlab#330680 (closed)
Consider moving to mobsfscan which is Semgrep-based and maintained by the same author. The same author also maintains nodejs-scan.

Secrets

https://gitlab.com/gitlab-org/security-products/analyzers/secrets
Fairly straightforward – just think of it as regex matches over strings with a tiny bit of cleverness to improve performance.
Two modes of operation (broadly):
- Without git – the repo is treated as a bag of strings and scanned completely
- With git – uses the git command line to traverse commits to scan individual patches instead of the whole repo.
The mode of operation can be overridden manually using CI variables. It’s all computed in the gitFetch function.
Some rules have word boundaries (i.e. \b) but most of them do not for some reason. This can result in false positives if we match a substring.
The Oculus/Meta/Instagram tokens are particularly prone to false positives because the patterns are so broad and the prefix is so short. This happens even with word boundaries in place.
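To illustrate the word-boundary point, here is a small Python sketch. The `tok_` pattern is made up for the example and is not one of the analyser's real rules; it only shows how a missing `\b` lets a rule match a substring of a longer identifier.

```python
import re

# Hypothetical rule: a made-up "tok_" prefix followed by 8 hex characters.
UNBOUNDED = re.compile(r"tok_[0-9a-f]{8}")
BOUNDED = re.compile(r"\btok_[0-9a-f]{8}\b")

samples = [
    "key = tok_deadbeef",           # looks like a standalone token
    "id = mytok_deadbeef00112233",  # token-like substring inside a longer identifier
]


def matches(pattern, lines):
    """Return the lines in which the pattern is found anywhere."""
    return [line for line in lines if pattern.search(line)]


unbounded_hits = matches(UNBOUNDED, samples)  # both lines match
bounded_hits = matches(BOUNDED, samples)      # only the first line matches
```

Word boundaries do not help when the pattern itself is broad: a short prefix followed by a loose character class will still match plenty of ordinary, standalone strings.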
Secret detection reports differ from SAST reports in that findings may also include a commit SHA.
- Incorrect handling of this SHA has resulted in bugs like gitlab-org/gitlab#358073 (closed)

How the commit SHA works in SD reports

All SD reports contain a `location.commit.sha` field. The value can either be a valid commit SHA or a placeholder SHA (0000000).
- Placeholder SHA: https://gitlab.com/gitlab-org/security-products/analyzers/secrets/-/blob/baf67714b48b45fccafd01f83f87f8c1a21b2cd1/qa/expect/secrets/gl-secret-detection-report.json#L19-21
- Real SHA (note the additional commit metadata too): https://gitlab.com/gitlab-org/security-products/analyzers/secrets/-/blob/baf67714b48b45fccafd01f83f87f8c1a21b2cd1/qa/expect/secrets-commits/gl-secret-detection-report.json#L19-24
Whether the report will contain the placeholder or a real SHA depends on the kind of scan being performed. The types are listed in the documentation here.
- The “default branch” scan treats the project as a simple directory of files, so no commit data will be available. This results in the placeholder 0000000 SHA being included for all findings.
- All other scan types perform scanning by iterating through the Git commit history (or range) and performing regex matching on the patches associated with each commit. These scans include a valid SHA.
When it’s time to render the vulnerability report, Rails needs to compute the correct permalink to the source line where the token was leaked.
- For example: https://gitlab.com/gitlab-org/security-products/tests/secrets/-/security/vulnerabilities/71933494
  - The file link under the Location heading points to https://gitlab.com/gitlab-org/security-products/tests/secrets/-/blob/84d4f7dc402eae8d36a9023a0b5ece38ce063e38/keys/testkeys-110044-23ecb974342473e425f395ddd5f8f787ac6be90d.json#L5, which is the permalink including a SHA.
- Review the following MRs to understand how Rails computes the blob path:
- Basically, Rails treats the SHA as invalid when the `location.commit.sha` property is absent or holds the placeholder SHA (0000000), and as valid otherwise.
  - If valid, use the `location.commit.sha` in the blob path.
  - If invalid, use the SHA of the latest commit on the default branch in the blob path.
    - This assumes that a “default branch” scan was executed.
  - We always include a SHA in the blob path, since a link without one could easily point at source that has since changed.
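The decision described above can be sketched as follows. This is a Python illustration of the logic only, not the Rails implementation: the function names, the `default_branch_head_sha` parameter, and the exact path format are assumptions, and the finding is modelled as a plain dict with `location.file`, `location.start_line`, and `location.commit.sha` keys.

```python
PLACEHOLDER_SHA = "0000000"


def resolve_sha(finding, default_branch_head_sha):
    """Use the finding's SHA if it is valid, else fall back to the default branch head."""
    commit = finding.get("location", {}).get("commit") or {}
    sha = commit.get("sha")
    # Invalid means: property absent, or set to the placeholder.
    if sha is None or sha == PLACEHOLDER_SHA:
        return default_branch_head_sha
    return sha


def blob_path(project_path, finding, default_branch_head_sha):
    """Build a permalink-style blob path that always contains a SHA."""
    location = finding["location"]
    sha = resolve_sha(finding, default_branch_head_sha)
    return f"/{project_path}/-/blob/{sha}/{location['file']}#L{location['start_line']}"
```

For a finding from a commit-history scan, the path uses the finding's own SHA; for a “default branch” scan (placeholder SHA) it uses whatever SHA the caller passes as the head of the default branch.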