Transition Plan - James Liu
What
This issue outlines the responsibilities and domain knowledge that should be handed over as I transition out of group::static analysis.
Responsibilities
| Responsibility | Backup | Notes |
|---|---|---|
| Technical Discovery: Streamlining secret revocation vendor integrations | - | |
| Technical Discovery: E2E testing for Static Analysis analysers | - | |
Domain Knowledge
MobSF
- https://gitlab.com/gitlab-org/security-products/analyzers/mobsf
- One of the trickier analysers, because we do a lot of upfront work to massage subdirectories of the target project into the directory structure MobSF expects.
- Most analysers implement a Match function, which the shared command package calls to determine whether a target repo can be scanned. In MobSF this is only a first pass; we then perform more involved matching in the createScanJobs function.
- This is because a single repo can contain multiple Android modules, a mix of Android and iOS projects, or even binaries. Each of these must be accommodated and scanned separately because MobSF is a snowflake.
- Make sure you read and understand the entire flow of the analyze function.
- In the MobSF container, we spawn a separate Python server which hosts the MobSF scanning tool. All interactions with the scanner happen over HTTP requests within the same container.
- Because of how we’ve configured the Docker entrypoint, MobSF doesn’t work correctly in k8s unless this workaround is applied: gitlab-org/gitlab#330680 (closed)
- Consider moving to mobsfscan, which is Semgrep-based and maintained by the same author as MobSF (who also maintains nodejs-scan).
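The two-stage matching described above can be sketched in Go. This is a hypothetical illustration, not the analyser's code: `quickMatch` and `planScanJobs` only mirror the roles of `Match` and `createScanJobs`, and the marker files (`AndroidManifest.xml` under a `src` directory, `project.pbxproj` inside an `.xcodeproj`, `.apk`/`.ipa` binaries) are assumptions chosen for illustration. It works on a list of repo-relative paths so it runs without a real checkout.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// scanJob is a hypothetical unit of work: one target that MobSF scans on its own.
type scanJob struct {
	Kind string // "android", "ios" or "binary"
	Path string // path relative to the repository root
}

// classify maps one repo-relative file path to the scan target it implies.
// The marker files and extensions below are illustrative assumptions.
func classify(file string) (kind, target string, ok bool) {
	switch {
	case path.Base(file) == "AndroidManifest.xml":
		// Treat the directory above the first "src" segment as the module root.
		parts := strings.Split(file, "/")
		for i, p := range parts {
			if p == "src" {
				return "android", path.Join(append([]string{"."}, parts[:i]...)...), true
			}
		}
		return "android", path.Dir(file), true
	case path.Base(file) == "project.pbxproj" && strings.HasSuffix(path.Dir(file), ".xcodeproj"):
		// The iOS project root is the directory that holds the .xcodeproj bundle.
		return "ios", path.Dir(path.Dir(file)), true
	case strings.HasSuffix(file, ".apk") || strings.HasSuffix(file, ".ipa"):
		// Prebuilt binaries are scanned as-is.
		return "binary", file, true
	}
	return "", "", false
}

// quickMatch plays the role of a Match function: a cheap yes/no first pass.
func quickMatch(files []string) bool {
	for _, f := range files {
		if _, _, ok := classify(f); ok {
			return true
		}
	}
	return false
}

// planScanJobs plays the role of createScanJobs: it works out every distinct
// target, deduplicates them, and sorts them so the output is deterministic.
func planScanJobs(files []string) []scanJob {
	seen := map[scanJob]bool{}
	var jobs []scanJob
	for _, f := range files {
		kind, target, ok := classify(f)
		if !ok {
			continue
		}
		job := scanJob{Kind: kind, Path: target}
		if !seen[job] {
			seen[job] = true
			jobs = append(jobs, job)
		}
	}
	sort.Slice(jobs, func(i, j int) bool {
		if jobs[i].Kind != jobs[j].Kind {
			return jobs[i].Kind < jobs[j].Kind
		}
		return jobs[i].Path < jobs[j].Path
	})
	return jobs
}

func main() {
	files := []string{
		"app/src/main/AndroidManifest.xml",
		"app/src/debug/AndroidManifest.xml",
		"wear/src/main/AndroidManifest.xml",
		"ios/App.xcodeproj/project.pbxproj",
		"dist/release.apk",
		"README.md",
	}
	fmt.Println(quickMatch(files))
	for _, j := range planScanJobs(files) {
		fmt.Println(j.Kind, j.Path)
	}
}
```

With this sample list the two manifests under `app/` collapse into a single Android job alongside `wear`, the iOS project and the APK. That is the reason for the split: the first pass only needs a yes/no answer, while job planning has to work out how many distinct targets exist.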
Secrets
- https://gitlab.com/gitlab-org/security-products/analyzers/secrets
- Fairly straightforward – just think of it as regex matches over strings with a tiny bit of cleverness to improve performance.
- Two modes of operation (broadly):
- Without git – the repo is treated as a bag of strings and scanned completely
- With git – uses the git command line to traverse commits to scan individual patches instead of the whole repo.
- The mode of operation can be overridden manually using CI variables. It’s all computed in the gitFetch function.
- Some rules include word boundaries (i.e. \b), but most do not, for reasons that are unclear. Without them, a rule can match a substring of a longer string and produce a false positive.
- The Oculus/Meta/Instagram token rules are particularly prone to false positives, even with word boundaries in place, because the patterns are so broad and the prefix is so short.
- Secret detection reports differ from SAST reports in that findings may also include a commit SHA.
- Incorrect handling of this SHA has resulted in bugs like gitlab-org/gitlab#358073 (closed)
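A minimal Go sketch of two ideas from the list above: scanning raw content as a bag of strings versus scanning only the lines a commit's patch adds, and the effect of `\b` on a rule with a short prefix. The `ab` prefix and both patterns are hypothetical and not taken from the analyser's ruleset, and the diff handling is deliberately simplistic. Go's RE2-based `regexp` package treats `\b` as an ASCII word boundary.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// A hypothetical rule for a token with a short "ab" prefix, with and without
// word boundaries. Neither pattern comes from the real ruleset.
var (
	loose  = regexp.MustCompile(`ab[A-Za-z0-9]{12,}`)
	strict = regexp.MustCompile(`\bab[A-Za-z0-9]{12,}\b`)
)

// scanText treats content as a bag of strings, as in the mode without git.
func scanText(re *regexp.Regexp, content string) []string {
	return re.FindAllString(content, -1)
}

// scanPatch inspects only the lines a patch adds, as in the git mode.
// Unified-diff parsing is deliberately minimal: "+" lines other than the
// "+++" file header are scanned, everything else is skipped.
func scanPatch(re *regexp.Regexp, patch string) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(patch))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "+") && !strings.HasPrefix(line, "+++") {
			hits = append(hits, re.FindAllString(line[1:], -1)...)
		}
	}
	return hits
}

func main() {
	content := "recipe := \"crabcakeRecipeVersion2\"\ntoken := \"ab1234567890XYZW\"\n"

	// Without \b the rule also fires inside the word "crabcake...".
	fmt.Println(scanText(loose, content)) // [abcakeRecipeVersion2 ab1234567890XYZW]
	// With \b only the standalone token matches.
	fmt.Println(scanText(strict, content)) // [ab1234567890XYZW]

	patch := "--- a/config.go\n+++ b/config.go\n@@ -1 +1 @@\n" +
		"-token := \"ab0000000000OLD1\"\n" +
		"+token := \"ab1234567890XYZW\"\n"
	// The removed line is ignored; only the added secret is reported.
	fmt.Println(scanPatch(strict, patch)) // [ab1234567890XYZW]
}
```

A word boundary only removes matches that start or end inside another word; it does nothing for a rule whose pattern is broad enough to match standalone identifiers, which is why the short-prefix Oculus/Meta/Instagram rules still misfire.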
How the commit SHA works in SD reports
- All SD reports contain a `location.commit.sha` field. The value can either be a valid commit SHA or a placeholder SHA (0000000).
- Placeholder SHA: https://gitlab.com/gitlab-org/security-products/analyzers/secrets/-/blob/baf67714b48b45fccafd01f83f87f8c1a21b2cd1/qa/expect/secrets/gl-secret-detection-report.json#L19-21
- Real SHA (note the additional commit metadata too): https://gitlab.com/gitlab-org/security-products/analyzers/secrets/-/blob/baf67714b48b45fccafd01f83f87f8c1a21b2cd1/qa/expect/secrets-commits/gl-secret-detection-report.json#L19-24
- Whether the report will contain the placeholder or a real SHA depends on the kind of scan being performed. The types are listed in the documentation here.
- The “default branch” scan treats the project as a simple directory of files, so no commit data will be available. This results in the placeholder 0000000 SHA being included for all findings.
- All other scan types perform scanning by iterating through the Git commit history (or range) and performing regex matching on the patches associated with each commit. These scans include a valid SHA.
- When it’s time to render the vulnerability report, Rails needs to compute the correct permalink to the source line where the token was leaked.
- For example: https://gitlab.com/gitlab-org/security-products/tests/secrets/-/security/vulnerabilities/71933494
- The file link under the Location heading points to https://gitlab.com/gitlab-org/security-products/tests/secrets/-/blob/84d4f7dc402eae8d36a9023a0b5ece38ce063e38/keys/testkeys-110044-23ecb974342473e425f395ddd5f8f787ac6be90d.json#L5, which is the permalink including a SHA.
- Review the following MRs to understand how Rails computes the blob path:
- Basically, Rails treats the SHA as invalid if the `location.commit.sha` property is absent or contains the placeholder SHA (0000000).
- If valid, use the `location.commit.sha` in the blob path.
- If invalid, use the SHA of the latest commit on the default branch in the blob path.
- This assumes that a “default branch” scan was executed.
- We never omit the SHA from the blob path, because the linked source could easily become outdated.