Next Generation SAST engine
### Problem to solve
Our current SAST offering is based on [various and heterogeneous tools](https://gitlab.com/gitlab-org/security-products/analyzers). While this approach was a good way to get up to speed quickly, it has significant limitations, most of which stem from the lack of a common code representation.
- Lack of common code representation
  - We cannot formalize a set of general attack or vulnerability patterns that are reusable across language boundaries. Conceptually, an XSS follows the same pattern irrespective of the language used to develop its vulnerable host application.
  - We cannot learn new attack patterns or code smells across different languages.
  - We cannot release features such as incremental code scanning, code navigation, or snippet matching, because they require access to a code representation that contains AST, data-flow, and control-flow information.
  - We cannot mine for certain patterns across different GitLab projects developed in different languages.
- Data-flow and control-flow information is not always provided. The latter is particularly important for some of our categories, like [RASP](https://about.gitlab.com/direction/defend/rasp/).
- Engines are all written in different languages
### Intended users
* [Delaney (Development Team Lead)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#delaney-development-team-lead)
* [Sasha (Software Developer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sasha-software-developer)
* [Devon (DevOps Engineer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#devon-devops-engineer)
* [Sidney (Systems Administrator)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sidney-systems-administrator)
* [Sam (Security Analyst)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sam-security-analyst)
### Further details
<!-- Include use cases, benefits, and/or goals (contributes to our vision?) -->
Some vulnerabilities are almost impossible to spot without a complete understanding of the code and its control and data flow.
SQL injections are a good example: most of the time they depend on a complete flow from user input to the query, which rules based on regular expressions cannot cover.
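To illustrate the point about regular expressions, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the scanned snippet, the `request` source, the `execute` sink, and the regex rule are hypothetical and are not taken from any existing analyzer. It compares a single-line regex rule with a very naive taint propagation over straight-line assignments.

```python
import ast
import re

# Hypothetical vulnerable snippet: user input reaches the query through two
# intermediate variables, so no single line contains both source and sink.
SOURCE_CODE = '''
user_id = request.args.get("id")
clause = "id = " + user_id
query = "SELECT * FROM users WHERE " + clause
cursor.execute(query)
'''

# Illustrative single-line rule: flag execute() calls that mention request data.
REGEX_RULE = re.compile(r"execute\(.*request")


def regex_finds_issue(code: str) -> bool:
    """Apply the regex rule line by line, as a pattern-based scanner would."""
    return any(REGEX_RULE.search(line) for line in code.splitlines())


def names_in(node: ast.AST) -> set:
    """Return every variable name that appears inside an expression."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def dataflow_finds_issue(code: str) -> bool:
    """Propagate taint through top-level assignments, top to bottom.

    This ignores branches, loops, and function calls; it only shows the idea.
    """
    tainted = set()
    for stmt in ast.parse(code).body:
        if isinstance(stmt, ast.Assign):
            used = names_in(stmt.value)
            # Source (assumed): anything read from the `request` object,
            # or any expression that reads an already tainted variable.
            if "request" in used or used & tainted:
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            # Sink (assumed): any call to a method named `execute`.
            if isinstance(call.func, ast.Attribute) and call.func.attr == "execute":
                if any(names_in(arg) & tainted for arg in call.args):
                    return True
    return False


print("regex rule:", regex_finds_issue(SOURCE_CODE))
print("data-flow: ", dataflow_finds_issue(SOURCE_CODE))
```

The regex rule sees nothing suspicious on the `cursor.execute(query)` line, while the flow-based check follows `user_id`, `clause`, and `query` back to the request. A real engine would also need control flow and inter-procedural analysis, which this sketch deliberately leaves out.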
### Proposal
<!-- How are we going to solve the problem? Try to include the user journey! https://about.gitlab.com/handbook/journeys/#user-journey -->
The vulnerability research team has already evaluated a potential foundation in issue https://gitlab.com/gitlab-com/gl-security/appsec/vulnerabiltiy-research/issues/4, and developed a limited [Ruby POC](https://gitlab.com/gitlab-org/gitlab/issues/35380) that extracts code information from the AST (functions, calls, arguments), augments it with data-flow and call information, and stores the result in a graph database. The graph database enables users to run code queries that answer questions such as: `Is there a data flow from a to b?`
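As a rough illustration of the kind of query described above, the sketch below models data-flow edges in an in-memory graph and answers "is there a data flow from a to b" with a breadth-first search. It is not the POC's implementation: the `CodeGraph` class, its methods, and the node names are hypothetical, and a plain dictionary stands in for the graph database.

```python
from collections import defaultdict, deque


class CodeGraph:
    """In-memory stand-in for the graph database (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._edges = defaultdict(set)

    def add_flow(self, src: str, dst: str) -> None:
        """Record that data flows directly from node `src` to node `dst`."""
        self._edges[src].add(dst)

    def has_flow(self, a: str, b: str) -> bool:
        """Return True if `b` is reachable from `a` along data-flow edges."""
        seen, queue = {a}, deque([a])
        while queue:
            node = queue.popleft()
            if node == b:
                return True
            for nxt in self._edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False


# Hypothetical nodes, as they might be extracted from an AST.
graph = CodeGraph()
graph.add_flow("params[:id]", "find_user.arg0")
graph.add_flow("find_user.arg0", "sql_string")
graph.add_flow("sql_string", "connection.execute.arg0")
graph.add_flow("config.timeout", "http_client.timeout")

print(graph.has_flow("params[:id]", "connection.execute.arg0"))
print(graph.has_flow("config.timeout", "connection.execute.arg0"))
```

Because the query only depends on nodes and edges, the same reachability check would apply to graphs produced from any language, which is the benefit a common code representation is meant to provide.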
### Permissions and Security
<!-- What permissions are required to perform the described actions? Are they consistent with the existing permissions as documented for users, groups, and projects as appropriate? Is the proposed behavior consistent between the UI, API, and other access methods (e.g. email replies)?-->
TODO
### Documentation
<!-- See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html
Add all known Documentation Requirements here, per https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html#documentation-requirements
If this feature requires changing permissions, this document https://docs.gitlab.com/ee/user/permissions.html must be updated accordingly. -->
TODO
### Testing
<!-- What risks does this change pose? How might it affect the quality of the product? What additional test coverage or changes to tests will be needed? Will it require cross-browser testing? See the test engineering process for further help: https://about.gitlab.com/handbook/engineering/quality/test-engineering/ -->
TODO
### What does success look like, and how can we measure that?
<!-- Define both the success metrics and acceptance criteria. Note that success metrics indicate the desired business outcomes, while acceptance criteria indicate when the solution is working correctly. If there is no way to measure success, link to an issue that will implement a way to measure this. -->
- Better results: a lower false positive rate (FPR) and a higher true positive rate (TPR)
- Contributions to our rules
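
For reference, the two rates in the first criterion could be computed from a labeled benchmark as sketched below. The function name and the example counts are made up for illustration; only the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN) are assumed.

```python
def detection_rates(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """Return (TPR, FPR) from confusion-matrix counts.

    tp: real vulnerabilities that were reported
    fp: safe code locations that were reported anyway
    tn: safe code locations that were correctly not reported
    fn: real vulnerabilities that were missed
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up counts for a hypothetical benchmark run.
tpr, fpr = detection_rates(tp=40, fp=10, tn=90, fn=10)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Comparing these two numbers for the current analyzers and the new engine on the same benchmark would give a concrete before/after measurement.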
### What is the type of buyer?
<!-- Which leads to: in which enterprise tier should this feature go? See https://about.gitlab.com/handbook/product/pricing/#four-tiers -->
gitlab~3207279
### Links / references
#### LLVM IR References/Resources
* http://dev.stephendiehl.com/numpile/ - Python JIT with LLVM and LLVM IR
* https://github.com/dabeaz/llvm-py/blob/master/www/src/userguide.txt#L315 - Dealing with modules in interpreted languages in LLVM IR (specifically Python)
* https://us.pycon.org/2016/schedule/presentation/1995/ - Wrestling Python into LLVM IR
* https://llvm.org/pubs/2004-Spring-AlexanderssonMSThesis.html - Ruby to LLVM
* https://github.com/k0kubun/llrb/blob/master/README.md - Ruby -> LLVM IR -> LLVM Bitcode
* https://go.googlesource.com/gollvm/ - Go -> LLVM IR -> Backend
* https://github.com/ShiftLeftSecurity/llvm2graphml - LLVM to GraphML