Generic SAST engine

Problem to solve

Our current SAST offering is built on a variety of heterogeneous tools. While this approach let us get up to speed quickly, it has significant limitations. Most of them stem from the lack of a common code representation.

Lack of common code representation
- We cannot formalize a set of general attack or vulnerability patterns that are reusable across language boundaries. Conceptually, an XSS follows the same pattern irrespective of the language used to develop the vulnerable host application.
- We cannot learn new attack patterns or code smells across different languages.
- We cannot release features such as incremental code scanning, code navigation, or snippet matching, because they require access to a code representation that contains AST, data-flow, and control-flow information.
- We cannot mine for certain patterns across different GitLab projects developed in different languages.
- Data-flow and control-flow information is not always provided by the current tools. Control flow is particularly important for some of our categories, like RASP.
Engines are all written in different languages

Intended users

Further details

Some vulnerabilities are almost impossible to spot without a complete understanding of the code and its control and data flow. SQL injections are a good example: in most cases they depend on a complete flow from user input to the query, which rules based on regular expressions cannot cover.
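To make the SQL injection point concrete, here is a minimal sketch in Python. It is not part of our tooling: the names `get_param` and `db_execute`, the sample snippet, and the regex rule are all hypothetical. The sketch only handles straight-line, top-level assignments and ignores sanitizers, but it shows how a line-based regex rule misses a flow that a simple data-flow pass over the AST picks up.

```python
import ast
import re

# Hypothetical vulnerable snippet: user input reaches the sink through
# two intermediate variables, so no single line looks suspicious.
SOURCE = '''
user = get_param("name")
query = "SELECT * FROM users WHERE name = '" + user + "'"
stmt = query
db_execute(stmt)
'''

# Hypothetical regex rule: flag sink calls that concatenate strings inline.
REGEX_RULE = re.compile(r"db_execute\(.*\+.*\)")


def regex_findings(source):
    """Return the lines matched by the line-based regex rule."""
    return [line for line in source.splitlines() if REGEX_RULE.search(line)]


def _is_tainted(expr, tainted, sources):
    """True if expr reads a tainted variable or calls a taint source."""
    for sub in ast.walk(expr):
        if isinstance(sub, ast.Name) and sub.id in tainted:
            return True
        if (isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name)
                and sub.func.id in sources):
            return True
    return False


def tainted_sink_calls(source, sources=("get_param",), sinks=("db_execute",)):
    """Return line numbers of sink calls that receive tainted data."""
    tainted = set()
    findings = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            # Propagate taint from the right-hand side to assigned names.
            if _is_tainted(node.value, tainted, sources):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)
        elif isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            call = node.value
            if isinstance(call.func, ast.Name) and call.func.id in sinks:
                if any(_is_tainted(a, tainted, sources) for a in call.args):
                    findings.append(node.lineno)
    return findings


print(regex_findings(SOURCE))       # the regex rule reports nothing
print(tainted_sink_calls(SOURCE))   # the data-flow pass reports the sink line
```

The regex rule only sees `db_execute(stmt)`, which looks harmless in isolation, while the data-flow pass follows `user` through `query` and `stmt` to the sink.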

Proposal

The vulnerability research team has already evaluated a potential foundation in issue https://gitlab.com/gitlab-com/gl-security/appsec/vulnerabiltiy-research/issues/4 and developed a limited Ruby POC. The POC extracts code information from the AST (functions, calls, arguments), augments it with data-flow and call information, and stores the result in a graph database. The graph database lets users run code queries to answer questions such as: is there a data flow from a to b?
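The kind of query mentioned above can be sketched as a reachability search over data-flow edges. The following Python sketch is an illustration only: it uses an in-memory adjacency map instead of a graph database, and the class `CodeGraph`, its methods, and the node labels are hypothetical, not taken from the POC.

```python
from collections import defaultdict, deque


class CodeGraph:
    """Toy stand-in for the graph database: nodes are code entities,
    directed edges mean "data flows from src to dst"."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_flow(self, src, dst):
        self._edges[src].add(dst)

    def flow_path(self, a, b):
        """Return one shortest data-flow path from a to b, or None."""
        parents = {a: None}
        queue = deque([a])
        while queue:
            node = queue.popleft()
            if node == b:
                # Walk the parent links back to the start node.
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in sorted(self._edges.get(node, ())):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None


# Hypothetical data-flow facts extracted from a Ruby controller.
graph = CodeGraph()
graph.add_flow("params[:name]", "name")
graph.add_flow("name", "query")
graph.add_flow("query", "User.where(query)")
graph.add_flow("config", "logger")

print(graph.flow_path("params[:name]", "User.where(query)"))
print(graph.flow_path("config", "User.where(query)"))
```

The first query returns the chain of nodes from the request parameter to the database call, while the second returns `None` because no flow exists. A real graph database would express the same question in its own query language, and the same query would work for any source language once its code is mapped into the common representation.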

Permissions and Security

TODO

Documentation

TODO

Testing

TODO

What does success look like, and how can we measure that?

Better results (lower FPR, higher TPR)
Contributions to our rules
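
As a sketch of how the first criterion could be measured, the rates can be computed from confusion-matrix counts. This assumes a benchmark with labeled ground truth, which this issue does not specify; the function name and the sample counts below are hypothetical.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN): share of real vulnerabilities that were reported.
    FPR = FP / (FP + TN): share of benign locations that were flagged.
    """
    positives = tp + fn
    negatives = fp + tn
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Hypothetical benchmark run: 10 real vulnerabilities, 100 benign locations.
tpr, fpr = detection_rates(tp=8, fp=5, tn=95, fn=2)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Comparing these two numbers between the current analyzers and the new engine on the same benchmark would show whether results actually improved.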

What is the type of buyer?

GitLab Ultimate

Links / references

LLVM IR References/Resources

http://dev.stephendiehl.com/numpile/ - Python JIT with LLVM and LLVM IR
https://github.com/dabeaz/llvm-py/blob/master/www/src/userguide.txt#L315 - Dealing with modules in interpreted languages in LLVM IR (specifically Python)
https://us.pycon.org/2016/schedule/presentation/1995/ - Wrestling Python into LLVM IR
https://llvm.org/pubs/2004-Spring-AlexanderssonMSThesis.html - Ruby to LLVM
https://github.com/k0kubun/llrb/blob/master/README.md - Ruby -> LLVM IR -> LLVM Bitcode
https://go.googlesource.com/gollvm/ - Go -> LLVM IR -> Backend
https://github.com/ShiftLeftSecurity/llvm2graphml - LLVM to GraphML

Edited Feb 29, 2020 by Julian Thome