Generic SAST engine

Problem to solve

Our current SAST offering is built on a variety of heterogeneous tools. This approach got us up to speed quickly, but it has significant limitations, most of which stem from the lack of a common code representation.

  • Lack of common code representation
    • we cannot formalize a set of general attack or vulnerability patterns that are reusable across language boundaries. Conceptually, an XSS vulnerability follows the same pattern irrespective of the language used to develop the vulnerable host application.
    • we cannot learn new attack patterns or code smells across different languages.
    • we cannot release features such as incremental code scanning, code navigation, and snippet matching, because they require access to a code representation that contains AST, data-flow, and control-flow information.
    • we cannot mine for certain patterns across different GitLab projects developed in different languages.
    • data-flow and control-flow information is not always provided. The latter is particularly important for some of our categories, like RASP.
  • Engines are all written in different languages

Intended users

  • Delaney (Development Team Lead)
  • Sasha (Software Developer)
  • Devon (DevOps Engineer)
  • Sidney (Systems Administrator)
  • Sam (Security Analyst)

Further details

Some vulnerabilities are almost impossible to spot without a complete understanding of the code and its control and data flow. SQL injections are a good example: most of the time they depend on a complete flow that rules based on regular expressions cannot cover.
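
To illustrate the gap, here is a minimal Python sketch (not taken from any of our analyzers). It assumes a hypothetical regex rule that flags `execute()` calls with inline string concatenation, and compares it with a naive taint propagation over the AST. The snippet under test, the `request` source, and the `execute` sink are all assumptions chosen for illustration, and the taint pass only handles straight-line code inside a single function.

```python
import ast
import re

# Hypothetical vulnerable handler: the tainted value reaches the sink
# only through two intermediate variables.
SNIPPET = '''
def handler(request, cursor):
    user = request.args["name"]
    clause = "name = '" + user + "'"
    query = "SELECT * FROM users WHERE " + clause
    cursor.execute(query)
'''

# Hypothetical regex rule: execute() called with an inline concatenation.
REGEX_RULE = re.compile(r"execute\(\s*[\"'].*[\"']\s*\+")


def regex_finds_injection(source):
    """Line-by-line regex match, as a purely textual rule would do."""
    return any(REGEX_RULE.search(line) for line in source.splitlines())


def names_in(node):
    """All identifiers that occur anywhere below an AST node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def tainted_sinks(source, source_name="request", sink_attr="execute"):
    """Return line numbers of sink calls that receive tainted data.

    Simplification: only the top-level statements of the first
    function are visited, in order, without branches or loops.
    """
    tainted = {source_name}
    findings = []
    func = ast.parse(source).body[0]
    for stmt in func.body:
        # Propagate taint through assignments.
        if isinstance(stmt, ast.Assign) and names_in(stmt.value) & tainted:
            for target in stmt.targets:
                tainted |= names_in(target)
        # Report sink calls whose arguments mention a tainted name.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == sink_attr
                    and any(names_in(arg) & tainted for arg in node.args)):
                findings.append(node.lineno)
    return findings


if __name__ == "__main__":
    print("regex rule fires:", regex_finds_injection(SNIPPET))
    print("tainted sink lines:", tainted_sinks(SNIPPET))
```

The regex rule sees only `cursor.execute(query)` and stays silent, while the flow-based pass follows `request` through `user`, `clause`, and `query` to the sink.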

Proposal

The vulnerability research team already evaluated a potential foundation in issue https://gitlab.com/gitlab-com/gl-security/appsec/vulnerabiltiy-research/issues/4 and developed a limited Ruby POC that extracts code information from the AST (functions, calls, arguments), augments it with data-flow and call information, and stores the result in a graph database. The graph database enables users to run code queries that answer questions such as: is there a data flow from a to b?
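
The kind of query mentioned above can be sketched as follows. This is not the Ruby POC: it is a small Python illustration in which an in-memory adjacency map stands in for the graph database, and the function names (`build_flow_graph`, `has_flow`) and the sample program are hypothetical. Edges run from every name read on the right-hand side of an assignment to the names written on the left-hand side.

```python
import ast
from collections import defaultdict, deque

# Hypothetical program to analyze.
SOURCE = '''
a = source()
tmp = a + "suffix"
b = transform(tmp)
c = "constant"
'''


def build_flow_graph(source):
    """Build data-flow edges: name read in an assignment -> name written."""
    graph = defaultdict(set)
    for stmt in ast.walk(ast.parse(source)):
        if not isinstance(stmt, ast.Assign):
            continue
        reads = {n.id for n in ast.walk(stmt.value)
                 if isinstance(n, ast.Name)}
        writes = {n.id for target in stmt.targets
                  for n in ast.walk(target) if isinstance(n, ast.Name)}
        for read in reads:
            graph[read] |= writes
    return graph


def has_flow(graph, start, goal):
    """Breadth-first search: is there a data flow from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for successor in graph.get(current, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


if __name__ == "__main__":
    graph = build_flow_graph(SOURCE)
    print("a -> b:", has_flow(graph, "a", "b"))
    print("c -> b:", has_flow(graph, "c", "b"))
```

A real graph database would replace the dictionary and the hand-written search with stored nodes, edges, and a query language, but the question being asked is the same reachability question.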

Permissions and Security

TODO

Documentation

TODO

Testing

TODO

What does success look like, and how can we measure that?

  • Better results (lower FPR, higher TPR)
  • Contributions to our rules
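
For reference, the two rates above could be computed from a benchmark with known ground truth as sketched below. The function name and the sample counts are made up for illustration; this issue does not define a benchmark.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN): share of real vulnerabilities that were reported.
    FPR = FP / (FP + TN): share of safe code locations that were flagged.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Hypothetical benchmark: 10 real vulnerabilities, 100 safe locations.
    tpr, fpr = detection_rates(tp=8, fp=5, tn=95, fn=2)
    print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```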

What is the type of buyer?

GitLab Ultimate

Links / references

LLVM IR References/Resources

  • http://dev.stephendiehl.com/numpile/ - Python JIT with LLVM and LLVM IR
  • https://github.com/dabeaz/llvm-py/blob/master/www/src/userguide.txt#L315 - Dealing with modules in interpreted languages in LLVM IR (specifically Python)
  • https://us.pycon.org/2016/schedule/presentation/1995/ - Wrestling Python into LLVM IR
  • https://llvm.org/pubs/2004-Spring-AlexanderssonMSThesis.html - Ruby to LLVM
  • https://github.com/k0kubun/llrb/blob/master/README.md - Ruby -> LLVM IR -> LLVM Bitcode
  • https://go.googlesource.com/gollvm/ - Go -> LLVM IR -> Backend
  • https://github.com/ShiftLeftSecurity/llvm2graphml - LLVM to GraphML
Edited Feb 29, 2020 by Julian Thome