Add typosquatting support for dependency scanning.

Problem to solve

Typosquatting is a technique in which a bad actor publishes a package/dependency whose name closely resembles that of an official package, differing only by a small typo.

In another variation, a bad actor publishes a package under a seemingly official name to trick engineers.

This recently occurred in Python, when "jeIlyfish" (the first "l" is actually a capital "I") was published so that it would be installed by engineers who actually wanted the "jellyfish" library.

Similarly, "python3-dateutil" was published to trick engineers who actually wanted to include "dateutil". See https://www.zdnet.com/article/two-malicious-python-libraries-removed-from-pypi/ for more information.

The attack works because an engineer either adds a misspelled package name to their project, or searches the internet, finds what appears to be the correct library, and incorporates it into the project. Because the malicious dependency mirrors the behavior of the legitimate library, the engineer is unaware that the project now contains a malicious dependency.

This type of attack bypasses traditional dependency scanning because the malicious libraries do not appear in a vulnerability database until they are discovered.

Intended users

[Sasha (Software Developer)](https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/#sasha-software-
Sam (Security Analyst)

Proposal

GitLab should create a risk score for libraries that do not match well-known library names, and evaluate whether each such library may be typosquatted or is simply a library without known reported vulnerabilities.
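
One way such a risk score could work is sketched below. This is only an illustration under stated assumptions, not an existing GitLab feature or API: the `KNOWN_PACKAGES` list, the `CONFUSABLES` table, the `typosquat_risk` function, and the 0.8 threshold are all hypothetical. The sketch folds look-alike characters, checks whether a known name is embedded as a whole token (the "python3-dateutil" case), and otherwise falls back to string similarity from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical list of well-known package names; a real scanner would
# load this from a registry's most-downloaded packages or similar.
KNOWN_PACKAGES = {"jellyfish", "dateutil", "requests"}

# Hypothetical table of visually confusable characters.
CONFUSABLES = str.maketrans({"I": "l", "1": "l", "0": "o"})

# Arbitrary cut-off chosen for this sketch only.
RISK_THRESHOLD = 0.8


def normalize(name):
    """Fold look-alike characters first, then lowercase and unify separators."""
    return name.translate(CONFUSABLES).lower().replace("_", "-")


def typosquat_risk(name, known=KNOWN_PACKAGES):
    """Return (score, closest_known_name); score is in the range 0.0 to 1.0.

    An exact match with a known name is treated as legitimate and scores 0.0.
    """
    if name in known:
        return 0.0, None
    norm = normalize(name)
    tokens = set(norm.split("-"))
    best_score, best_match = 0.0, None
    for candidate in sorted(known):
        cand = normalize(candidate)
        if cand in tokens:
            # A known name embedded as a whole token, e.g. "python3-dateutil".
            score = 1.0
        else:
            score = SequenceMatcher(None, norm, cand).ratio()
        if score > best_score:
            best_score, best_match = score, candidate
    return best_score, best_match


if __name__ == "__main__":
    for pkg in ["jeIlyfish", "python3-dateutil", "jellyfish", "flask"]:
        score, match = typosquat_risk(pkg)
        flag = "SUSPICIOUS" if score >= RISK_THRESHOLD else "ok"
        print(f"{pkg}: score={score:.2f} closest={match} {flag}")
```

With these assumptions, both "jeIlyfish" and "python3-dateutil" score 1.0 against their legitimate counterparts, while the exact name "jellyfish" and an unrelated name such as "flask" stay below the threshold. A production scorer would likely combine this with other signals (package age, download counts, maintainer history) to limit false positives.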

Testing

A benign typosquatted library could be published to a local repository to test the detection capability.

Links / references

gitlab-org/security-products/gemnasium-db!734 (merged)
gitlab-org/security-products/gemnasium-db!733 (merged)
gitlab-org/security-products/gemnasium-db!969 (merged)

Edited Jan 06, 2020 by Julian Thome