Automatic fuzzing harness generation for coverage-guided fuzzing

Problem to solve

This is an effort to lower the barrier to entry for the new Coverage-Guided Fuzz Testing features that are being added to GitLab.

Creating targeted and efficient fuzzing harnesses requires a degree of fuzzing experience that few developers have. Although the process is not complicated, it still requires an understanding of how fuzzing works, which adds to the learning curve.

We should be able to automatically derive coverage-guided fuzzing harnesses from existing code (example code, unit tests) for projects that have codebases in supported languages.

Intended users

Sasha (Software Developer)

User experience goal

Users should get the benefits of fuzzing out of the box, with minimal effort on their part. Ideally, this requires only the inclusion of a CI template.

Proposal

I think there is an iterative path that can achieve this:

1. Stand-alone Fuzzing Harness Generation Tool

A stand-alone tool that can generate fuzzing harnesses from existing code would be a good MVC. It would:

- Identify existing code that would work well with coverage-guided fuzzing
- Create fuzzing harness(es) for the identified code
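The two steps above could be sketched roughly as follows. This is a minimal illustration, not a proposed design: it assumes Python sources, treats any top-level function with a single `bytes` or `str` parameter as a candidate, and emits a harness with a placeholder `fuzz(data)` entry point. The names `find_candidates`, `render_harness`, and `HARNESS_TEMPLATE` are hypothetical.

```python
import ast

# Placeholder harness layout; the entry point name and signature would need
# to match whatever fuzzing engine is actually used.
HARNESS_TEMPLATE = '''\
from {module} import {func}

def fuzz(data: bytes) -> None:
    # Generated harness: feed fuzzer-provided bytes to the target function.
    {call}
'''


def find_candidates(source: str) -> list[tuple[str, str]]:
    """Return (function name, parameter type) for top-level functions that
    take exactly one parameter annotated as bytes or str."""
    candidates = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        args = node.args
        params = args.posonlyargs + args.args
        if len(params) != 1 or args.vararg or args.kwarg or args.kwonlyargs:
            continue
        annotation = params[0].annotation
        if isinstance(annotation, ast.Name) and annotation.id in ("bytes", "str"):
            candidates.append((node.name, annotation.id))
    return candidates


def render_harness(module: str, func: str, param_type: str) -> str:
    """Render harness source code for one candidate function."""
    if param_type == "bytes":
        arg = "data"
    else:
        # str targets receive the fuzzer bytes decoded leniently.
        arg = 'data.decode("utf-8", errors="replace")'
    return HARNESS_TEMPLATE.format(module=module, func=func, call=f"{func}({arg})")
```

A heuristic this simple would miss most real targets (methods, multi-argument functions, structured inputs), which is one reason to start with a stand-alone tool that can be iterated on.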

This tool could be open-sourced and maintained.

2. Automatic Fuzzing Harness Generation as part of GitLab

Once the stand-alone tool exists and is stable, the next step is to integrate it into GitLab. The first integration could be to:

Have a manual job included in a GitLab CI template that:
- automatically generates the fuzzing harnesses
- creates a new MR to the current project that
  - adds the fuzzing harnesses
  - adds new jobs to .gitlab-ci.yml

Adding the output of the stand-alone tool to the project through a merge request gives the user the opportunity to review and edit the generated harnesses.

3. Full automation

Full automation would be the end result. As a job in an includeable CI template, it would:

- Identify code in the codebase that would work with coverage-guided fuzzing
- Generate the fuzzing harness(es) and save them as build artifacts
- Generate a new child pipeline, triggered with strategy: depend, that has jobs for the generated fuzzing harness(es)
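As a rough sketch of the last step, the job could write a child-pipeline definition with one job per generated harness and save it as an artifact for a trigger job to include. Everything specific here is an assumption for illustration: the function name `render_child_pipeline`, the `fuzz` stage, and especially the `run-fuzzer` command, which is a placeholder rather than a real CLI.

```python
import json
import re
from pathlib import PurePosixPath


def render_child_pipeline(harness_paths: list[str]) -> str:
    """Render a minimal child-pipeline YAML document with one job per harness."""
    lines = ["stages:", "  - fuzz"]
    for path in harness_paths:
        # Derive a safe job name from the harness file name.
        stem = PurePosixPath(path).stem
        job_name = "fuzz_" + re.sub(r"[^A-Za-z0-9_]", "_", stem)
        # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
        command = json.dumps(f"run-fuzzer {path}")  # placeholder command
        lines += [
            "",
            f"{job_name}:",
            "  stage: fuzz",
            "  script:",
            f"    - {command}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_child_pipeline(["harnesses/parse.py", "harnesses/load-config.py"]))
```

Because the pipeline file is regenerated on every run, harnesses that appear or disappear as the code changes are picked up without edits to .gitlab-ci.yml.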

The potential for false positives here is high. Robust filtering options would be needed to allow the user to include/exclude areas of code from the fuzzing-harness-generation process.
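One possible shape for such filtering is a pair of include/exclude glob lists applied to file paths before harness generation. The sketch below is only an assumption about how this could work; `is_selected` and the pattern semantics (exclude wins over include, an empty include list means "everything") are hypothetical choices, not a decided design.

```python
from fnmatch import fnmatchcase


def is_selected(path: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a source file should be considered for harness generation."""
    # Exclusions take precedence so users can carve out noisy areas.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # With no include patterns, every non-excluded path is selected.
    return not include or any(fnmatchcase(path, pattern) for pattern in include)


if __name__ == "__main__":
    include = ["src/*"]
    exclude = ["*/vendor/*"]
    for candidate in ["src/parser/json.py", "src/vendor/lib.py", "tests/test_x.py"]:
        print(candidate, is_selected(candidate, include, exclude))
```

Path-level filtering is the coarsest option; function-level or annotation-based opt-outs might also be needed to keep false positives manageable.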

The benefit of this approach is that the fuzzing will evolve automatically as the project evolves.

Further details

Permissions and Security

Documentation

Availability & Testing

What does success look like, and how can we measure that?

What is the type of buyer?

What is the buyer persona for this feature? See https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/buyer-persona/

CISO
Director App-Dev
VP App-Dev

In which enterprise tier should this feature go? See https://about.gitlab.com/handbook/product/pricing/#four-tiers

Ultimate

Is this a cross-stage feature?

No, Secure only

Links / references

pytest-autoexplore

Edited Jul 29, 2020 by Jessica Johnson