Support project_fingerprint vulnerability updates

Problem to solve

We currently rely on a stable project_fingerprint to match vulnerabilities against any feedback generated for them, e.g. a dismissal or an issue created from a vulnerability. Because we rely on this calculated field to track occurrences as they move around within the codebase, the calculation must stay consistent, or we will be unable to recognize an occurrence as the same one after it has moved. This becomes a problem when we update the calculation within a scanner: the change disassociates all existing feedback, leading to regressions and the appearance of data loss.

We need a way to proactively update fingerprints when they have changed.

Why do we need to update project_fingerprints? Sometimes we rely on an overly simplistic calculation, such as when an analyzer does not provide adequate line numbers and we must rely on a hash of the affected code. As we move to better fingerprinting, we occasionally must change this calculation.
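To illustrate why a changed calculation breaks the link to existing feedback, here is a minimal sketch. Both fingerprint functions, their inputs, and the feedback dictionary are hypothetical and do not reflect the actual analyzer or backend calculation; the only point is that the same occurrence yields two unrelated hashes.

```python
import hashlib


def fingerprint_v1(file_path: str, code_snippet: str) -> str:
    """Hypothetical older calculation: hash of the file path and affected code."""
    return hashlib.sha1(f"{file_path}:{code_snippet}".encode("utf-8")).hexdigest()


def fingerprint_v2(file_path: str, rule_id: str, start_line: int) -> str:
    """Hypothetical newer calculation: hash of the file path, rule, and line."""
    return hashlib.sha1(f"{file_path}:{rule_id}:{start_line}".encode("utf-8")).hexdigest()


# Feedback (here, a dismissal) was stored under the old fingerprint.
old_fp = fingerprint_v1("app/models/user.rb", "eval(params[:code])")
feedback_by_fingerprint = {old_fp: "dismissal"}

# After the analyzer switches calculations, the same occurrence hashes
# differently, so a plain lookup no longer finds the dismissal.
new_fp = fingerprint_v2("app/models/user.rb", "ruby-eval", 42)
lookup_result = feedback_by_fingerprint.get(new_fp)  # None: feedback appears lost
```

The dismissal still exists in storage; it is simply unreachable through the new fingerprint, which is what produces the appearance of data loss.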

To quote from @vzagorodny:

To recover the links between occurrences created before the regression was introduced and feedbacks for them created after that moment, we can update all feedbacks created after the regression with the new project_fingerprint upon retrieval; discussions here: https://gitlab.com/gitlab-org/gitlab-ee/issues/10561#note_193069748

In general, I think we need to discuss the approach to retroactively updating project_fingerprint in order to have this stabilized; we run into a similar problem from time to time.

Intended users

Persona: Development Team Lead

Further details

Proposal

One approach we could consider is introducing a project_fingerprint_old field. This field would be either null or contain the previous fingerprint calculation. When present, our parser should split the update_or_create behavior into a separate lookup by the old fingerprint, allowing us to smoothly update feedback in place and migrate fingerprints in the process of parsing.
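The split lookup could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual parser: Occurrence, Feedback, FeedbackStore, and ingest are hypothetical names, and an in-memory dictionary stands in for the database. Only the project_fingerprint and project_fingerprint_old field names come from the proposal.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Occurrence:
    project_fingerprint: str
    # None when the analyzer's calculation has not changed (hypothetical shape).
    project_fingerprint_old: Optional[str] = None


@dataclass
class Feedback:
    project_fingerprint: str
    feedback_type: str


class FeedbackStore:
    """Hypothetical in-memory stand-in for the feedback table."""

    def __init__(self) -> None:
        self._by_fingerprint: Dict[str, Feedback] = {}

    def add(self, feedback: Feedback) -> None:
        self._by_fingerprint[feedback.project_fingerprint] = feedback

    def find(self, fingerprint: str) -> Optional[Feedback]:
        return self._by_fingerprint.get(fingerprint)

    def rekey(self, old: str, new: str) -> bool:
        """Move feedback stored under `old` to `new`; False if none exists."""
        feedback = self._by_fingerprint.pop(old, None)
        if feedback is None:
            return False
        feedback.project_fingerprint = new
        self._by_fingerprint[new] = feedback
        return True


def ingest(occurrence: Occurrence, store: FeedbackStore) -> Optional[Feedback]:
    """Return the feedback for an occurrence, migrating its fingerprint if needed."""
    old = occurrence.project_fingerprint_old
    # Migration path: only taken when an old fingerprint is reported and no
    # feedback is stored under the new fingerprint yet.
    if old is not None and store.find(occurrence.project_fingerprint) is None:
        store.rekey(old, occurrence.project_fingerprint)
    # Standard path: look up by the current fingerprint.
    return store.find(occurrence.project_fingerprint)
```

In this sketch, feedback already stored under the new fingerprint takes precedence and the old record is left untouched; whether that is the right conflict rule is an open design choice.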

This would mean we have to keep the previous calculation around within our analyzers for a release or two, and we would need to consider some kind of deprecation timeframe. What if we need to update fingerprints twice within the same release?

Permissions and Security

No change to permissions

Documentation

No change to documentation; this would be entirely a change to the backend parsing logic, plus updates to any relevant analyzers.

Testing

We should test both parsing paths: regression-test the standard update behavior of project_fingerprint, and test the migration behavior when project_fingerprint_old is present.

What does success look like, and how can we measure that?

We can update fingerprints inline, during parsing, without disassociating existing feedback.

What is the type of buyer?

GitLab Ultimate

Links / references

Edited Aug 12, 2019 by Lucas Charles