Design: De-Duplication, Aggregation, 1:many, Linking, Grouping in Secure and Protect

MR submitted

https://about.gitlab.com/handbook/engineering/development/secure/glossary-of-terms/

gitlab-com/www-gitlab-com!98083 (merged)

Problem to solve

Currently, we have only a few blunt tools for handling data: de-duplication and skipping.

There are circumstances where these do not work well.

We should agree on a common vocabulary, with definitions, across Secure and Protect for the ways we want to handle data (our goals), document it, and then work towards accomplishing these goals in the areas where we have room to improve or lack the functionality.

Proposal

HARD (meaning once done, it cannot be undone)

pre-filter
- What is it? Where actions are taken before analysis occurs, such as skipping a directory or lock file.
- What does a user experience? This reduces noise (stuff users don't care about) and makes scans faster, but it also leaves no record. We should be clear that this is discouraged if a user wants any record.
- Do we do it today? We allow some pre-filtering with default values (tests, spec, etc.), and a user can override them to NOT scan specific things using CI variables.
- Future? We should also log (as an audit trail event as well as in the job log) that this was done (i.e. scan ran but X skipped A, B, C).
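The pre-filter behaviour and the proposed skip record can be sketched roughly as follows. This is a minimal illustration, not how any analyzer actually implements it: `DEFAULT_EXCLUDED`, `pre_filter`, and `skip_log_line` are hypothetical names, and the patterns are placeholder defaults.

```python
from fnmatch import fnmatch

# Hypothetical default exclusions; real defaults and CI variable names may differ.
DEFAULT_EXCLUDED = ["spec/*", "test/*", "tests/*"]


def pre_filter(paths, excluded=DEFAULT_EXCLUDED):
    """Split paths into those to scan and those skipped before analysis."""
    to_scan, skipped = [], []
    for path in paths:
        if any(fnmatch(path, pattern) for pattern in excluded):
            skipped.append(path)
        else:
            to_scan.append(path)
    return to_scan, skipped


def skip_log_line(skipped):
    """Build the message that records what the pre-filter skipped."""
    if not skipped:
        return "scan ran, nothing skipped"
    return "scan ran but skipped: " + ", ".join(skipped)
```

Returning the skipped paths alongside the scanned ones is what makes the "Future?" item possible: the same list can feed both the job log and an audit event.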
de-duplication
- What is it? Where two things are deemed the same, and they are combined or one is tossed.
- What does a user experience? They don't see this directly but benefit from less noise.
- Do we do it today? Yes, within a category.
- Future? Cross-category? Maybe? With vendors? Maybe? Otherwise use Linking (possibly preferable).
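As a rough sketch of de-duplication within a category, assuming two findings are "the same" when they share a category, identifier, and location. That matching key is an assumption made for illustration, and `Finding` and `deduplicate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str    # e.g. "sast" or "dependency_scanning"
    identifier: str  # e.g. a CVE or CWE id
    location: str    # e.g. "app/main.rb:10"
    scanner: str     # which tool reported it


def deduplicate(findings):
    """Keep the first finding per (category, identifier, location); drop the rest."""
    seen = {}
    for finding in findings:
        key = (finding.category, finding.identifier, finding.location)
        seen.setdefault(key, finding)  # later duplicates are tossed
    return list(seen.values())
```

Because the category is part of the key, the same identifier reported by two different categories is kept twice, which matches today's "within a category" behaviour. The dropped finding is not retained anywhere, which is why this sits under HARD.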

SOFT (meaning users can unwind/undo)

Grouping
- What is it? Where related items are linked together in some way.
- What does a user experience? Less noise. Where we believe items are probably related, it would be ideal to link them in a 1:many (many:many?) relationship, so that a user sees only one by default but can choose to see the "related" items and break one out on its own (un-link it) if they decide we were incorrect. By default only one should show, and any action taken on that one is taken on all (no idea how technically, but for example if you dismiss one, all should be dismissed). Rejected Terms: 1:Many / Aggregation / Grouping
- Do we do it today? No
- Future? &2652 (closed) or #267588 (closed). Any others?
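The linking behaviour described above (one item visible by default, actions applied to all, un-link to undo) could look something like this. It is only a sketch under assumed names: `Item`, `Group`, and the state strings are hypothetical and do not reflect an existing data model.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare items by identity, not by field values
class Item:
    name: str
    state: str = "detected"


@dataclass
class Group:
    primary: Item
    related: list = field(default_factory=list)

    def visible(self):
        """Only the primary item is shown by default."""
        return [self.primary]

    def dismiss(self):
        """An action on the primary is applied to every linked item."""
        for item in [self.primary, *self.related]:
            item.state = "dismissed"

    def unlink(self, item):
        """Break an item out on its own; later group actions no longer affect it."""
        self.related.remove(item)
        return Group(primary=item)
```

Nothing is discarded here: the related items stay attached and can be split off again, which is what makes this a SOFT operation.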
post-filter
- What is it? Things that are done to clean up after the scan.
- What does a user experience? Users may want to configure post-processing actions so that they are aware of risks (can see/find them, including historic ones) but aren't alerted, bothered, or stopped by them. For example, if I want to dismiss all items related to version 1.1 of Nokogiri because we made a fork, we record that the dependency is there but act automatically on the finding, so that it doesn't result in a required approval or even "new finding" visibility.
- Do we do it today? No
- Future? TBD how we do this technically, but it should be consistent across Secure and Protect.
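Since the technical approach is TBD, the following is only one possible shape for a post-filter rule, using the Nokogiri example above. `DependencyFinding`, `PostFilterRule`, `apply_post_filter`, and the `needs_approval` flag are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DependencyFinding:
    package: str
    version: str
    state: str = "detected"
    needs_approval: bool = True


@dataclass(frozen=True)
class PostFilterRule:
    package: str
    version: str
    reason: str  # why the user chose to auto-dismiss


def apply_post_filter(findings, rules):
    """Auto-dismiss matching findings but keep them in the results."""
    for finding in findings:
        for rule in rules:
            if (finding.package, finding.version) == (rule.package, rule.version):
                finding.state = "dismissed"
                finding.needs_approval = False
                break
    return findings  # nothing is removed, so the record survives
```

The key difference from pre-filtering is that the function returns every finding it was given: matching ones change state but remain visible and searchable.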

Edited Feb 02, 2022 by Nicole Schwartz