Address analysers generating inconsistent vulnerability identifiers

During a spike discussion for the vulnerability grouping feature, we observed that analysers are generating identifier mappings that deviate from the documented identifier format. This inconsistency hinders feature development that builds on the data in the vulnerability_identifiers table.

Known scenarios where the inconsistency causes problems:

  1. OWASP top 10 vulnerability grouping feature as part of Vulnerability report grouping (&10164)
  2. Identifier filtering as part of Enhanced filtering and search on the Vulnerabil... (&3429).

Anticipated future scenarios where this can cause problems:

  1. Enhanced filtering and search on the Vulnerabil... (&3429)

For more details, see the discussion: #423557 (comment 1541778476)

High-Level Implementation Plan

(This focuses on OWASP identifiers, but priority should be considered for all types.)

  • backend Audit existing analyzers to ensure they are generating correctly formed OWASP identifiers [1]
  • backend Create background migration for normalizing existing data (where(external_id: ["A8"]).update(external_id: "A8:2017"))
  • backend Add strict validation to common identifier types via schema (breaking change) and report module for generating common identifiers

  [1] Some work will be needed to ensure A8 is mapped to the correct year. Defaulting to the latest year is easy but could lead to inconsistencies; we might instead be able to assume that non-padded entries are 2017 rather than 2021 or the upcoming 2024.
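
A rough sketch of what the normalization rule for the background migration might look like, in plain Ruby with no Rails dependency. The module name OwaspIdentifierNormalizer and its normalize method are hypothetical, not existing GitLab code. It assumes the heuristic mentioned in the note on year mapping (non-padded entries are 2017, zero-padded entries are 2021) and assumes the target format is "A<number>:<year>", as in the A8 to A8:2017 example. "A10" is the same with or without padding, so the sketch returns nil for it rather than guessing.

```ruby
# Hypothetical helper for illustration; not part of any existing GitLab module.
module OwaspIdentifierNormalizer
  # Identifiers that already carry a year, e.g. "A8:2017" or "A03:2021".
  WELL_FORMED = /\AA(\d{1,2}):(\d{4})\z/
  # Bare identifiers with no year, e.g. "A8" or "A08".
  BARE = /\AA(\d{1,2})\z/

  # Returns the normalized identifier, or nil when the input cannot be
  # normalized safely and should be reviewed manually.
  def self.normalize(external_id)
    id = external_id.to_s.strip.upcase
    return id if id.match?(WELL_FORMED)

    match = BARE.match(id)
    return nil unless match

    digits = match[1]
    # "A10" looks identical in both editions, so padding tells us nothing.
    return nil if digits == "10"

    # Assumption: zero-padded means 2021, non-padded means 2017.
    year = digits.start_with?("0") ? "2021" : "2017"
    "A#{digits}:#{year}"
  end
end

OwaspIdentifierNormalizer.normalize("A8")      # => "A8:2017"
OwaspIdentifierNormalizer.normalize("A08")     # => "A08:2021"
OwaspIdentifierNormalizer.normalize("A8:2017") # => "A8:2017"
OwaspIdentifierNormalizer.normalize("A10")     # => nil
```

Returning nil instead of defaulting to the latest year keeps ambiguous rows out of the automated update, at the cost of a manual review step. The same rule could back the strict validation in the report module, so that analysers and the migration agree on one format.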

Edited by Lucas Charles