Address analyzers generating inconsistent vulnerability identifiers

During a spike discussion for the vulnerability grouping feature, we observed that analyzers generate identifier mappings that deviate from the documented identifier format. This inconsistency hinders feature development that builds on the data in the vulnerability_identifiers table.

Known scenarios where the inconsistency causes problems:

Anticipated future scenarios where this could cause problems:

For more details, see the discussion in #423557 (comment 1541778476).

(This issue focuses on OWASP identifiers, but priority should be considered for all identifier types.)

backend Audit existing analyzers to ensure they are generating correctly formed OWASP identifiers¹
backend Create background migration for normalizing existing data (where(external_id: ["A8"]).update(external_id: "A8:2017"))
backend Add strict validation to common identifier types via schema (breaking change) and report module for generating common identifiers
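
As a rough illustration of the audit and validation tasks above, the following sketch splits a list of external IDs into well-formed and malformed ones. The pattern (A, one or two digits, a colon, a four-digit year) is an assumption inferred from the "A8:2017" example in this issue, not the documented format verbatim, and the names OWASP_ID_PATTERN and audit_owasp_ids are hypothetical.

```ruby
# Assumed shape of a well-formed OWASP identifier: "A<1-2 digits>:<4-digit year>",
# e.g. "A8:2017" or "A03:2021". This is an illustration, not the official schema.
OWASP_ID_PATTERN = /\AA\d{1,2}:\d{4}\z/

# Hypothetical helper: returns [well_formed, malformed] for a list of external IDs.
def audit_owasp_ids(external_ids)
  external_ids.partition { |id| OWASP_ID_PATTERN.match?(id.to_s) }
end

well_formed, malformed = audit_owasp_ids(["A8:2017", "A8", "A03:2021", "a1:2017"])
puts "well-formed: #{well_formed.inspect}"
puts "malformed:   #{malformed.inspect}"
```

The same pattern could in principle back a stricter schema check, but whether the real validation should live in the report schema, the report module, or both is left open here.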

¹ Some work will be needed to ensure A8 is mapped to the correct year (or default to the latest, which is easy but could lead to inconsistencies), but we might be able to assume non-padded entries are 2017 rather than 2021 or the upcoming 2024.
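
The year-mapping heuristic in the note above could be sketched as follows. This is only an illustration under stated assumptions: the module name OwaspIdNormalizer is hypothetical, zero-padded entries are assumed to belong to 2021, and anything ambiguous (such as "A10", which fits either list) is returned as nil so it can be reviewed manually instead of being guessed.

```ruby
# Hypothetical helper, not part of any existing codebase.
module OwaspIdNormalizer
  # Assumed well-formed shape, e.g. "A8:2017" or "A08:2021".
  WELL_FORMED = /\AA\d{1,2}:\d{4}\z/
  # Bare category with no year, e.g. "A8" or "A08".
  BARE = /\AA(\d{1,2})\z/

  # Returns the normalized ID, or nil when the year cannot be inferred.
  def self.normalize(external_id)
    id = external_id.to_s.strip
    return id if WELL_FORMED.match?(id)

    match = BARE.match(id)
    return nil unless match

    digits = match[1]
    if digits.length == 1
      "A#{digits}:2017"   # assumption: non-padded entries are 2017
    elsif digits.start_with?("0")
      "A#{digits}:2021"   # assumption: zero-padded entries are 2021
    end                   # otherwise (e.g. "A10") falls through to nil
  end
end

%w[A8 A08 A8:2017 A10].each do |id|
  puts "#{id} -> #{OwaspIdNormalizer.normalize(id).inspect}"
end
```

Returning nil for ambiguous values trades completeness for safety: a migration built on this idea would leave those rows untouched rather than risk assigning the wrong year.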

Edited Apr 29, 2024 by Lucas Charles