During discussion on #205489 (closed) it was mentioned we could reuse uuid column present on Vulnerabilities::Finding model to store UUIDv5 which would be stable enough for us to use when matching Vulnerabilities::Feedback with Vulnerabilities::Finding models.
backend Change the StoreReport service to save findings with their UUID V5 values. Calculate them using report_type + <fingerprint of the primary_identifier> + <fingerprint of the location> + project_id
database Remove the default value from UUID column.
backend Adjust Security::StoreReportService to look up findings using UUIDv5 (#292236 (closed))
While discussing on #205489 (closed) we've decided that we can have an attribute for findings called UUID which holds the UUID V5 data to identify the findings. The existing column is holding UUID V4 but the semantics are the same for V5 so we can utilize the same column and calculate the new UUID values in ee/lib/gitlab/ci/parsers/security/common.rb or somewhere else but we can also drop it and re-introduce with a different name. I just want to hear your thoughts on this.
(dropping a column will take at least 3 milestones)
I think this is an important point, with this in mind I would definitely go for using the existing column and re-calculating the values for existing ones.
In the initial proposal, the UUID attribute of the finding should be the combination of the following attributes;
report_type
fingerprint of the primary_identifier
fingerprint of the location
But maybe adding the project_id into the calculation would also make sense as it will scope the findings per project. In this way, we can remove the project_id attribute from the Vulnerabilities::Feedback model as well and just add association based on the UUID of the finding.
How to calculate the UUID V5
It is possible to calculate UUIDs which are always the same for the given Namespace ID and Name values. So we need a method with the following signature to generate the UUIDs; string: generate_uuid(string: namespace_id, string: name)
Rails already has a utility method to generate UUID V5 which can be used like so;
FINDING_NAMESPACES_IDS={development: "random-uuid-picked-asID",production: "another-random-uuid-picked"}defgenerate_finding_uuid(values)name=values.join('-')Digest::UUID.uuid_v5(namespace_id,name)enddefnamespace_id# This can be also put into a config file# and read by `Rails.application.config_for(:finding_namespace_id)`.# Note that; namespace_id is a string which can be in the form of a# UUID or just an arbitrary thing. We can discuss what to choose later.FINDING_NAMESPACES_IDS.fetch(Rails.env)end
Important Note: Rails UUID V5 generation method has a flaw inside which results in creating wrong UUIDs if the namespace ID is also a UUID. We can either use the gem uuidtools or patch the methods of Rails to fix the behavior.
You can read more about generating namespace based UUIDs(not random) here.
How does the migration look like?
We should probably create a service class which can be used by the migration as well as by the application logic. Then we should iterate through all the finding entries saved in the database to calculate the new UUIDs and update them accordingly. But before migrating the existing data, we should first remove the default value for the UUID column and application logic to set the UUID values before saving them. The order is like;
Change the StoreReport service to save findings with their UUID V5 values.
Remove the default value from UUID column.
Implement a migration to update the existing data.
The design issue is where we discussed changing the vulnerability feedback identification can also be found here: #205489 (closed)
Michał Zającmarked the checklist item backend Change the StoreReport service to save findings with their UUID V5 values. Calculate them using report_type + <fingerprint of the primary_identifier> + <fingerprint of the location> + project_id as completed
marked the checklist item backend Change the StoreReport service to save findings with their UUID V5 values. Calculate them using report_type + <fingerprint of the primary_identifier> + <fingerprint of the location> + project_id as completed
According to Kibana there were no warning when saving the UUID fields so something must have changed in mean time. I'll create a merge request for additional instrumentation.
EDIT: Kibana logs seem to confirm that this is not being called as there's no information about what UUIDv5 was generated.
Latest Vulnerabilities::Finding is #<Vulnerabilities::Finding id: 5876702, created_at: "2020-10-29 08:26:35"> with UUID of 29348702-f764-4e5a-81a9-140c0eff734a
@Quintasan, you probably just wanted @gitlab-org/threat-management/secure/threat-insights/backend since @gitlab-org/threat-management/backend includes both groupthreat insights and ~"group::container security".
It turned out that on production we have records that would result in a duplicate UUIDv5 which cannot happen since we have an UNIQUE index on the uuid column. I have paired up with @minac to find out what's the cause and how should we address it. We have determined that we need three changes so that we don't mess up the vulnerability creation process.
Adjust StoreReportService to look up findings using UUIDv5 first, then the old finding_key and only then should a new Finding be inserted.
Then we can remove the rows that would cause duplicate UUIDv5 when recalculating UUIDs for older findings
@Quintasan is your plan to the epic the 3 points mentioned above? Ie. 2 new issues plus this one?
If so, I wouldn't bother with the overhead. Just create 2 new issues and block this issue here against them. I noticed there was a production on Epic creation that probably stopped you from doing it. I'm cool if you still want to do it - just giving options to make life easier.