nitpick: Have you considered extracting version string into a const to create SSoT from the start?
@lma-git: I originally just hardcoded this number in because it will likely be temporary. If we proceed with deprecating Repository X-Ray, we would remove scannerVersion from the payload entirely.
But extracting it to a const will clean it up and allow me to explain further in a comment. So I'll go ahead and do that.
### Proposal update
Due to the database-level constraints caused by the column file_checksum not being nullable, we will encounter issues when we remove checksum from the XRay Report schema as per the proposal above, including failing tests. Therefore, we're adding an additional step to add a migration that allows file_checksum to be nullable before removing the field from the Xray Report schema.
Follow up
As previously mentioned above, removing the xray_reports.file_checksum column requires a multi-milestone approach. This issue can be closed as complete in %17.6. Follow-up issues for the remaining removal steps are:
Leaminn Machanged title from Follow-up from "Reproduce Repository X-Ray functionality - Introduce lock file classes" to {+Update XrayReport model JSON schema+}
changed title from Follow-up from "Reproduce Repository X-Ray functionality - Introduce lock file classes" to {+Update XrayReport model JSON schema+}
Leaminn Machanged the descriptionCompare with previous version
Do you know why we anticipated needing the checksum value? It doesn't seem to be used anywhere for verification from what I can see so far.
With the new in-monolith parsing service implemented in #476177 (closed), I'm thinking we can just remove the checksum value all together and also remove the dedicated column xray_reports.file_checksum. The main reason is because I plan to merge the dependencies extracted from multiple files into a single xray report (#491800 (closed)), so it doesn't quite make sense to have a checksum for a single or multiple files, especially if they're not necessary.
@lma-git afair it was envisioned as a way to verify if new scan is required, I am not sure if it ever got implemented though.
Edit: At the begging scan deduplication was delegated to CI job trigger rules. As for removal, do not have strong opinion around deduplication strategies, apart from the fact that it is required. So if you have other solution in mind, imo feel free to remove artefacts from the deprecated solution and add more suitable ones instead.
With the new in-monolith service, the cost of parsing the file is relatively low compared to the Gitaly calls we make to pull the files in the first place. So I don't think it would be that beneficial to implement the checksum logic at this time. However, we could re-consider it in the future if it becomes necessary.
I think we can proceed with removing the checksum fields until we have a more concrete strategy moving forward (particularly one that supports multiple files in a single X-ray report).
The new service has a Sidekiq deduplication strategy and exclusive lease so that only one X-ray Sidekiq job runs at a time per project. This already significantly reduces how often we're running the service, so I don't see re-parsing the same files as a concern at this time.
I think we can proceed with removing the checksum fields until we have a more concrete strategy moving forward (particularly one that supports multiple files in a single X-ray report).
I'm currently picking up this issue and just wanted to check in and clarify that, as per above, we still want to go ahead and remove the checksum field from the schema in Repository-X-Ray and in the monolith?
we are changing fileName to filePath for readability purposes
It's mostly for consistency IMO. I use the term file_paths in Ai::RepositoryXray::ScanDependenciesService, so the payload should follow suit. "Path" is more descriptive of its value. And in general, snake case is preferred over camel case in field names.
@squadri Luckily the JSON schema for payload isn't strict nor enforced by the database, so I included small changes like fileName -> file_path because they're easy to implement. Unlike the column removal, the JSON schema changes and references to its fields can all be done in one MR.
I have merged two MRs to update the XrayReport JSON schema and add an ignore_column rule and a migration to make the file_checksum column nullable as part of the multi-step approach to remove the xray_reports.file_checksum database column.
Next Steps:
As per the multi-step release guidance, we are required to drop the column and remove the ignore_column rule in two separate milestones. I have created two separate follow-up issues to address that:
[#500465 (closed)] Remove ignore_column rule for file_checksum in xray_report model - %17.8
Blockers: N/A
Shipping this milestone?
No, due to the requirement of a multi-step release approach across three separate milestones for DB migration related work. Therefore, I'll keep this issue open until all subsequent issues are resolved.