@michael.gibson.here, thank you for opening this issue. Can I please clarify this item?
"location" field provides the same results from main API
Are you considering the values from raw_metadata as part of the REST (main) API results?
In any case, this exists in the GraphQL API as location but the returned object depends on the scan type. This is because locations depend on the scanner needed (e.g. SAST can give you a line/column number wheres DAST will give you a hostname and request path+parameters - see VulnerabilityLocationDast)
@matt_wilson this is pretty reasonable and shouldn't be a big job. Do you want to do all of these while we're at it? I'm not 100% sure on some of these fields myself (due_date, start_date, the "notes" ones) so would appreciate a review.
@michael.gibson.here Thank you for detailing out the gaps. GraphQL is where we are focusing all of our API efforts for Vulnerability Management going forward so we want to ensure that it can support all the existing use cases for the REST API. If you have a moment, it would be very helpful to get your thoughts on my list of fields called out below as there are potentially some legacy fields that may not make sense to carry forward.
@thiagocsf I agree, these would be great to do at the same time. The sooner we can get parity with the remaining outstanding REST API bits, the easier it will be for all to make a full GraphQL transition. I'm also far from 100% certain what all of these fields are used for (or not) in the old REST API but here are some thoughts and questions on a few that jump out:
REST field
GraphQL Field
Comment
confidence
missing
I don't believe the analyzers output this anymore as it was inconsistent and misleading. I would not consider brining this field forward.
due_date
missing
Is this value populated anywhere in the database? I don't recall ever having a due date for findings or vulnerabilities. This could be useful for future SLA tracking but associating it with the vulnerability/finding record may not be the best approach. If it looks unused today, consider removing it.
finding
missing
It appears this contains data that exists in other fields in the GraphQL endpoint. Likely not necessary/redundant.
last_edited_at
missing
This should be obtainable by parsing the notes. Perhaps a filter or convenience GraphQL endpoint could be a handy equivalent.
last_edited_by_id
missing
This should be obtainable by parsing the notes. Perhaps a filter or convenience GraphQL endpoint could be a handy equivalent.
start_date
missing
Is this value populated anywhere in the database? I suspect this was to be paired with due_date. This could be useful for future SLA tracking but associating it with the vulnerability/finding record may not be the best approach. If it looks unused today, consider removing it.
updated_at
missing
This should be obtainable by parsing the notes. Perhaps a filter or convenience GraphQL endpoint could be a handy equivalent.
updated_by_id
missing
This should be obtainable by parsing the notes. Perhaps a filter or convenience GraphQL endpoint could be a handy equivalent.
If the confidence field is deprecated then that should be fine to not migrate. I did not see where that was indicated.
The other specific field we are concerned with is the "solution". This may or may not present itself in the "raw_metadata" field depending on if the vuln has an associated CVE.
Are there any plans to migrate the "raw_metadata" field to GraphQL?
What purpose does the "hasSolutions" field have. I assumed it was based off of the existence of raw_metadata['solutions']
The other specific field we are concerned with is the "solution".
This field is ingested to the database, so it should be easy to expose it in GraphQL; I'll add to the scope of this issue.
Are there any plans to migrate the "raw_metadata" field to GraphQL?
No. This field is going to be deprecated (apologies for the confidential Epic link). It contains a copy of the raw json from the security report artefact and, instead of storing it as such, the content will exist in the DB in separate fields and tables (e.g. remediations).
We've been working on this since %13.6 and are nearing completion. Since you're relying on values from it, I'll catch-up with @matt_wilson so we can share the deprecation timeline sooner rather than later. My intention is to provide a suitable alternative before actually removing the field.
What purpose does the "hasSolutions" field have.
I only see it used in one spot in our UI to toggle an item display. As you well observe, it might be under-utilised because you then need to go grab the action solution from somewhere else. I'd like to add it to the scope of this issue as well.
No worries, @michael.gibson.here. The GraphQL syntax can be tricky. I'm glad you were able to use it based on my example above.
If you have suggestions to improve the documentation or would like to contribute to it, please let me know. Also happy to help if you run into other trouble using our GraphQL API - feel free to tag me in any questions on https://forum.gitlab.com/.
Based on answers from comments, I've updated the possible fixes in the description and this issue is ready for refinement. There are other fields missing from GraphQL which will be added in a separate issue (to be linked here shortly).
Michał or whoever ends-up refining this, please check the related issues. At least two of them have overlapping fields and this one here might already be covered by them.
WRT confidence, I don't think we should add it to GraphQL because it will be an extra breaking change to manage in a few months' time. If you agree, please remove it?
The projectDefaultBranch shouldn't be an attribute of the VulnerabilityType, as it is a project attribute. Also, in the future, we can merge the findings with the vulnerabilities so do we really want to expose our internal way of storing the data by exposing a field called finding?
@minac, I have removed both from the scope in the interest of keeping this issue ready for work. However, I think we still need to address the lack of finding in GraphQL.
How is someone supposed to query findings (ie vulnerabilities in non-default branches) using GraphQL?
However, I think we still need to address the lack of finding in GraphQL.
Correct. I didn't want to address it within the scope of this issue as we may want to change what we want to expose related to findings soon. Based on the policy we have, this is a long process for GraphQL as it takes almost a year to deprecate and remove a field so just wanted to make sure. I think we should create a separate issue to focus only on what to expose related to findings on GraphQL. WDYT?
How is someone supposed to query findings (ie vulnerabilities in non-default branches) using GraphQL?
We already have an attribute for the pipeline type called securityReportFindings to query the ones you mentioned.
Why interested: Customer asked the account team to elevate it's visibility
Current solution for this problem: N/A
Impact to the customer of not having this: Certain fields are missing or provide inaccurate results. This is preventing them from migrating to the GraphQL API for their vulnerability management integration.
@thiagocsf can groupcomposition analysis borrow this issue? @farias-gl is willing to exercise his skills in GraphQL and this issue looks for me as a good and meaningful item to work on.
@farias-gl, if you need any help, please feel free to tag @gitlab-org/govern/threat-insights-backend-team. This was refined a while ago, so please take the implementation plan with a grain of salt.
@minac I noticed we had a detectedAt and now a createdAt field. Is there a difference in meaning? I imagine the vulnerability is created at roughly the same time its detected?
Unless we record the time the vulnerability is detected by a scanner, as a different time of the database record creation time. This to me seems more plausible since scanners will run and generate json artifacts which are then parsed later.
Also wasn't sure of the difference between last_updated_at and last_edited_at. These values exist in the database field for Vulnerability, just need some guidance for the field description in graphQL
@farias-gl - detected_at was introduced by Add detected_at to Vulnerabilities table (!66164 - merged). It's mostly the same as created_at but can differ for manually created vulnerabilities. The VulnerabilityType is serializing this attribute but using the created_at though
Also wasn't sure of the difference between last_updated_at and last_edited_at
I don't know the difference either. @gitlab-org/govern/threat-insights-backend-team does anyone know the answer?
Also wasn't sure of the difference between last_updated_at and last_edited_at
I have seen some models like Note that has last_edited_at and updated_at in order to distinguish updates performed by the user from any kind of updates. Those are usually paired with something like last_edited_by. But Vulnerability also has updated_by_id so I am not sure if the same reasoning would still apply.
I can remove start_date and due_date from my MR so we don't introduce them via graphQL. Removing them from the ActiveRecord model and the table in the DB I think would be outside the scope of the MR for this issue.
@minac since you are more familiar with this issue do you think you can update the fields in the implantation plan above #299904 (closed). It would just be removing the ones we don't need.
I put up the MR based on the issue description because the discussions were a lot of back and forth in the thread and hard to follow.
This issue thread was very hard to follow. I saw some discussions of fields added, that are not being used but "may" be used in the future. I think we need to re-refine this a bit. At least to re-confirm the fields
Removing them from the ActiveRecord model and the table in the DB I think would be outside the scope of the MR for this issue.
@farias-gl - I agree that removing them shouldn't be part of this issue.
Seems like the following fields are always null and introduced as feature envy back then;
start_date
due_date
lastEditedAt
lastEditedBy
updatedBy
@matt_wilson, @thiagocsf - Adding these fields to GraphQL API would introduce a technical debt as we will probably try to remove them later. I think in the scope of this issue, we can only add updated_at and fix the detected_at logic. WDYT?
@minac, I agree. There's no point adding fields that we know 1. are always null and 2. will be removed.
@farias-gl, please feel free to change the scope to remove the fields - and could you please create a follow-up issue to remove the unnecessary fields?
detected_at was introduced by Add detected_at to Vulnerabilities table (!66164 - merged). It's mostly the same as created_at but can differ for manually created vulnerabilities. The VulnerabilityType is serializing this attribute but using the created_at
Could you please elaborate by what you mean by "fix"? detectedAt is already exposed in the graphQL schema.
Could you please elaborate by what you mean by "fix"? detectedAt is already exposed in the graphQL schema.
Do you mean detectedAt should read the detected_at field and not created_at?
Yes, I meant this but double checked the manually creating vulnerability UI and there was no input field to set the detected_at so you can ignore this.
@Quintasan - Why did we introduce this new field if it's not in use?
@minac I'm pretty much sure it was introduced in #10272 (closed) according to the implementation plan because creation time != detection time for manual vulnerabilities most likley
@matt_wilson - Currently there is no input field on the frontend to set the detected_at attribute while manually creating the vulnerabilities, do we have a plan to introduce a frontend change to let the users set it soon?
Currently there is no input field on the frontend to set the detected_at attribute while manually creating the vulnerabilities, do we have a plan to introduce a frontend change to let the users set it soon?
No plan to add this. The design for manual vulnerability creation did not include letting the user input a detection time/date. Assuming the detected_at value for a manual vulnerability is its time of creation, that's perfectly OK—and the behavior we'll keep unless this comes up with customers as needing to change.
because creation time != detection time for manual vulnerabilities most likely
I think I added this to the implementation plan after a discussion around this idea external to the design issue. I tried looking back through group discussions to see if there was anything in those, but I don't have access to the 2021 archive.
Agreed, I see can't see a reason to keep this. It's just as likely that I suggested having separate creation and detection dates for manual vulnerabilities or perhaps it was included in an early design concept. In any event, if the field is unused, we can remove it to avoid future confusion.