We have expanded the methods for getting vulnerability data into GitLab beyond scanners running in pipeline jobs. For instance, you can manually create records via API and soon will be able to continuously scan container registries. This means that the same type of scan (e.x. Container scanning) can have results from multiple different origins/sources. Without a way to distinguish the source, it will be less clear to the user where a given result originated. This, in turn, can make it harder to understand where a vulnerability exits and how best to remediate it.
Add a new "origin" or "source" attribute for vulnerability records. This new attribute in combination with the existing scan type data can be used to more precisely determine exactly where a vulnerability exists/was detected. This new attribute needs to at minimum be supported by:
All security report schemas (possibly by adding to base schema)
The GitLab Rails monolith for schema parsing
All relevant GraphQL queries/mutations/connections including the manual vulnerability creation mutation
The GitLab UI:
Vulnerability details pages
Manual vulnerability creation form (by setting origin either automatically to "generic" or letting user chose)
The CSV export report on the Vulnerability Reports
The new attribute should likely be an ENUM and not free text. This will help keep the "source" values consistent for future use cases like filtering in Vulnerability Reports or fetching vulnerability data via GraphQL. To start, the proposed sources are:
Pipeline
API
Manual (this is the Submit vulnerability form)
Registry (previously called "Continuous")
Operational (running environment or production environment)
To make this addition backward-compatible, the new "source" field must be optional from the input side (security report schemas, and APIs). A default value of pipeline must be used at the data layer when it unspecified in the input.
Thanks @matt_wilson! I just added "operational" to the list since technically our cluster_image_scanning should be container_scanning with an "operational" environment source. In the past we have just been using different scanner names as a workaround.
@sam.white Now that I see Operational and Continuous together, I'm not sure that Continuous is the right word. Would it be better to change this to something like Registry? And maybe Operational should be Environment? puts on grammar hat The source feels like like it wants to be a noun instead of an adjective or verb. Following that logic, Manual should also be something different but I don't have any ideas about what just yet.
I've also struggled with "continuous" but don't think I have a better suggestion.
The registry scan happens in response to an event: someone pushes an image to the GitLab registry. The same could apply to scans performed due to a response to any other action, such as pushing to a project's git repository.
Words that I considered (I'll let @matt_wilson work-out if they're nouns or adjectives or adverbs):
The registry scan happens in response to an event: someone pushes an image to the GitLab registry. The same could apply to scans performed due to a response to any other action, such as pushing to a project's git repository.
I guess the question is do we want the "source" to describe how the vulnerability detection was performed or what the detection target was (or where it was performed). That's what I was thinking of with noun versus verb/adjective. IMHO, Continuous and your alternate suggestions are all "how" (and adjectives). But is that the important part to the user? I honestly don't know. But I would think source: Registry and type: Container scan is more informative than source: Continuous and type: Container scan.
Of course I'm just throwing around opinions. I'd like to phone a friend and use a lifeline. @andyvolpe@claytoncornell would appreciate your thoughts on this discussion.
/cc @beckalippert FYI, only if you want to have fun with grammar!
@matt_wilson@thiagocsf Chiming in here... Continuous doesn't tell me much about the detection... whereas Registry tells me a lot more about the detection. At least it feels that way from my viewpoint.
Is there a state where the source isn't Continuous? An intermittant/periodic state? What is it if it's not scanning continuously?
@claytoncornell It sounds like you're also in favor of @sam.white's suggestion to change to Registry. I like this this for the reasons you both gave.
The other two that I think might have better alternatives are:
Manual
Operational
The Manual ones are the result of a user going to a Vulnerability Report, clicking the Submit vulnerability button, and then filling in a form. Form seems a bit too generic. It doesn't have to be one word, either; I just don't have any good ideas here.
Operational might be fine. Sam's clarification about these being in running environments makes me think that Environment might be even better. Running environment would clarify that even further. WDYT?
Operational might be fine. Sam's clarification about these being in running environments makes me think that Environment might be even better. Running environment would clarify that even further. WDYT?
@matt_wilson my thinking with this was to align it with the names we chose for the tabs on the Vulnerability Report page. I am thinking we should keep these two consistent and if we change one then we should change the other. Thoughts?
@matt_wilson Aligning with the tabs (as suggested) gives a "path" for users to follow - aka a great idea from a doc standpoint (even if that means the tabs need some tweaking for consistency). One of the harder things I face when documenting stuff is when the terminology drifts... where we have to say $NAME in the docs and then we have to define where it's $DIFFERENT_NAME for the same thing in a few odd places.
So, from the docs side, I look for descriptive (even as a single word) labels that give users a way to connect the dots and figure out what the applicability is of the information being presented. In this case, knowing where the vulnerability was found is more informative (at least to how I'm looking at it) than how it was found.
Oh and answering my own question from earlier: results from a container scan look the same whether they were produced by the continuous scan or pipeline scan. (copied from the Epic discussion)
@sam.white@claytoncornell I don't think we have to keep the source names the same as the tabs. Today, the tabs contain vulnerabilities from more than a single source. The Development vulnerabilities tab contains those from pipelines, scheduled scans, the API, and manually entered. The tabs today are a more generalized "where" the vulnerability occurred. Scan type is "how" the vulnerability was detected. Source will be the means via which a detected vulnerability was put into our system. This means you can eventually have a source of Manual while also specifying the scan type.
One of the harder things I face when documenting stuff is when the terminology drifts... where we have to say $NAME in the docs and then we have to define where it's $DIFFERENT_NAME for the same thing in a few odd places.
This is a fair point. We are currently planning to allow customizing and creating your own tabs (or "views"). This will create more chance for divergence. I'd rather not force—and document—a pattern now that will likely change in the not-too-distant future. Ultimately, I view source as another piece of metadata that will help users further triage, sort, and filter vulnerabilities more granularly. I don't think it is a "primary" piece of data but more secondary so tying primary UI elements to source values may not be the best approach.
Technically, we may have to consider these 2 things:
Propagating the source from gitlab rails to analyzer:
To propagate the value of source from the pipeline to analyzer, we might need to introduce a CI variable (eg: SECURE_SCAN_SOURCE - maybe a better name too). This variable should be read by all the analyzers and it has to be set into the report json so that the source could be set while the report is parsed in rails
Creating source column in DB
We need to create a new column in vulnerabilities table and also in vulnerability_reads table with default value and prepare a migration to set source for cases like manual. For vulnerability_reads, currently there is a trigger (trigger_insert_or_update_vulnerability_reads_from_occurrences) which creates vulnerability_reads record everytime a vulnerability is created. We need to update the trigger to consider source.
Nit: SECURE_SCAN_SOURCE excludes devopsprotect. SEC_SCAN_SOURCE should work.
This variable should be read by all the analyzers and it has to be set into the report json so that the source could be set while the report is parsed in rails
We're in this situation because there's no way to run a task other than sidekiq (simple, limited) or CI jobs (complex, very flexible). I don't know how I feel about passing this responsibility to the analyzers - IMO, anything coming from a CI job should have a default of Pipeline.
Certain analyzers - that is, the pipeline-based ones in support of a non-pipeline trigger mechanism - should be able to override this.
@sashi_kumar I feel strongly that the analyzers should not need to read back from the pipeline. This will add a non-trivial new requirement that will impact not only our own analyzer teams but all of our many security integration partners. I would much prefer that we go with a "safe" default assumption of pipeline as @thiagocsf suggests. I updated the description to call this out and also clarify that the new "source" field will be optional in the API, security reports, etc.
@matt_wilson, in case I made it murkier, note that the proposal is no different than the other CI variables that analyzers already rely upon to drive their behavior, such as SECURE_LOG_LEVEL.
My nitpick was with the recommendation that all analyzers should support this ("should be read by all the analyzers"). We should communicate the availability to our Section, but whether or not they need/want to support is up to them.
I agree with your proposal update, but I'm going to change "should be optional" to "must be optional".
@thiagocsf Thanks for clarifying. I misunderstood "where" (or maybe "when") this would be happening. I interpreted "rails" to mean that something about the runtime state of the pipeline would communicate back to analyzers something to the effect of "Hey, it looks like you are running in a pipeline. Set your SECURE_SCAN_SOURCE to Pipeline in your security report artifact". If this is an optional CI variable, then I don't have any concerns.
Technically, we may have to consider these 2 things:
Propagating the source from gitlab rails to analyzer:
To propagate the value of source from the pipeline to analyzer, we might need to introduce a CI variable (eg: SECURE_SCAN_SOURCE - maybe a better name too). This variable should be read by all the analyzers and it has to be set into the report json so that the source could be set while the report is parsed in rails
Creating source column in DB
We need to create a new column in vulnerabilities table and also in vulnerability_reads table with default value and prepare a migration to set source for cases like manual. For vulnerability_reads, currently there is a trigger (trigger_insert_or_update_vulnerability_reads_from_occurrences) which creates vulnerability_reads record everytime a vulnerability is created. We need to update the trigger to consider source.
I don't have a better suggestion, so that's the best plan we have and I'm not saying we shouldn't do it.
To explain my question, it comes from a performance concern.
The vulnerability_reads model was created to alleviate performance issues on the vulnerabilities model. We need to be very diligent about what we add to this new table, or we'll end-up at the same place we were before: a bloated table that causes timeouts.
@matt_wilson, @sam.white, now that @sashi_kumar is starting to write issues for &2340 (closed), we need to bring this issue back to our radar because it's a blocker for the ~"group::container security" epic.
Given UI is in-scope, I'm setting workflowdesign but it needs to be scheduled and assigned to a team.
Aside from frontend, there are also at least 3 separate backend issues that will need to be created to deliver the proposal as-is.
This new attribute needs to at minimum be supported by:
All security report schemas (possibly by adding to base schema)
The GitLab Rails monolith for schema parsing
All relevant GraphQL queries/mutations/connections including the manual vulnerability creation mutation
The GitLab UI:
Vulnerability details pages
Manual vulnerability creation form (by setting origin either automatically to "generic" or letting user chose)
The CSV export report on the Vulnerability Reports
The first three are indeed backend work, and CS can contribute to TI. The latter is frontend, which won't be done in &2340 (closed) since, in that epic, we're only dealing with the vulnerability report.
If @matt_wilson is happy for us to take that FE work into CS, then I have no other objections.
Either way, I suggest we create separate issues for each of the implementations above as they're reasonably distinct pieces of work.
If there are no objections, I'll create a new Epic for this issue, convert this issue into a Design issue, add implementation issues for the BE items above, and then we can decide who's refining and implementing them.
@thiagocsf@sam.white I'm OK with CS taking on the frontend pieces as well; please keep me and @beckalippert in the loop before finalizing designs/starting implementation.
That said, the UI pieces aren't crucial to roll out with the backend implementation of the "source" field. Aside from giving Sam the ability to have a new tab on the vulnerability report, these others are nice-to-have—and I'm not sure there's even UI required for 2 of them:
Vulnerability details pages likely only needs a small UI change to add the source field.
The manual vulnerability creation form can be hard-coded to always set source to Manual, no UI change needed. Then, when we update the form in the future to allow section of other scanners, this piece is already in place.
The CSV export report doesn't require UI work, only adding a new "source" column to the export.
please keep me and @beckalippert in the loop before finalizing designs
@matt_wilson I agree with what you wrote above. I am hoping to move this straight into implementation without specific design work unless that is needed for some reason. We already have the existing designs for the tabbed approach on the Vulnerability Report. I expect the first iteration of this would not change that user experience at all. There might be some FE work to have the Operational Vulnerabilities tab filter on this new source field instead of filtering based on analyzer name; however, the end user experience would stay the same.
We can follow up with the other three changes you mentioned as separate follow-on iterations if/when needed.
@sashi_kumar, given the thread above, WDYT of writing implementation issues in the Epic you're planning?
We can add this issue as related to them. When Threat Insights comes back to this issue, they can then pick-up where we left (which actually should be most of the proposal above - they might be able to just close it).