Store Security Products scan results into the database
Goal
We need to store information about Security Products results in the database, so that it can be easily managed and consumed by reports and dashboards without impacting performance.
When a job creates a report artifact, it should be identified properly as such, parsed and stored.
Every component (Security Dashboard, MR widget, etc.) that needs to access this information will read it from the database instead of the original file.
- Issue gitlab-org/gitlab#7046, milestone 11.4. Labels: Deliverable Enterprise Edition GitLab Ultimate backend database devops application security testing direction missed-deliverable missed:11.3 type feature
- Issue gitlab-org/gitlab#7510, milestone 11.5. Labels: Deliverable Enterprise Edition backend devops application security testing missed-deliverable missed:11.4 type feature workflow in review
- Issue gitlab-org/gitlab#6717, milestone 11.5. Labels: Category:SAST Deliverable Enterprise Edition GitLab Ultimate [deprecated] Accepting merge requests backend bug performance database devops application security testing missed-deliverable missed:11.4 type feature
- Issue gitlab-org/gitlab#6718, milestone 11.6. Labels: Category:Dependency Scanning [DEPRECATED] Category:Software Composition Analysis Deliverable Enterprise Edition GitLab Ultimate OKR-FY24Q2 SCA:Dependency Scanning backend bug performance database devops application security testing missed-deliverable missed:11.4
- Issue gitlab-org/gitlab#7061, milestone 11.7. Labels: Category:Container Scanning Deliverable Enterprise Edition GitLab Ultimate backend database devops application security testing type feature
- Issue gitlab-org/gitlab#7062, milestone 11.8. Labels: Category:DAST Deliverable Enterprise Edition GitLab Ultimate backend devops application security testing type feature
Activity
@bikebilly Could you elaborate on how https://gitlab.com/gitlab-org/gitlab-ce/issues/46809 relates to this epic?
@fcatteau with https://gitlab.com/gitlab-org/gitlab-ce/issues/46809 we'll have a way to "define" specific artifacts that the runner passes to GitLab differently from standard artifacts. So you won't receive an archive to unpack and parse; you'll be able to access the content of the report directly.
The content will be parsed post-receive and can then be processed and stored in the database.
I suppose this is what we want to achieve in the end, so that we no longer rely on the artifact files.
Great! But in that case https://gitlab.com/gitlab-org/gitlab-ce/issues/46809 is a dependency of this epic, and that includes the second phase that is targeting %11.2.
@bikebilly What's the right place to discuss how the artifacts get parsed? I just posted a comment regarding this on https://gitlab.com/gitlab-org/gitlab-ee/issues/6717 but this is generic really.
@fcatteau I suggest having the discussion in the issue we implement first; we can then reuse the approach in all the subsequent ones.
But if you prefer to open a generic issue to discuss the overall approach, I'm also fine with that.
- Edited by Philippe Lafoucrière
@bikebilly We already have an epic for that subject, which is linked in the OKRs: &241 (closed)
Can we merge these two, please?
@bikebilly @plafoucriere also, as explained in &241 (closed), I think we need to define how we want to identify the vulnerabilities and how we want to aggregate them before starting to implement the DB schema. This first step has already been started for SAST: https://gitlab.com/gitlab-org/gitlab-ee/issues/6590 but not for Dependency Scanning.
Even if it should be easier for Dependency Scanning (as we benefit from our experience here and the complexity seems to be lower), I think we should stick to two separate issues and make the sequential relationship explicit.
Defining how we want to identify and aggregate vulnerabilities involves product, UX and technical concerns, while storing the data in the DB is entirely technical but requires the previous step. Committing to delivering the DB storage issues (gitlab-org/gitlab-ee#6717 & gitlab-org/gitlab-ee#6718) without having solved these prerequisites seems optimistic to me.
We could possibly merge both the identification/aggregation specification and the implementation into the same issue, but that should be crystal clear in the description.
@gonzoyumo I agree on the prerequisite here, but I'm confident we can do both at the same time. I also think it could be simpler for Dependency Scanning, so we can start from there.
Even if data consumption is still to be fully defined, the data in the reports is already well known. Do you think it should be processed first and then put into the database? I'd rather store the data as it is and process it during the "extraction" phase, but I'd like to know what you think about that.
If we all agree to start with Dependency Scanning, we can just consider https://gitlab.com/gitlab-org/gitlab-ee/issues/6718 as our Deliverable for %11.2.
Putting data into the DB requires more design work than writing it to a JSON file. We'll have to figure out how to create indexes, how to define uniqueness constraints to avoid duplicates, and how to structure the tables so that queries fetch the data efficiently in the shape we want. E.g. it may be interesting to extract identifiers into a separate table to ease aggregation and search.
Every choice we make on the schema must be justified, and changing things here will not be as easy as with the JSON, so we'd better get it (mostly) right from the beginning.
To make it clear, it's not just a matter of putting the current content of the JSON into a table. The current JSON format is made for display purposes, not for efficient structured data sharing. We could do it, but I doubt it would be accepted by DB maintainers :)
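To illustrate the kind of decisions listed above, here is a minimal sketch using Python's built-in `sqlite3`. It shows a uniqueness constraint that prevents duplicate findings, identifiers extracted into their own table, and an index that supports aggregating by identifier. Every table name, column name, and the idea of a `fingerprint` column is an assumption made for illustration; this is not a proposed or adopted GitLab schema, and the sample identifier value is a placeholder.

```python
import sqlite3

# Hypothetical schema: all names below are illustrative assumptions.
SCHEMA = """
CREATE TABLE vulnerabilities (
    id          INTEGER PRIMARY KEY,
    project_id  INTEGER NOT NULL,
    report_type TEXT    NOT NULL,  -- e.g. 'sast' or 'dependency_scanning'
    fingerprint TEXT    NOT NULL,  -- assumed stable value identifying a finding
    name        TEXT    NOT NULL,
    UNIQUE (project_id, report_type, fingerprint)  -- avoids duplicate rows
);
CREATE TABLE identifiers (
    id            INTEGER PRIMARY KEY,
    external_type TEXT NOT NULL,   -- e.g. 'cve'
    external_id   TEXT NOT NULL,
    UNIQUE (external_type, external_id)
);
CREATE TABLE vulnerability_identifiers (
    vulnerability_id INTEGER NOT NULL REFERENCES vulnerabilities(id),
    identifier_id    INTEGER NOT NULL REFERENCES identifiers(id),
    PRIMARY KEY (vulnerability_id, identifier_id)
);
-- Supports lookups that start from an identifier.
CREATE INDEX index_vulnerability_identifiers_on_identifier_id
    ON vulnerability_identifiers (identifier_id);
"""


def store_finding(conn, project_id, report_type, fingerprint, name, identifiers):
    """Insert a finding once; re-parsing the same report adds no new rows."""
    conn.execute(
        "INSERT OR IGNORE INTO vulnerabilities "
        "(project_id, report_type, fingerprint, name) VALUES (?, ?, ?, ?)",
        (project_id, report_type, fingerprint, name),
    )
    vuln_id = conn.execute(
        "SELECT id FROM vulnerabilities "
        "WHERE project_id = ? AND report_type = ? AND fingerprint = ?",
        (project_id, report_type, fingerprint),
    ).fetchone()[0]
    for ext_type, ext_id in identifiers:
        conn.execute(
            "INSERT OR IGNORE INTO identifiers (external_type, external_id) "
            "VALUES (?, ?)",
            (ext_type, ext_id),
        )
        ident_id = conn.execute(
            "SELECT id FROM identifiers "
            "WHERE external_type = ? AND external_id = ?",
            (ext_type, ext_id),
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO vulnerability_identifiers "
            "(vulnerability_id, identifier_id) VALUES (?, ?)",
            (vuln_id, ident_id),
        )
    return vuln_id


def count_projects_affected(conn, ext_type, ext_id):
    """Aggregate across projects by a shared identifier."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT v.project_id) FROM vulnerabilities v "
        "JOIN vulnerability_identifiers vi ON vi.vulnerability_id = v.id "
        "JOIN identifiers i ON i.id = vi.identifier_id "
        "WHERE i.external_type = ? AND i.external_id = ?",
        (ext_type, ext_id),
    ).fetchone()
    return row[0]
```

With this layout, storing the same finding twice is a no-op, and a question like "how many projects are affected by a given identifier" becomes a join on indexed columns rather than a scan over JSON blobs. Whether a fingerprint is the right way to identify a vulnerability is exactly the open question raised earlier in this thread.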
> I think we need to define how we want to identify the vulnerabilities and how we want to aggregate them before starting to implement the DB schema. This first step has been already started for SAST: https://gitlab.com/gitlab-org/gitlab-ee/issues/6590 but not for Dependency Scanning.
@bikebilly I totally agree with what @gonzoyumo said. To sum it up, the JSON schema used by SAST is far more mature than the one used by Dependency Scanning. This is why I recommend starting with SAST.