Proposal: Model and represent "Assets" related to security findings or scans generically

Background

An organization's computing footprint, as part of delivering services to its customers, is often made up of many opaque components. In engineering, and especially in security engineering, these components are often referred to as "assets". "Asset" is a fairly overloaded term: different disciplines within the technology industry use it for different kinds of opaque component that an organization produces or manages in order to deliver services to customers. In this proposal, I use the term as it is commonly used in security engineering, to refer to opaque components which may or may not have security issues. A non-exhaustive list includes cloud or physical servers, web applications, running containers, DNS records, and load balancer endpoints.

Problem to solve

GitLab provides many of the features and tools required to implement a cohesive, end-to-end security practice within an organization. Currently, this is mostly limited to scanning for, and acting on, security issues detected in container images and codebases built or hosted on GitLab, with no way to correlate or represent these findings in the context of assets outside of GitLab. Without that context, users must rely on institutional or organizational knowledge to infer the impact of a security finding, and must manually research and correlate data between GitLab and external systems to build context around findings. Additionally, because GitLab has no representation of assets, any effort to use external data sources such as cloud computing inventory data to build a baseline for security scanning is left to the user. There is also an opportunity for asset inventory functionality to be useful as a standalone feature, helping organizations keep track of the components that make up their computing footprint. This is normally a separate function handled by a configuration management database (CMDB) or a dedicated asset inventory product.

Proposal

I've marked this proposal as both Category:DAST and Category:Vulnerability Management because I think it applies to both areas, at least initially.

Main proposal: Model and represent assets in GitLab

To address this problem, the core proposal is to introduce a new concept of assets to the Secure section of the GitLab UI, backed by database models for storing assets. At first, I would expect this section to be quite basic, with support growing into other parts of GitLab over time. The key information to model and represent for each asset would be:

Asset type (examples: Server, DNS Record, Load Balancer, Container)
Asset name or identifier (examples: Server hostname, DNS record data, Load Balancer public DNS record, Container name)
Environment (user-controllable and editable)
Data classification (optional, user-controllable and editable)
Asset operating system or image (if applicable)
Network addresses (IPv4 and IPv6 addresses if appropriate)
Owner (refers to GitLab user)
Description (this should be editable)
Tags (for filtering and narrowing assets based on user-defined tags)
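To make the field list above concrete, here is a minimal sketch of an asset record as a Python dataclass. The class name `Asset`, the `AssetType` values, and all field names are illustrative assumptions, not an existing GitLab schema; the owner is assumed to be stored as a GitLab user ID, and an unset environment is assumed to mean "inherit from the project or group".

```python
from dataclasses import dataclass, field
from enum import Enum
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Optional, Union


class AssetType(Enum):
    # Hypothetical type list, taken from the examples in this proposal.
    SERVER = "server"
    DNS_RECORD = "dns_record"
    LOAD_BALANCER = "load_balancer"
    CONTAINER = "container"


@dataclass
class Asset:
    asset_type: AssetType
    identifier: str                     # hostname, DNS record data, container name, ...
    owner_id: int                       # ID of the owning GitLab user (assumption)
    environment: Optional[str] = None   # user-editable; None means "inherit from project/group"
    data_classifications: set[str] = field(default_factory=set)  # optional
    os_or_image: Optional[str] = None   # only meaningful for servers and containers
    addresses: list[Union[IPv4Address, IPv6Address]] = field(default_factory=list)
    description: str = ""
    tags: set[str] = field(default_factory=set)

    def add_address(self, raw: str) -> None:
        # ip_address() parses IPv4 or IPv6 text and raises ValueError for anything else.
        self.addresses.append(ip_address(raw))


# Usage: a server with one IPv4 and one IPv6 address.
web = Asset(AssetType.SERVER, "web-01.example.com", owner_id=42, tags={"frontend"})
web.add_address("203.0.113.10")
web.add_address("2001:db8::10")
```

Storing addresses as parsed `ipaddress` objects rather than strings means invalid input is rejected at creation time, and later subnet or equality checks do not depend on how the address was written.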

For environment and data classification:

It's important that users have a way to work out which assets are more important than others. One common way to make this determination is through environments, i.e. production vs. development vs. test environments. Environments should be definable at the project or group level, applied at the asset level, and individually overridable per asset. I could also see this being extended in the future to support rules that make GitLab aware of naming conventions or IP subnets, which could be used to determine an asset's environment automatically. The environment can then be used to prioritize security findings or incidents associated with that asset, as production security findings would normally be treated as a higher priority than, say, security findings in test systems.
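The inheritance and rule-based behaviour described above could be resolved roughly as follows. This is a sketch under stated assumptions: the rule tables, the function name `resolve_environment`, and the precedence order (explicit asset override, then automatic naming or subnet rules, then project default, then group default) are all hypothetical choices, not decided behaviour.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical user-defined rules for automatic environment detection.
SUBNET_RULES = {
    ip_network("10.0.0.0/16"): "production",
    ip_network("10.1.0.0/16"): "test",
}
NAME_PREFIX_RULES = {"prod-": "production", "dev-": "development"}


def resolve_environment(
    asset_override: Optional[str],
    project_default: Optional[str],
    group_default: Optional[str],
    hostname: str = "",
    address: Optional[str] = None,
) -> Optional[str]:
    # 1. An explicit per-asset override always wins.
    if asset_override:
        return asset_override
    # 2. Automatic rules based on naming conventions, then IP subnets.
    for prefix, env in NAME_PREFIX_RULES.items():
        if hostname.startswith(prefix):
            return env
    if address is not None:
        ip = ip_address(address)
        for subnet, env in SUBNET_RULES.items():
            # Membership is simply False when IP versions differ.
            if ip in subnet:
                return env
    # 3. Fall back to the nearest container: project first, then group.
    return project_default or group_default
```

Whether automatic rules should outrank a project-level default (as here) or the reverse is a design question this proposal leaves open.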

This could then be extended with data classifications. Similar to how we do this at GitLab, it is common for organizations to have some level of data classification, even if the classifications are as simple as "Processes personal information" or "Processes payments". As an example, a security finding on a system that processes personal information is naturally more urgent to address than a security finding on an asset which does not handle personal or payment information.
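One way environment and data classification could feed into prioritization is a simple multiplicative score. The weights, the severity scale, and the function name `finding_priority` below are hypothetical; an organization would tune them to its own policy, and GitLab may choose a different model entirely.

```python
from typing import Optional

# Hypothetical weights; each organization would tune these to its own policy.
SEVERITY_SCORE = {"critical": 4, "high": 3, "medium": 2, "low": 1}
ENVIRONMENT_WEIGHT = {"production": 3, "staging": 2, "test": 1, "development": 1}
CLASSIFICATION_WEIGHT = {
    "Processes personal information": 2,
    "Processes payments": 2,
}


def finding_priority(severity: str, environment: Optional[str], classifications: set[str]) -> int:
    # Unknown or unset environments get the lowest weight.
    env_weight = ENVIRONMENT_WEIGHT.get(environment or "", 1)
    # Each sensitive classification on the asset raises its weight.
    data_weight = 1 + sum(CLASSIFICATION_WEIGHT.get(c, 0) for c in classifications)
    return SEVERITY_SCORE[severity] * env_weight * data_weight


pii_prod = finding_priority("medium", "production", {"Processes personal information"})
plain_test = finding_priority("high", "test", set())
```

With these weights, a medium-severity finding on a production asset that processes personal information scores 2 × 3 × 3 = 18, while a high-severity finding on an unclassified test asset scores 3 × 1 × 1 = 3, which matches the intuition in the paragraph above.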

Asset workflow / UX

It should be possible to create assets at the project or group level. Assets should "roll up" to their parent group, similar to how issues are visible at the group level, so that users can view assets across a group or narrow their view to a specific project. This also supports the "asset inventory" use case described in the "Problem to solve" section, as it is a fairly intuitive way to represent group or team ownership of specific assets.
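The roll-up behaviour amounts to collecting assets from a namespace and all of its descendants. A minimal sketch, assuming a hypothetical `Namespace` type that stands in for a GitLab group or project and plain strings standing in for assets:

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    # Hypothetical stand-in for a GitLab group or project.
    name: str
    assets: list[str] = field(default_factory=list)            # assets created directly here
    children: list["Namespace"] = field(default_factory=list)  # subgroups and projects


def rolled_up_assets(ns: Namespace) -> list[str]:
    """Return assets owned by `ns` plus those of every descendant, depth first."""
    result = list(ns.assets)
    for child in ns.children:
        result.extend(rolled_up_assets(child))
    return result


api = Namespace("api", assets=["api-lb.example.com"])
web = Namespace("web", assets=["web-01", "web-02"])
platform = Namespace("platform", assets=["shared-dns"], children=[api, web])
```

Viewing `platform` returns every asset in the group and its projects, while viewing `web` narrows the list to that project alone.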

Assets should respect the visibility and access controls of the project or group they belong to, which enables individual teams or groups within an organization to control access to information about specific assets.
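A simplified sketch of that access check: a user sees an asset only if they can read its owning namespace or one of its ancestor groups, mirroring how group membership is inherited by subgroups and projects. The names `AssetRef`, `can_read`, and `visible_assets` are hypothetical, and this deliberately ignores public/internal visibility levels and role-based permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRef:
    identifier: str
    namespace: str  # full path of the owning group or project, e.g. "acme/payments"


def can_read(namespace: str, readable: set[str]) -> bool:
    # Access to a group implies access to everything beneath it, so check the
    # namespace itself and each of its ancestor paths.
    parts = namespace.split("/")
    return any("/".join(parts[:i]) in readable for i in range(1, len(parts) + 1))


def visible_assets(assets: list[AssetRef], readable: set[str]) -> list[AssetRef]:
    return [a for a in assets if can_read(a.namespace, readable)]


inventory = [
    AssetRef("pay-db", "acme/payments"),
    AssetRef("web-01", "acme/web"),
    AssetRef("runner", "other/infra"),
]
```

Matching on whole path segments means access to `acme` never leaks into a sibling namespace such as `acme-labs`.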

It should also be possible to link assets to security findings (vulnerabilities). When asset information has been used as an input to vulnerability or DAST scanning, GitLab should link the asset and finding automatically and persist the relationship in the database.

The relationship between assets and findings/vulnerabilities can then be used to provide asset views showing the number of vulnerabilities on each asset, and to enrich the vulnerability report with asset information and the ability to group findings by asset. This is helpful for compliance and for infrastructure vulnerability remediation work.
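Once the asset-finding links are persisted, both views fall out of simple aggregations. This sketch assumes the links are available as hypothetical `(finding_id, asset_identifier)` pairs; the function names are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical persisted link between a finding and an asset: (finding_id, asset_identifier).
Link = tuple[int, str]


def group_findings_by_asset(links: list[Link]) -> dict[str, list[int]]:
    # Vulnerability report grouped by asset.
    grouped = defaultdict(list)
    for finding_id, asset in links:
        grouped[asset].append(finding_id)
    return dict(grouped)


def vulnerability_counts(links: list[Link]) -> Counter:
    # Number of linked findings per asset, for an asset list view.
    return Counter(asset for _, asset in links)


links = [(101, "web-01"), (102, "web-01"), (103, "db-01")]
```

Here `group_findings_by_asset(links)` yields `{"web-01": [101, 102], "db-01": [103]}`, and `vulnerability_counts(links).most_common(1)` surfaces `web-01` as the asset with the most findings.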

Assets API

Given that most assets would be populated from existing asset inventory tools or data sources, it would be helpful to have GraphQL queries for listing assets and querying individual assets, with the ability to filter by project, owner, and asset type.
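To illustrate how a client might call such a query, the sketch below builds (but does not send) a request to GitLab's GraphQL endpoint. The `assets` field, its `owner` and `assetType` arguments, the `AssetType` enum, and the returned fields are hypothetical, since GitLab exposes no such schema today; the endpoint path and Bearer-token header follow GitLab's existing GraphQL API conventions.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical query: the `assets` field and its arguments do not exist in GitLab today.
ASSETS_QUERY = """
query($fullPath: ID!, $owner: String, $assetType: AssetType) {
  project(fullPath: $fullPath) {
    assets(owner: $owner, assetType: $assetType) {
      nodes { identifier assetType environment tags }
    }
  }
}
"""


def build_assets_request(
    full_path: str,
    token: str,
    owner: Optional[str] = None,
    asset_type: Optional[str] = None,
    endpoint: str = "https://gitlab.com/api/graphql",
) -> urllib.request.Request:
    # Only send the filters the caller actually set.
    variables = {"fullPath": full_path}
    if owner is not None:
        variables["owner"] = owner
    if asset_type is not None:
        variables["assetType"] = asset_type
    body = json.dumps({"query": ASSETS_QUERY, "variables": variables}).encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_assets_request("acme/web", "glpat-example", asset_type="SERVER")
```

Passing `req` to `urllib.request.urlopen` would send it; against today's GitLab the server would reject the unknown `assets` field.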

Mutations for creating, deleting, and modifying assets would also be helpful for building automated tooling to import and synchronize assets from external sources. A mutation to modify the tags on an asset would also be helpful.
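A synchronization tool built on those mutations would mostly be a diff between the external inventory and what GitLab already holds. This sketch assumes assets are matched on `(asset_type, identifier)` and that any other difference, such as changed tags, becomes an update; `AssetRecord` and `plan_sync` are hypothetical names, and the actual mutation calls are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    identifier: str
    asset_type: str
    tags: frozenset[str] = frozenset()


def plan_sync(external: list[AssetRecord], current: list[AssetRecord]):
    """Split an external inventory into the create/update/delete mutations a sync job would issue.

    Assets are matched on (asset_type, identifier); any other difference counts as an update.
    """
    ext = {(a.asset_type, a.identifier): a for a in external}
    cur = {(a.asset_type, a.identifier): a for a in current}
    to_create = [ext[k] for k in sorted(ext.keys() - cur.keys())]
    to_update = [ext[k] for k in sorted(ext.keys() & cur.keys()) if ext[k] != cur[k]]
    to_delete = [cur[k] for k in sorted(cur.keys() - ext.keys())]
    return to_create, to_update, to_delete


external = [AssetRecord("web-01", "server", frozenset({"prod"})), AssetRecord("web-02", "server")]
current = [AssetRecord("web-01", "server"), AssetRecord("old-lb", "load_balancer")]
create, update, delete = plan_sync(external, current)
```

Whether assets missing from a source should be deleted, or only flagged as stale, is a policy choice; deleting is shown here only for completeness.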

Stretch: Import / Synchronize Assets with external data sources

One key area where modeling assets in GitLab adds value is the ability to import the data users already have about their infrastructure, and their asset inventories in general, into GitLab. This allows data sources such as cloud providers (e.g. GCP, AWS), platforms (such as Kubernetes), and asset discovery tools (such as Attack Surface Management tooling, domain enumeration tooling, and other security data sources) to automatically push information about an organization's assets into GitLab, for correlation with security findings detected on those assets. These correlations can happen in a number of ways, for example:

Where assets represented in GitLab have been used as input to a scan, the asset is already known
Where server, IP, DNS or load balancer assets share public IP or DNS records with a system detailed in a security finding/vulnerability
Where container scan findings for a specific image version match the running image of a container asset
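The second and third correlation rules above can be sketched as simple matching functions; the first needs no matching because the link is recorded when the scan runs. The types and field names here are hypothetical, and a real implementation would also need to normalize IPv6 text forms and would likely prefer image digests over mutable tags.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InventoryAsset:
    identifier: str
    addresses: set[str] = field(default_factory=set)  # public IPs and DNS names
    running_image: Optional[str] = None                # e.g. "registry.example.com/shop:1.4.2"


@dataclass
class Finding:
    finding_id: int
    host: Optional[str] = None   # host or IP reported by a DAST or infrastructure scan
    image: Optional[str] = None  # image reference reported by container scanning


def correlate(finding: Finding, assets: list[InventoryAsset]) -> list[str]:
    """Return identifiers of assets that plausibly relate to `finding`."""
    matches = []
    for asset in assets:
        # DNS names are case-insensitive, so compare lower-cased values.
        known = {addr.lower() for addr in asset.addresses}
        host_match = finding.host is not None and finding.host.lower() in known
        # Exact image reference match, i.e. same repository and tag.
        image_match = finding.image is not None and finding.image == asset.running_image
        if host_match or image_match:
            matches.append(asset.identifier)
    return matches


assets = [
    InventoryAsset("lb-1", {"203.0.113.5", "shop.example.com"}),
    InventoryAsset("pod-a", running_image="registry.example.com/shop:1.4.2"),
]
```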

These data sources could be configured globally, or scoped to individual groups or projects where specific assets should not be visible to other groups.
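Scoping could work like this: an instance-wide source applies everywhere, while a scoped source applies only to its own group or project and anything beneath it. `DataSource`, its fields, and `sources_visible_in` are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    name: str
    kind: str                    # e.g. "aws", "gcp", "kubernetes"
    scope: Optional[str] = None  # None = instance-wide; otherwise a group or project full path


def sources_visible_in(namespace: str, sources: list[DataSource]) -> list[str]:
    """Names of data sources whose imported assets should be visible in `namespace`."""
    def in_scope(src: DataSource) -> bool:
        if src.scope is None:
            return True
        # A scoped source covers its own namespace and everything beneath it.
        return namespace == src.scope or namespace.startswith(src.scope + "/")

    return [s.name for s in sources if in_scope(s)]


sources = [
    DataSource("org-dns", "dns"),
    DataSource("payments-aws", "aws", scope="acme/payments"),
]
```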

Supporting external sources of asset data would make GitLab a natural single source of truth (SSOT) for asset information across an organization, as well as for security findings across those assets.

(cc @abellucci, @erran)

Edited Jul 09, 2023 by James Hebden