CI Catalog - Restructure components to collect and display metadata automatically - Competitive
## Problems to solve

- Today we rely on the author of the component to document the information in the `README.md`. This information is displayed in the Catalog; we would like to surface it automatically.
- If I mark a project as a catalog resource but it doesn't have any components, I can still release a new version. This should not happen.
- Only released components should be displayed in the catalog.
- Today, migrating from templates to components requires a lot of customization. Since templates and components are almost identical, we should make sure they remain similar.

## Details

Today we rely solely on the `README.md` to describe the list of components included in a catalog resource (components repository). We should present this data to users in a more structured way, which also lets us run some validations.

When a release is created, we should prevent the version from appearing in the CI catalog if it doesn't contain any components.

## Proposal - part A

Today we track which projects are marked as catalog resources by using the following table:

```ruby
# existing today. Populated when a project is marked as catalog resource.
catalog_resources(
  id
  project_id FK to projects.id
  created_at
)
```

Rather than displaying versions of the catalog resource by using `project.releases` directly, we should use a dedicated table so that we can control which versions can be displayed in the CI catalog and which ones do not satisfy the requirements.

We introduce a new table `catalog_resource_versions`.

```ruby
catalog_resource_versions(
  catalog_resource_id FK to catalog_resources.id
  release_id FK to releases.id
  project_id FK to projects.id
  tag # could be deduplicated from releases.tag
  created_at
)
```

When a new project release is created, we don't show it directly in the CI catalog. Instead, we schedule a background job to collect metadata about all the components for the Git tag related to the release.
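
As a rough illustration of what such a metadata-collection job could decide, here is a minimal sketch. It assumes components live under a `templates/` directory, as proposed in part B, and that the job receives the list of file paths at the release tag. The method names (`component_name`, `discover_components`, `build_catalog_version`) and the hash layout are hypothetical and are not taken from the GitLab codebase.

```ruby
# Hypothetical sketch: these names are illustrative, not GitLab's actual code.
TEMPLATE_PATTERNS = [
  %r{\Atemplates/([^/]+)\.yml\z},          # single-file template
  %r{\Atemplates/([^/]+)/template\.yml\z}  # directory-based template entry point
].freeze

# Returns the component name for a repository path, or nil if the path
# is not a component entry point.
def component_name(path)
  TEMPLATE_PATTERNS.each do |pattern|
    match = pattern.match(path)
    return match[1] if match
  end
  nil
end

# Returns the sorted, unique component names found in a list of paths.
def discover_components(paths)
  paths.map { |path| component_name(path) }.compact.uniq.sort
end

# Builds the rows to persist for a release, or nil when the tag contains
# no components (in which case no catalog_resource_versions row is added).
def build_catalog_version(release, paths)
  names = discover_components(paths)
  return nil if names.empty?

  {
    version: { release_id: release[:id], tag: release[:tag] },
    components: names.map { |name| { name: name, resource_type: :template } }
  }
end

paths = [
  "README.md",
  "templates/secret-detection.yml",
  "templates/dast/template.yml",
  "other-stuff/private-component-1/template.yml" # outside templates/, ignored
]
result = build_catalog_version({ id: 1, tag: "1.0" }, paths)
```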
If no components are found, we don't add a record in `catalog_resource_versions`. Otherwise, we add a record and use this table to show the list of versions or the latest version of a catalog resource.

All the metadata collected per component for the given version should be persisted in a new database table `catalog_resource_components`:

```ruby
catalog_resource_components(
  name # component name
  inputs # specs of the inputs. JSON serialized
  # more metadata could be added here
  catalog_version_id FK to catalog_resource_versions.id
  resource_type integer (enum { template: 1 })
  catalog_resource_id FK to catalog_resources.id
  project_id FK to projects.id
  created_at
)
```

Using this data we can show a list of vetted releases, which now become catalog resource versions. For a specific version we can show which components exist in it and display the metadata in a more structured and consistent way, rather than relying solely on the `README.md` file.

1. In the Catalog Resource Details Page we can show each component in a separate section.
2. For each component we could document the inputs in a human-readable format (e.g. with an HTML table).
3. For each component we could generate a snippet showing how to include it.

## Proposal - part B

Collecting all the components metadata in a given repository could be resource intensive if we needed to scan the whole repository. We [have discussed](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/115988) the idea of users providing the list of components as inputs for the release. However, we should not ask users for data that we can collect ourselves; doing so also puts the burden on the user.

The proposal here is to use a directory structure that follows simple conventions, [like CircleCI Orbs](https://github.com/CircleCI-Public/node-orb/tree/master/src). If you have a component of template type, place it in the `/templates` directory of the repository.

- By having a very specific directory to scan, we reduce the complexity.
- This provides a contract between the user describing the components and us collecting metadata. The list of components is inferred from the directory structure. Users don't need to maintain a list separately.
- We can support `/steps`, `/jobs` as we add more types of components.
- We allow a repository to define any other supporting files and code anywhere in the repository. We simply ignore those files.
- On the same note, if you want to have local components that are only used internally by official components, you can place these anywhere in the repository. We won't collect metadata for these.
- 1 directory structure. No need for top-level component and/or nested components.

Example of a potential catalog resource `gitlab-org/security-components` hosting various security scanners:

```
.
├── README.md
├── .gitlab-ci.yml
├── templates/
│   ├── all-scans.yml            # single file template
│   ├── secret-detection.yml     # single file template
│   └── dast/                    # more complex template. May rely on other files.
│       ├── template.yml         # entry point for directory-based templates
│       └── ...other files
├── steps/
│   ├── api-scan.yml             # single file step
│   └── export-results/          # complex step (could have adjacent files)
│       ├── step.yml             # entry point for complex component
│       └── ...other files
└── other-stuff/                 # Anything else is not scanned.
    ├── base-job.yml
    └── private-component-1/
        └── template.yml
```

With the repository above we could use:

```yaml
include:
  - component: gitlab.com/gitlab-org/security-components/secret-detection@1.0
  - component: gitlab.com/gitlab-org/security-components/dast@1.0

scan-api:
  script:
    - step: gitlab.com/gitlab-org/security-components/api-scan@1.0
```

## Proposal - part C

When we successfully create a `catalog_resource_versions` record, we should upsert a flag `catalog_resources.visibility` to make the resource visible in the catalog.
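
A minimal in-memory sketch of this rule follows, assuming the `:visible` / `:hidden` values used in the rest of this part. The `CatalogResource` struct and the `add_version` and `catalog_listing` helpers are hypothetical stand-ins for the tables above, not GitLab's actual models.

```ruby
# Hypothetical in-memory stand-in for catalog_resources and
# catalog_resource_versions; not GitLab's actual models.
CatalogResource = Struct.new(:project_id, :visibility, :versions)

# Records a version for the resource and, in the same step, marks the
# resource as visible (the "upsert" of the visibility flag).
def add_version(resource, tag)
  resource.versions << tag unless resource.versions.include?(tag)
  resource.visibility = :visible
  resource
end

# The catalog listing only includes visible resources.
def catalog_listing(resources)
  resources.select { |resource| resource.visibility == :visible }
end

with_version    = CatalogResource.new(1, :hidden, [])
without_version = CatalogResource.new(2, :hidden, [])

add_version(with_version, "1.0")
listed_project_ids = catalog_listing([with_version, without_version]).map(&:project_id)
```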
When a project is [marked as catalog resource](https://docs.gitlab.com/ee/ci/components/#mark-the-project-as-a-catalog-resource):

- if the project contains releases, we set the `visibility: :visible`
- if the project doesn't contain releases, we set the `visibility: :hidden`

We should use this attribute to filter only visible catalog resources for the CI/CD catalog listing.

## Opportunities

- The list of components and their metadata, collected when a release is created, can be used for search purposes and to [show relevant results](https://gitlab.com/gitlab-org/gitlab/-/issues/408191).
- By hooking into this metadata collection process we could also compile a manifest file containing useful metadata about the given version. For example: list of dependencies, SBOM file, checksums, release metadata, list of components, etc. This file could be uploaded to Object Storage and tracked in `catalog_resource_versions.manifest_file` as an upload. This file could be consumed by clients, including gitlab~3412464.
- We could eventually detect a catalog resource project automatically based on the directory structure (e.g. check if the `/templates` directory exists and contains at least 1 component). The user would no longer need to mark a project explicitly as a catalog resource.

## Challenges

- How do we tell the user that we didn't find any components after scanning the repository? This should be a rare scenario.

## Progress

Current status: in progress.
<!-- GENERATED BY HEADWAY PROGRESS REPORT DO NOT EDIT -->
<details><summary>Issues (total: 16)</summary>

<details><summary>16.2 (1)</summary>

| :checkered_flag: | Iteration | References | Status / ETA | DRI | Impact |
| --- | --- | --- | --- | --- | --- |
| :x: | gitlab-org/gitlab#415289+ | gitlab-org/gitlab#415289 | %16.2 | | |

</details>

<details><summary>16.3 (2)</summary>

| :checkered_flag: | Iteration | References | Status / ETA | DRI | Impact |
| --- | --- | --- | --- | --- | --- |
| :x: | gitlab-org/gitlab#415287+ | gitlab-org/gitlab#415287 | %16.3 | | |
| :x: | gitlab-org/gitlab#415286+ | gitlab-org/gitlab#415286 | %16.3 | | |

</details>

<details><summary>16.4 (2)</summary>

| :checkered_flag: | Iteration | References | Status / ETA | DRI | Impact |
| --- | --- | --- | --- | --- | --- |
| :x: | gitlab-org/gitlab#418996+ | gitlab-org/gitlab#418996 | %16.4 | | |
| :x: | gitlab-org/gitlab#415853+ | gitlab-org/gitlab#415853 | %16.4 | | |

</details>

<details><summary>16.5 (2)</summary>

| :checkered_flag: | Iteration | References | Status / ETA | DRI | Impact |
| --- | --- | --- | --- | --- | --- |
| :x: | gitlab-org/gitlab#424966+ | gitlab-org/gitlab#424966 | %16.5 | | |
| :x: | gitlab-org/gitlab#424962+ | gitlab-org/gitlab#424962 | %16.5 | | |

</details>

<details><summary>16.7 (1)</summary>

| :checkered_flag: | Iteration | References | Status / ETA | DRI | Impact |
| --- | --- | --- | --- | --- | --- |
| :x: | gitlab-org/gitlab#415855+ | gitlab-org/gitlab#415855 | %16.7 | | |

</details>

<details><summary>No milestone (4)</summary>

| :checkered_flag: | Iteration | References | Status / ETA | DRI | Impact |
| --- | --- | --- | --- | --- | --- |
| :hourglass_flowing_sand: | gitlab-org/gitlab#429255+ | gitlab-org/gitlab#429255 | | | |
| :hourglass_flowing_sand: | gitlab-org/gitlab#429254+ | gitlab-org/gitlab#429254 | | | |
| :x: | gitlab-org/gitlab#427170+ | gitlab-org/gitlab#427170 | | | |
| :x: | gitlab-org/gitlab#426280+ | gitlab-org/gitlab#426280 | | | |

</details>

<details><summary>Backlog (4)</summary>

| :checkered_flag: | Iteration | References | Status / ETA | DRI | Impact |
| --- | --- | --- | --- | --- | --- |
| :x: | gitlab-org/gitlab#415920+ | gitlab-org/gitlab#415920 | %Backlog | | |
| :x: | gitlab-org/gitlab#415417+ | gitlab-org/gitlab#415417 | %Backlog | | |
| :x: | gitlab-org/gitlab#415415+ | gitlab-org/gitlab#415415 | %Backlog | | |
| :x: | gitlab-org/gitlab#408212+ | gitlab-org/gitlab#408212 | %Backlog | | |

</details>

</details>

<small>
Use epic gitlab-org&10728 on issues to add them to this list.<br>
Generated at <code>2024-07-08 15:02:38 UTC</code> by https://gitlab.com/gitlab-org/ci-cd/ops-eng-managers/headway/-/jobs/7288144182.
</small>
<!-- GENERATED BY HEADWAY PROGRESS REPORT DO NOT EDIT -->