@josephburnett from our 1:1 discussion, I think all published steps should be signed and governance policies should enforce the presence of those signatures (and related attestations).
My understanding is that we want to enable folks on self-managed to reference steps published on gitlab.com (and even vice versa). In order to do that without compromising supply chain security, folks must be able to verify that the published step artifact(s) were actually built from the source repository they appear to be from.
We can use Sigstore for keyless signing and verification, and this can mostly happen transparently to the end-user. See docs: https://docs.gitlab.com/ee/ci/yaml/signing_examples.html
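Conceptually, verification boils down to two checks: the digest recorded in the signed attestation matches the artifact you actually fetched, and the attested source repository matches the one the step claims to come from. This toy sketch models only those comparisons (the real Sigstore flow also verifies the attestation's signature against the Fulcio certificate chain; field names here are illustrative, not a real attestation schema):

```python
import hashlib
import json


def verify_provenance(artifact: bytes, attestation_json: str, expected_repo: str) -> bool:
    """Toy provenance check: compare the artifact digest and source repo
    against a (already signature-verified) attestation document."""
    att = json.loads(attestation_json)
    digest_ok = hashlib.sha256(artifact).hexdigest() == att["artifact_sha256"]
    repo_ok = att["source_repository"] == expected_repo
    return digest_ok and repo_ok


artifact = b"step contents"
attestation = json.dumps({
    "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
    "source_repository": "https://gitlab.com/example/steps",
})
assert verify_provenance(artifact, attestation, "https://gitlab.com/example/steps")
```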
Marshall Cottrell (aefcf0af) at 27 Mar 08:47
Marshall Cottrell (093ba9fb) at 26 Mar 18:24
Merge branch 'ttnguyen28-main-patch-c28b' into 'main'
... and 1 more commit
Updated with link to the GitLab Handbook Traffic chart in Tableau to replace the decommissioned Sisense link.
Agreed, I don't think it matters for our case.
Related to gitlab-org/incubation-engineering/ai-assist&3 and #414987.
We would like to be able to correlate AI feature usage analytics with other metrics already available in VSA dashboards. This is not possible to do right now because Code Suggestion telemetry event data is not stored in the application.
We have a PoC for shipping CS event data to Product Analytics. You can see the dashboard here: https://gitlab.com/gitlab-org/gitlab/-/analytics/dashboards/ai_usage
The goal of this MR is to begin sending data from the `events` table in PG to the Product Analytics backend. This will allow us to begin surfacing and correlating much of the data that is currently in VSA dashboards.
This is only intended as a PoC, but may lay groundwork necessary for broader consolidation of VSA <-> PA. See &8925
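The shape of the pipeline could be roughly the following: read a row from the `events` table and map it into a flat analytics payload. This is only a sketch; the column and payload field names are assumptions, not the actual schema:

```python
import json
from datetime import datetime, timezone


def to_pa_payload(event_row: dict) -> str:
    """Map a row from the PG `events` table into a flat Product Analytics
    payload. Field names are illustrative, not the real events schema."""
    return json.dumps({
        "event_action": event_row["action"],
        "project_id": event_row["project_id"],
        "author_id": event_row["author_id"],
        "collected_at": datetime.now(timezone.utc).isoformat(),
    })


payload = to_pa_payload({"action": "created", "project_id": 1, "author_id": 2})
```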
Marshall Cottrell (ecb990fb) at 21 Mar 16:22
Merge branch 'fforster/hugolint' into 'main'
... and 3 more commits
Reduce the number of broken / dangling links in the handbook by informing users about breakage at MR review time.
I noticed that there is a high number of broken links in the handbook. Many of them are due to pages getting moved around, either within the handbook or to the internal-only handbook.
By providing MR authors with automated feedback, we can significantly improve the health of the handbook.
I have written a small utility called "hugolint". At the moment its only function is to perform a static link analysis, i.e. it only checks links within the handbook; external links are ignored. Because the static analysis is local, it is very quick and can check the entire handbook in about two seconds.
The tool emits a "code quality" artefact (example result). These artefacts integrate nicely with the MR view, showing problems newly introduced by the MR. For example, authors of new content get quick feedback that a link they're putting into a page doesn't exist. When moving a page, the code quality report will highlight all pages that currently link to the old location.
For more information on the "Code Quality" artefacts, see https://docs.gitlab.com/ee/ci/testing/code_quality.html#view-code-quality-results
This is a first iteration that has some known limitations:

- The `content/` directory is assumed to be the web root. This is not universally true, for example for assets in the `static/` directory.
- Not all handbook.gitlab.com content is in this repository, e.g. https://handbook.gitlab.com/docs/. Such links are currently false positives.

This is awesome, thanks @fforster!
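For readers curious what a static internal-link check involves, here is an illustrative reimplementation of the idea, not hugolint's actual code: collect the set of pages under `content/`, then flag internal absolute links that have no matching page (the `_index.md` convention follows Hugo's section layout):

```python
import os
import re

# Internal absolute links like [text](/handbook/...); external links are ignored.
LINK_RE = re.compile(r"\]\((/[^)#?]*)")


def check_links(content_root: str) -> list[tuple[str, str]]:
    """Return (page, link) pairs for internal links with no matching page
    under the content root. A purely local, static analysis."""
    pages = set()
    for root, _dirs, files in os.walk(content_root):
        for name in files:
            if name.endswith(".md"):
                rel = os.path.relpath(os.path.join(root, name), content_root)
                # content/handbook/foo.md serves /handbook/foo/
                pages.add("/" + rel[: -len(".md")].replace(os.sep, "/"))
    broken = []
    for root, _dirs, files in os.walk(content_root):
        for name in files:
            if not name.endswith(".md"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as fh:
                for target in LINK_RE.findall(fh.read()):
                    candidate = target.rstrip("/")
                    # /handbook/foo/ matches foo.md or foo/_index.md
                    if candidate not in pages and candidate + "/_index" not in pages:
                        broken.append((path, target))
    return broken
```

Because everything is on the local filesystem, a scan like this stays fast enough to run on every MR.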
Marshall Cottrell (ceee6c12) at 08 Mar 17:18
Add tests for guest and anonymous users
For Xray reports specifically, the flow seems to be:

1. `StoreRepositoryXrayService` is invoked, downloads the artifact, parses the JSON, and inserts a `Projects::XrayReport` record into the database
2. When the `Projects::XrayReport` record is saved, the `Elastic::XrayReportSearch` concern is invoked
3. `Elastic::XrayReportSearch` serializes the record(s) into ES documents and pushes them onto a Redis queue

This is a ton of boilerplate, especially in the context of ingesting unstructured/semi-structured data. If the goal of Allow Elasticsearch framework to index any data (#442197) is to provide a general purpose solution, I wonder if we can simplify to something along the lines of:
- The artifact JSON is uploaded directly to object storage (e.g. `gs://elasticsearch-upload-ingest/<timestamp>-<project_id>-<artifact_name>.json`)
- An ingest process watches the `gs://elasticsearch-upload-ingest` bucket and indexes new files
Even if we wish to avoid using Filebeat or other ES features, I think we could come up with a more cloud native solution here. For example, we could continue to use a Redis queue but with pointers to keys in object storage of files that need to be indexed.
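The pointer-on-a-queue variant could look roughly like this. It is a self-contained sketch using an in-memory deque as a stand-in for the Redis list (`LPUSH`/`BRPOP` in the real thing); the bucket name and key format are illustrative:

```python
import json
from collections import deque

# Stand-in for a Redis list so the sketch runs without a Redis server.
queue: deque = deque()


def enqueue_pointer(bucket: str, key: str) -> None:
    """Producer: push a pointer to an object-storage key, not the payload itself."""
    queue.appendleft(json.dumps({"bucket": bucket, "key": key}))


def process_next() -> dict:
    """Consumer: pop the oldest pointer; the real system would fetch the blob
    from object storage and bulk-index it into Elasticsearch."""
    return json.loads(queue.pop())


enqueue_pointer("elasticsearch-upload-ingest", "1700000000-42-xray_report.json")
```

Keeping only small pointers on the queue means large payloads never transit Redis, and the consumer can retry a fetch from object storage idempotently.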
Marshall Cottrell (da568bbc) at 07 Mar 20:24
Add tests for guest and anonymous users
Audit Events Strategy (&1985) would be another potentially interesting data source to start capturing. Giving admins the ability to ask Duo chat questions related to audit events/logs could lead to very powerful security features:
@grzesiek has any progress been made on GitLab Events (&8349)? It would be awesome if we could fully decouple the ES ingest pipeline from the application at some point.
It looks like we are still using Redis for batch processing of the records. It is done this way so we can make consistency guarantees by processing the updated records in the order they are modified. A more general-purpose event stream that can make the same guarantees would be a good fit here.
/cc @joshlambert
Along the same lines, I would love to see ES index mapping configurations decoupled from Rails as well. This is another area where the development loop is unnecessarily convoluted. It would be great if we had subcommands like `gitlab-elasticsearch-indexer mappings <type>` to generate the index mappings/config. The current approach of generating these from `ActiveRecord` models doesn't provide much value. It's very often easier to author the mapping configs as plain JSON and ES provides all sorts of native templating and dynamic mapping functionality that we could be leveraging instead.
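To make the "author mappings as plain data" point concrete, here is what a hand-written mapping might look like, built as a dict and dumped to JSON. The index and field names are hypothetical, not the real schema:

```python
import json

# Hypothetical mapping for an "xray_reports" index, authored directly as data
# rather than derived from ActiveRecord models. Names are illustrative.
XRAY_REPORTS_MAPPING = {
    "mappings": {
        "properties": {
            "project_id": {"type": "long"},
            "lang": {"type": "keyword"},
            "payload": {"type": "text"},
            "created_at": {"type": "date"},
        }
    }
}

mapping_json = json.dumps(XRAY_REPORTS_MAPPING, indent=2)
```

A `mappings <type>` subcommand could simply print this JSON, ready to `PUT` to the index, which keeps the config reviewable as plain text.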
@maddievn thanks, that's very helpful. Are we still planning to refactor/decouple this from Rails in the future? I was thinking it would be nice if we could implement things like text extraction and document splitting in `gitlab-elasticsearch-indexer`.
Ai Agents -- Experimental (&12330) is a very open-ended feature: folks can build AI Agents to power any use case. Eventually we would like users to be able to populate the RAG storage backend with (more or less) arbitrary content.
For example, the customer might wish to index all of their leads in Salesforce or a bunch of spreadsheets from Google Drive. I was thinking we might power such use cases by exposing the relevant tooling in CI jobs (ideally the same tooling we use to index arbitrary content on the backend). This might look like:

- The `gitlab-elasticsearch-indexer` binary/subcommands are used within a CI job to perform fulltext extraction, chunking, and ultimately formatting the data into documents that can be ingested by ES
- The `gitlab-elasticsearch-indexer` binary is used once again on the backend to index the documents

A side-goal here would be that developers/contributors can run `gitlab-elasticsearch-indexer` to populate ES indices locally. This would really help close the development loop on related features because you can avoid running the entire application just to pull data into an ES instance for testing AI features.
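The chunking/formatting step a CI job would perform could be sketched like this. It is only an illustration of the shape of the output; the document field names and chunk size are assumptions, not the indexer's actual format:

```python
def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into roughly fixed-size chunks on word boundaries."""
    chunks, current, length = [], [], 0
    for word in text.split():
        if length + len(word) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def to_documents(source_id: str, text: str) -> list[dict]:
    """Format chunks into documents ready for bulk ingestion into ES.
    Field names are illustrative."""
    return [
        {"_id": f"{source_id}-{i}", "source": source_id, "content": chunk}
        for i, chunk in enumerate(chunk_text(text))
    ]


docs = to_documents("gdrive-spreadsheet-1", "lead data " * 100)
```

Having this logic in a standalone binary would let a CI job, the backend ingest path, and a local development loop all share the exact same extraction and chunking behavior.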
/cc @DylanGriffith
Adding AI Gateway as a distributable component is a major undertaking and will need coordinated efforts from Infra, Delivery and Quality. As a comparison, even adding the simpler gitlab-sshd was a major undertaking.
I think `gitlab-sshd` might actually be more complex. AI Gateway is entirely stateless at the moment. Also worth noting that most of the complexity in shipping AI Gateway support to SM customers has been related to authentication. Not saying we necessarily unwind any of that, but BYOK for LLM providers is a much simpler solution for SM overall.
Most importantly, AI Gateway is an optional part of the stack and not as load-bearing for core functionality as something like `gitlab-sshd`.
As an independent effort, adding keys as an integration allows customers to manage their keys within the GitLab application on both self-managed and .com, and move them out of environment variables.
@shekharpatnaik can you post the architecture diagrams we drafted in the call today? I think those diagrams illustrate clearly that AI Gateway interactions make this more complex than simply "fetch API key from DB and forward with request". Doing API key management for LLM providers in Rails doesn't make as much sense if all requests are forwarded through AI Gateway (a separate service).