@@ -224,9 +224,7 @@ types: Git text blobs (corresponding to push events) and arbitrary text blobs. I
we focus entirely on Git text blobs.
The detection flow for push events relies on subscribing to the PreReceive hook
to scan commit data using the [PushCheck interface](https://gitlab.com/gitlab-org/gitlab/blob/3f1653f5706cd0e7bbd60ed7155010c0a32c681d/lib/gitlab/checks/push_check.rb). This `SecretScanningService`
service fetches the specified blob contents from Gitaly, scans
the commit contents, and rejects the push when a secret is detected.
to scan commit data using the [PushCheck interface](https://gitlab.com/gitlab-org/gitlab/blob/3f1653f5706cd0e7bbd60ed7155010c0a32c681d/lib/gitlab/checks/push_check.rb). The scan operates on diff patches (via `DiffBlobs`) rather than full blob contents, reducing the scan surface to only changed lines within each commit. The push is rejected when a secret is detected.
See [Push event detection flow](#push-event-detection-flow) for sequence.
In the case of a push detection, the commit is rejected inline and error returned to the end user.
For the Phase1, we use the private [Secret Detection Ruby Gem](https://gitlab.com/gitlab-org/gitlab/-/tree/5dfcf7431bfff25519c05a7e66c0cbb8d7b362be/gems/gitlab-secret_detection) that is invoked by the [Secrets Push Check](https://gitlab.com/gitlab-org/gitlab/-/blob/5dfcf7431bfff25519c05a7e66c0cbb8d7b362be/ee/lib/gitlab/checks/secrets_check.rb) on the GitLab Rails platform.
The private SD gem offers the following support in addition to running scan on multiple blobs:
The private SD gem offers the following support in addition to running scans on multiple payloads:
- Configurable Timeout on the entire scan-level and on each blob level.
@@ -334,10 +334,10 @@ sequenceDiagram
alt On access check failure
Rails-->>Gitaly: Scanning Skipped
end
Rails->>Gitaly: Fetch blobs
Gitaly->>Rails: Quarantined Blobs
Rails->>Secret Detection Service: Invoke scan by embedding blobs
@@ -27,3 +27,13 @@ A Gitaly server-side hook would have performance benefits around minimal transfe
The Ruby Push Check approach follows a clear execution plan to achieve delivery by anticipated timeline and is more closely aligned with the long-term direction of platform-wide scanning. For example, future scanning of issuables will require execution within the trust boundary of the Rails application rather than Gitaly context. This approach, however, has raised concerns around elevated memory usage within the Rails application leading to availability concerns. This direction may also require migrating towards Gitaly's new plugin architecture in the future once the timeline is known.
A standalone service may be considered in the future but requires considerations of a technical approach that should be better informed by data gathered during [pre-production profiling](https://gitlab.com/gitlab-org/gitlab/-/issues/428499).
## Implementation Evolution
The implementation evolved to use diff-based scanning rather than full blob scanning. Instead of fetching entire blob contents, the system now:
1. Calls [`ListAllCommits`](https://gitlab-org.gitlab.io/gitaly/#gitaly.ListAllCommitsRequest) to get new commits
2. Calls [`FindChangedPaths`](https://gitlab-org.gitlab.io/gitaly/#gitaly.FindChangedPathsRequest) to get changed file paths with their associated commit SHAs
3. Calls [`DiffBlobs`](https://gitlab-org.gitlab.io/gitaly/#gitaly.DiffBlobsRequest) to get only the diff patches for changed files
This approach reduces the scan surface to only changed lines within each commit, improving performance and reducing memory consumption.