Commit 90977e25 authored by Ahmed Hemdan's avatar Ahmed Hemdan 1️⃣
Browse files

Update Secret Detection ADR to reflect diff-based scanning

parent 7588b30d
Loading
Loading
Loading
Loading
+20 −20
Original line number Diff line number Diff line
@@ -224,9 +224,7 @@ types: Git text blobs (corresponding to push events) and arbitrary text blobs. I
we focus entirely on Git text blobs.

The detection flow for push events relies on subscribing to the PreReceive hook
to scan commit data using the [PushCheck interface](https://gitlab.com/gitlab-org/gitlab/blob/3f1653f5706cd0e7bbd60ed7155010c0a32c681d/lib/gitlab/checks/push_check.rb). This `SecretScanningService`
service fetches the specified blob contents from Gitaly, scans
the commit contents, and rejects the push when a secret is detected.
to scan commit data using the [PushCheck interface](https://gitlab.com/gitlab-org/gitlab/blob/3f1653f5706cd0e7bbd60ed7155010c0a32c681d/lib/gitlab/checks/push_check.rb). The scan operates on diff patches (via `DiffBlobs`) rather than full blob contents, reducing the scan surface to only changed lines within each commit. The push is rejected when a secret is detected.
See [Push event detection flow](#push-event-detection-flow) for sequence.

In the case of a push detection, the commit is rejected inline and error returned to the end user.
@@ -278,31 +276,33 @@ sequenceDiagram
    User->>+Workhorse: git push with-secret
    Workhorse->>+Gitaly: tcp
    Gitaly->>+Rails: PreReceive
    Rails->>-Gitaly: ListAllBlobs
    Gitaly->>-Rails: ListAllBlobsResponse

    Rails->>+GitLabSecretDetection: Scan(blob)
    Rails->>+Gitaly: ListAllCommits
    Gitaly->>-Rails: commits
    Rails->>+Gitaly: FindChangedPaths
    Gitaly->>-Rails: changed paths with commit SHAs
    Rails->>+Gitaly: DiffBlobs
    Gitaly->>-Rails: diff patches

    Rails->>+GitLabSecretDetection: Scan(payloads)
    GitLabSecretDetection->>-Rails: found

    Rails->>User: rejected: secret found

    User->>+Workhorse: git push without-secret
    Workhorse->>+Gitaly: tcp
    Gitaly->>+Rails: PreReceive
    Rails->>-Gitaly: ListAllBlobs
    Gitaly->>-Rails: ListAllBlobsResponse
    User->>+Workhorse: POST issuable with-secret
    Workhorse->>+Rails: tcp
    Rails->>+GitLabSecretDetection: PreReceive

    Rails->>+GitLabSecretDetection: Scan(blob)
    GitLabSecretDetection->>-Rails: not_found
    GitLabSecretDetection->>GitLabSecretDetection: Scan(payloads)
    GitLabSecretDetection->>-Rails: found

    Rails->>User: accepted
    Rails->>User: rejected: secret found
```

#### Gem Scanning Interface

For the Phase1, we use the private [Secret Detection Ruby Gem](https://gitlab.com/gitlab-org/gitlab/-/tree/5dfcf7431bfff25519c05a7e66c0cbb8d7b362be/gems/gitlab-secret_detection) that is invoked by the [Secrets Push Check](https://gitlab.com/gitlab-org/gitlab/-/blob/5dfcf7431bfff25519c05a7e66c0cbb8d7b362be/ee/lib/gitlab/checks/secrets_check.rb) on the GitLab Rails platform.

The private SD gem offers the following support in addition to running scan on multiple blobs:
The private SD gem offers the following support in addition to running scans on multiple payloads:

- Configurable Timeout on the entire scan-level and on each blob level.

@@ -334,10 +334,10 @@ sequenceDiagram
    alt On access check failure
        Rails-->>Gitaly: Scanning Skipped
    end
    Rails->>Gitaly: Fetch blobs
    Gitaly->>Rails: Quarantined Blobs
    Rails->>Secret Detection Service: Invoke scan by embedding blobs
    Secret Detection Service->>Secret Detection Service: Runs Secret Detection on input blobs
    Rails->>Gitaly: Fetch diff patches
    Gitaly->>Rails: Diff patches
    Rails->>Secret Detection Service: Invoke scan by embedding diffs
    Secret Detection Service->>Secret Detection Service: Runs Secret Detection on input diffs
    Secret Detection Service->>Rails: Result
    Rails->>Gitaly: Result
```
+10 −0
Original line number Diff line number Diff line
@@ -27,3 +27,13 @@ A Gitaly server-side hook would have performance benefits around minimal transfe
The Ruby Push Check approach follows a clear execution plan to achieve delivery by anticipated timeline and is more closely aligned with the long-term direction of platform-wide scanning. For example, future scanning of issuables will require execution within the trust boundary of the Rails application rather than Gitaly context. This approach, however, has raised concerns around elevated memory usage within the Rails application leading to availability concerns. This direction may also require migrating towards Gitaly's new plugin architecture in the future once the timeline is known.

A standalone service may be considered in the future but requires considerations of a technical approach that should be better informed by data gathered during [pre-production profiling](https://gitlab.com/gitlab-org/gitlab/-/issues/428499).

## Implementation Evolution

The implementation evolved to use diff-based scanning rather than full blob scanning. Instead of fetching entire blob contents, the system now:

1. Calls [`ListAllCommits`](https://gitlab-org.gitlab.io/gitaly/#gitaly.ListAllCommitsRequest) to get new commits
2. Calls [`FindChangedPaths`](https://gitlab-org.gitlab.io/gitaly/#gitaly.FindChangedPathsRequest) to get changed file paths with their associated commit SHAs
3. Calls [`DiffBlobs`](https://gitlab-org.gitlab.io/gitaly/#gitaly.DiffBlobsRequest) to get only the diff patches for changed files

This approach reduces the scan surface to only changed lines within each commit, improving performance and reducing memory consumption.