Secret Detection: New Branch Pipeline Scanning Limitation - Implement Branch Base SHA Detection
## Problem Statement

When a **new branch** is pushed for the first time via a **branch pipeline** (not an MR pipeline), Secret Detection only scans the **HEAD commit** instead of all commits since the branch diverged from its parent branch.

### Current Behavior

```
New branch with 70 commits pushed → Only 1 commit scanned (HEAD only)
```

### Expected Behavior

```
New branch with 70 commits pushed → All 70 commits scanned (from branch point)
```

---

## Root Cause Analysis

When a new branch is pushed, GitLab sets these CI variables:

| Variable | Value |
|----------|-------|
| `CI_COMMIT_BEFORE_SHA` | `0000000000000000000000000000000000000000` (null SHA) |
| `CI_COMMIT_SHA` | HEAD of branch |
| `CI_MERGE_REQUEST_DIFF_BASE_SHA` | **Not available** (only in MR pipelines) |

The analyzer detects this as a "new branch" via:

```go
// IsNewBranch reports whether the push created the branch, which GitLab
// signals by setting CI_COMMIT_BEFORE_SHA to the null SHA.
func (opts *FetchOptions) IsNewBranch() bool {
	return opts.CommitBeforeSHA == DefaultCommitBeforeSHA // "0000...000"
}
```

This triggers the `FetchShallow` strategy, which scans only `HEAD^..HEAD` (1 commit).

### The Core Problem

**Branch pipelines lack branch divergence context.** There is no predefined variable that tells the analyzer where the branch diverged from the default branch.

---

## Proposed Solutions

### Solution 1: Compute merge-base via GitLab API (Short-term)

**Approach:** Call GitLab's existing `/repository/merge_base` API endpoint from the analyzer.

**Pros:**

- No GitLab core changes needed
- Uses existing, proven API endpoint
- `CI_JOB_TOKEN` has repository read access by default
- Gitaly computes this efficiently server-side

**Cons:**

- Requires network call to GitLab API
- Adds external dependency to analyzer

---

### Solution 2: Expose via Gitaly gRPC (Alternative)

**Approach:** Use Gitaly's `FindMergeBase` RPC directly instead of the HTTP API.

**Implementation:** Similar to how GitLab core uses it in `lib/gitlab/gitaly_client/repository_service.rb`

**Pros:**

- More efficient than HTTP API
- Direct gRPC communication

**Cons:**

- Requires Gitaly client library in analyzer
- More complex implementation
- Still requires network call

---

### Solution 3: New Predefined CI Variable (Long-term, Recommended)

**Approach:** Add a `CI_COMMIT_BRANCH_BASE_SHA` predefined variable computed by GitLab during pipeline creation.

**Implementation in GitLab:**

```ruby
# In Ci::Pipeline or related service
def compute_branch_base_sha
  # Only branch refs have a meaningful divergence point.
  return nil unless project.repository.branch_exists?(ref)

  project.repository.merge_base(sha, project.default_branch)
end
```

**Usage in Analyzer:**

```go
baseSHA := os.Getenv("CI_COMMIT_BRANCH_BASE_SHA")
if baseSHA == "" {
	// Fallback to API call or current behavior
}
```

**Pros:**

- Minimal analyzer changes needed
- Works for all security scanners (SAST, Dependency Scanning, etc.)
- No additional network calls in CI jobs
- Consistent with existing CI variable patterns
- Computed once during pipeline creation

**Cons:**

- Requires GitLab core change
- May have performance implications for pipeline creation
- Needs RFC/proposal process

## Related Documentation

- [GitLab Repository API - merge_base](https://docs.gitlab.com/ee/api/repositories.html#get-merge-base)
- [GitLab CI/CD Predefined Variables](https://docs.gitlab.com/ee/ci/variables/predefined_variables.html)
- [Pipeline Secret Detection Coverage](https://docs.gitlab.com/ee/user/application_security/secret_detection/pipeline/)
- [Gitaly Repository Service - FindMergeBase](https://gitlab.com/gitlab-org/gitaly/-/blob/master/proto/repository-service.proto)
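
As a rough illustration of Solution 1, the sketch below asks the `/repository/merge_base` endpoint for the common ancestor of the branch HEAD and the default branch, then picks a commit range to scan. It is not the analyzer's actual code: the helper names `mergeBaseURL`, `fetchMergeBase`, and `commitRange` are hypothetical, and sending the job token in the `JOB-TOKEN` header assumes that token is accepted for this endpoint, which should be verified on the target instance. Falling back to `HEAD^..HEAD` mirrors the current shallow behavior.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// nullSHA is the value of CI_COMMIT_BEFORE_SHA when a branch is first pushed.
const nullSHA = "0000000000000000000000000000000000000000"

// mergeBaseURL (hypothetical helper) builds the merge_base request URL for
// the branch HEAD and the default branch.
func mergeBaseURL(apiURL, projectID, headSHA, defaultBranch string) string {
	q := url.Values{}
	q.Add("refs[]", headSHA)
	q.Add("refs[]", defaultBranch)
	return fmt.Sprintf("%s/projects/%s/repository/merge_base?%s",
		apiURL, url.PathEscape(projectID), q.Encode())
}

// fetchMergeBase (hypothetical helper) performs the request and returns the
// "id" field of the commit object in the response.
func fetchMergeBase(ctx context.Context, client *http.Client, reqURL, token string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, reqURL, nil)
	if err != nil {
		return "", err
	}
	// Assumption: the job token is permitted to call this endpoint.
	req.Header.Set("JOB-TOKEN", token)

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("merge_base request failed: %s", resp.Status)
	}

	var commit struct {
		ID string `json:"id"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&commit); err != nil {
		return "", err
	}
	if commit.ID == "" {
		return "", fmt.Errorf("merge_base response contained no commit id")
	}
	return commit.ID, nil
}

// commitRange (hypothetical helper) chooses the revision range to scan.
// An existing branch uses before..head; a new branch uses base..head when a
// distinct merge base is known; otherwise it falls back to the HEAD commit.
func commitRange(beforeSHA, headSHA, baseSHA string) string {
	switch {
	case beforeSHA != "" && beforeSHA != nullSHA:
		return beforeSHA + ".." + headSHA
	case baseSHA != "" && baseSHA != headSHA:
		return baseSHA + ".." + headSHA
	default:
		return headSHA + "^.." + headSHA
	}
}

func main() {
	apiURL := os.Getenv("CI_API_V4_URL")
	if apiURL == "" {
		fmt.Println("CI_API_V4_URL is not set; not running in GitLab CI")
		return
	}
	head := os.Getenv("CI_COMMIT_SHA")

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	reqURL := mergeBaseURL(apiURL, os.Getenv("CI_PROJECT_ID"), head, os.Getenv("CI_DEFAULT_BRANCH"))
	base, err := fetchMergeBase(ctx, http.DefaultClient, reqURL, os.Getenv("CI_JOB_TOKEN"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "merge-base lookup failed, scanning HEAD only:", err)
	}
	fmt.Println(commitRange(os.Getenv("CI_COMMIT_BEFORE_SHA"), head, base))
}
```

Treating a merge base equal to HEAD as "unknown" keeps the range non-empty when the branch has no commits of its own, for example when the pipeline runs on the default branch itself. The same `commitRange` logic would apply unchanged if the base SHA came from a `CI_COMMIT_BRANCH_BASE_SHA` variable (Solution 3) instead of the API call.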