Secret Detection: New Branch Pipeline Scanning Limitation - Implement Branch Base SHA Detection
## Problem Statement
When a **new branch** is pushed for the first time via a **branch pipeline** (not an MR pipeline), Secret Detection only scans the **HEAD commit** instead of all commits since the branch diverged from its parent branch.
### Current Behavior
```
New branch with 70 commits pushed → Only 1 commit scanned (HEAD only)
```
### Expected Behavior
```
New branch with 70 commits pushed → All 70 commits scanned (from branch point)
```
---
## Root Cause Analysis
When a new branch is pushed, GitLab sets these CI variables:
| Variable | Value |
|----------|-------|
| `CI_COMMIT_BEFORE_SHA` | `0000000000000000000000000000000000000000` (null SHA) |
| `CI_COMMIT_SHA` | HEAD of branch |
| `CI_MERGE_REQUEST_DIFF_BASE_SHA` | **Not available** (only in MR pipelines) |
The analyzer detects this as a "new branch" via:
```go
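// IsNewBranch reports whether the push created the branch, which GitLab
// signals by setting CI_COMMIT_BEFORE_SHA to the all-zero SHA.
func (opts *FetchOptions) IsNewBranch() bool {
	return opts.CommitBeforeSHA == DefaultCommitBeforeSHA // "0000...000"
}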
```
This triggers the `FetchShallow` strategy, which scans only `HEAD^..HEAD` (a single commit).
### The Core Problem
**Branch pipelines lack branch divergence context.** There's no predefined variable that tells us where the branch diverged from the default branch.
---
## Proposed Solutions
### Solution 1: Compute merge-base via GitLab API (Short-term)
**Approach:** Call GitLab's existing `/repository/merge_base` API endpoint from the analyzer.
**Pros:**
- No GitLab core changes needed
- Uses existing, proven API endpoint
- `CI_JOB_TOKEN` has repository read access by default
- Gitaly computes this efficiently server-side
**Cons:**
- Requires network call to GitLab API
- Adds external dependency to analyzer
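
A minimal sketch of how the analyzer might call this endpoint is shown below. The helper names (`mergeBaseURL`, `parseMergeBase`, `fetchMergeBase`) are hypothetical and not part of the analyzer. The sketch assumes the job runs with the predefined `CI_API_V4_URL`, `CI_PROJECT_ID`, and `CI_DEFAULT_BRANCH` variables, and that the endpoint accepts `CI_JOB_TOKEN` through the `JOB-TOKEN` header; the last point should be verified against the target GitLab version. On any error the caller keeps the current HEAD-only behavior, so the API call cannot break a scan.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// mergeBaseURL builds the merge_base request URL for the given refs.
func mergeBaseURL(apiV4URL, projectID string, refs ...string) string {
	q := url.Values{}
	for _, r := range refs {
		q.Add("refs[]", r)
	}
	return strings.TrimRight(apiV4URL, "/") + "/projects/" + url.PathEscape(projectID) +
		"/repository/merge_base?" + q.Encode()
}

// parseMergeBase extracts the commit ID from the API response body.
func parseMergeBase(body []byte) (string, error) {
	var commit struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &commit); err != nil {
		return "", fmt.Errorf("decode merge_base response: %w", err)
	}
	if commit.ID == "" {
		return "", fmt.Errorf("merge_base response has no commit id")
	}
	return commit.ID, nil
}

// fetchMergeBase asks GitLab for the common ancestor of the given refs.
func fetchMergeBase(ctx context.Context, client *http.Client,
	apiV4URL, projectID, jobToken string, refs ...string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		mergeBaseURL(apiV4URL, projectID, refs...), nil)
	if err != nil {
		return "", err
	}
	// Assumption: the endpoint accepts the CI job token via this header.
	req.Header.Set("JOB-TOKEN", jobToken)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("merge_base request failed: %s", resp.Status)
	}
	return parseMergeBase(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	sha, err := fetchMergeBase(ctx, http.DefaultClient,
		os.Getenv("CI_API_V4_URL"), os.Getenv("CI_PROJECT_ID"), os.Getenv("CI_JOB_TOKEN"),
		os.Getenv("CI_COMMIT_SHA"), os.Getenv("CI_DEFAULT_BRANCH"))
	if err != nil {
		// Keep the current behavior when the base cannot be determined.
		fmt.Fprintln(os.Stderr, "falling back to HEAD-only scan:", err)
		return
	}
	fmt.Println("scan range:", sha+"..HEAD")
}
```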
---
### Solution 2: Expose via Gitaly gRPC (Alternative)
**Approach:** Use Gitaly's `FindMergeBase` RPC directly instead of HTTP API.
**Implementation:** Similar to how GitLab core uses it in `lib/gitlab/gitaly_client/repository_service.rb`
**Pros:**
- More efficient than HTTP API
- Direct gRPC communication
**Cons:**
- Requires Gitaly client library in analyzer
- More complex implementation
- Still requires network call
---
### Solution 3: New Predefined CI Variable (Long-term, Recommended)
**Approach:** Add `CI_COMMIT_BRANCH_BASE_SHA` predefined variable computed by GitLab during pipeline creation.
**Implementation in GitLab:**
```ruby
# In Ci::Pipeline or a related service
def compute_branch_base_sha
  return nil unless project.repository.branch_exists?(ref)

  project.repository.merge_base(sha, project.default_branch)
end
```
**Usage in Analyzer:**
```go
baseSHA := os.Getenv("CI_COMMIT_BRANCH_BASE_SHA")
if baseSHA == "" {
	// Fall back to the API call or the current behavior
}
```
**Pros:**
- Minimal analyzer changes needed (read one variable, with a fallback when it is unset)
- Works for all security scanners (SAST, Dependency Scanning, etc.)
- No additional network calls in CI jobs
- Consistent with existing CI variable patterns
- Computed once during pipeline creation
**Cons:**
- Requires GitLab core change
- May have performance implications for pipeline creation
- Needs RFC/proposal process
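
The fallback order on the analyzer side could be sketched as follows: prefer the proposed variable, then an optional fallback such as the API call, and otherwise keep the current HEAD-only range. `CI_COMMIT_BRANCH_BASE_SHA` is the variable proposed above and does not exist yet; `resolveBaseSHA`, `commitRange`, and `isUsableSHA` are hypothetical names used only for illustration. Passing the environment lookup as a function is a design choice that keeps the logic testable without touching the real process environment.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// Matches a full SHA-1 (40 hex chars) or SHA-256 (64 hex chars) object ID.
var shaPattern = regexp.MustCompile(`^(?:[0-9a-f]{40}|[0-9a-f]{64})$`)

// isUsableSHA rejects empty, malformed, and all-zero ("null") SHAs.
func isUsableSHA(v string) bool {
	return shaPattern.MatchString(v) && strings.Trim(v, "0") != ""
}

// resolveBaseSHA returns the branch base SHA and where it came from.
// It tries the proposed CI variable first, then the optional fallback
// (for example an API call), and reports "none" if neither works.
func resolveBaseSHA(getenv func(string) string, fallback func() (string, error)) (sha, source string) {
	if v := getenv("CI_COMMIT_BRANCH_BASE_SHA"); isUsableSHA(v) {
		return v, "variable"
	}
	if fallback != nil {
		if v, err := fallback(); err == nil && isUsableSHA(v) {
			return v, "fallback"
		}
	}
	return "", "none"
}

// commitRange returns the range to scan: everything since the base when it
// is known, otherwise the current HEAD-only range.
func commitRange(baseSHA string) string {
	if baseSHA == "" {
		return "HEAD^..HEAD"
	}
	return baseSHA + "..HEAD"
}

func main() {
	base, source := resolveBaseSHA(os.Getenv, nil)
	fmt.Printf("base=%q source=%s range=%s\n", base, source, commitRange(base))
}
```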
## Related Documentation
- [GitLab Repository API - merge_base](https://docs.gitlab.com/ee/api/repositories.html#get-merge-base)
- [GitLab CI/CD Predefined Variables](https://docs.gitlab.com/ee/ci/variables/predefined_variables.html)
- [Pipeline Secret Detection Coverage](https://docs.gitlab.com/ee/user/application_security/secret_detection/pipeline/)
- [Gitaly Repository Service - FindMergeBase](https://gitlab.com/gitlab-org/gitaly/-/blob/master/proto/repository-service.proto)