A -1 line location gets reported from gitleaks if the line number cannot be correctly extracted. Gitleaks attempts to infer the line number from a diff. Here is the function that attempts this. The logic gets a big complicated as multiple leaks with the sample content can be in the same diff. If the line extraction function cannot determine the line number, -1 is returned.
In hindsight, this probably should have been released as an experimental feature in the gitleaks project.
I spent a little bit of time in the debugger and figured out what was going on, so thank you for raising this as it uncovered a gitleaks bug. Gitleaks allows for multi-lined regular expressions as seen in this code. However, if a leak is multi-lined then gitleaks will fail at attempting to extract the line number of the leak since the leak is multi-lined. So for this case:
matches the regular expression for Password in URL. However, when gitleaks attempts to extract the line, it will do so by iterating the content (file, patch, byte array) line by line meaning it will never match the leak since the leak is composed on mutliple lines, in this case two lines.
An easy fix would be change Gitleaks to only support single-line secrets. The line extraction code in Gitleaks assumes a secret is on a single line, not multi-lined like the FP in the issue description. So there is a discrepancy with what is detected and how the line extraction works. If I'm being honest, I had assumed Gitleaks expected a secret to be contained within a single line .
Alternatively, we could add some code into the line extraction functions that parses multi-line secrets. This is a rather hefty change and would require a major version bump since we would be changing the report format to include a startLine and endLine in support of multi-lined secrets. I already spent some time trying to implement this but it's not an easy change and I'm still trying to come up with a decent solution.
Lastly, we could add some logic to the secret detection analyzer that appends a little message on the vulnerability stating "we were unable to determine the line number" or something along those lines if the vulnerability line number is -1.
Because I just thought of this - the most obvious multi-line use case we'll encounter with secret detection occurs when private keys (RSA, etc.) are checked into repos, whether as separate files or in-lined in source code.
I appreciate this use case is most likely a big lift, but it's one to call out as something to be aware of.
Ideally if a user see's a private key secret pop up they should recognize that they need to remove the entire key, not just the BEGIN ... header.
My instinct is telling me we should move forward with item 3 regardless and I can complete item 1 in my free time.
Another benefit of item 1 is supporting single line secrets would reduce FPs like the one above.
Edit:
Ideally if a user see's a private key secret pop up they should recognize that they need to remove the entire key, not just the BEGIN ... header.
On second thought, maybe I shouldn't be making assumptions. Including multi-line secret line extraction is a big lift like @twoodham said, but it would be beneficial and an improvement to Gitleaks and our secret detection analyzer.
Path forward looks good to me but figured I'd leave some comments here anyway.
Lastly, we could add some logic to the secret detection analyzer that appends a little message on the vulnerability stating "we were unable to determine the line number" or something along those lines if the vulnerability line number is -1.
Adding some kind of confidence metadata would be quite useful but TBH it probably has the same utility as something like this below if we need a quick fix. If we can be confident about the file, let's just default to 1?
// Location returns a structured location.funcLocation(filestring,lineStart,lineEndint,commitissue.Commit)issue.Location{if(lineStart==-1){lineStart=1}returnissue.Location{File:file,LineStart:lineStart,LineEnd:lineEnd,Commit:&commit,}}
this won't work well for Category:SAST or anything that relies on tracking changes, but leaked secrets seem like a good candidate for defaulting to line 1
Aside from that, we only need to scoping accurately for source code extraction, which shouldn't currently work anyway for multiline secrets. IMO I think changing secrets to only supporting single-line secrets would be acceptable