Users may unintentionally commit sensitive information to their repositories. This can include passwords, credentials for cloud providers, or SSH private keys.
If repositories are publicly accessible, or even private but shared among many users, the secrets can be leaked to people who should not have access to those credentials.
Once secrets are in the repository, it is too late to prevent someone from finding them: Git history is effectively immutable, and rewriting it does not recall clones that already exist.
But GitLab can notify users that secrets have been compromised, so they can at least revoke, change, or suspend the leaked credentials and mitigate the impact.
Create a new analyzer for the SAST tool to provide secret detection.
This analyzer will be executed every time: it reports true to the compatibility checks for any repository. As part of SAST, it will be included in the sast job definition and in Auto DevOps pipelines.
The vulnerability text should be clear enough to help users fix the problem.
The tool should scan only the latest version of the repository (i.e. the last commit) to save time on big repositories.
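A minimal sketch of how such a job could look in a pipeline. This is illustrative only: the analyzer image name is hypothetical, and the report artifact simply follows the general SAST report convention.

```yaml
# Hypothetical job definition; the image name and report path are illustrative.
secret_detection:
  stage: test
  image: docker:stable
  services:
    - docker:dind
  script:
    # Run the secrets analyzer image against the checked-out repository.
    - docker run --rm --volume "$PWD:/code" registry.example.com/analyzers/secrets /code
  artifacts:
    reports:
      sast: gl-sast-report.json
```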
We can evaluate different existing open source tools:
@bikebilly Following my meeting with @kathyw, this is definitely something we need to have soon. It would have prevented some issues with our last data leak at GitLab.
This would be great to have. Recently, a customer reported that their cloud-provider keys were compromised, and they were storing the keys in their GitLab repo. Scanning for the presence of these keys and warning on that would be optimal, as it isn't good security practice to store keys in repos.
Would be best implemented at time of push, rather than scan/MR.
In heavy compliance orgs, once something is committed it's technically available to the whole dev org, and we don't have strict tracing/auditing of who has read files in the repo (only clones/check-outs).
Is there any way to block a push on the server side based on whether or not client-side commit/pre-push hooks were run? For example, requiring clean output from something like gitleaks to be appended to the commit messages. Something that wouldn't necessarily be unspoofable, but would be more trouble to fake than adding entries to whatever ignore file is supported by the tool the org chooses for themselves.
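A rough sketch of that idea as a server-side pre-receive hook. Everything here is hypothetical: the `Gitleaks-Scan: clean` trailer is an invented marker that a client-side hook would be expected to append after a clean scan.

```sh
#!/bin/sh
# Hypothetical pre-receive hook (sketch only): reject pushes whose commits
# lack a "Gitleaks-Scan: clean" trailer. New-branch pushes (zero oldrev)
# are not handled in this sketch.
while read oldrev newrev refname; do
  for commit in $(git rev-list "$oldrev..$newrev"); do
    if ! git log -1 --format=%B "$commit" | grep -q '^Gitleaks-Scan: clean$'; then
      echo "Rejected: commit $commit has no clean secret-scan marker." >&2
      exit 1
    fi
  done
done
```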
Let's iterate on this. The MVP is to notify users when credentials have been exposed. It's too late by then, but at least the user can take action. Security is always a trade-off; let's not aim for the perfect solution right now.
@plafoucriere agree that the MVP is the first step, and we can start with detect-and-alert after the fact. Further down the road, we should think about offering customers the option to either have this be a blocking capability (blocking a push, as @asaba suggested) or alerting after the fact, depending on their tolerance for risk vs. operational continuity. Certain customers have an extremely low tolerance for risk and will want to block a push, even if it means 'slowing down the process'.
I understand, but these tools just send signals. There's no way to guarantee the accuracy of their results. Therefore, they must never block the work process.
Imagine a false positive preventing the user from committing: even if they can dismiss the vulnerability, the user is blocked in the meantime.
What we could do, maybe later, is filter the impacted parts of code from searches and rendering (unless the user dismisses the vulnerability). A clone of the repo will still expose the data, but it's a leap forward.
Also, we're refactoring SAST in %10.8 to have one Docker image per tool. So keep in mind that one can set up a git pre-push hook (client side) to run the image related to this issue before actually pushing to GitLab.
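A minimal sketch of such a client-side hook, assuming a per-tool image as described above (the image name is illustrative, not an official one):

```sh
#!/bin/sh
# .git/hooks/pre-push (sketch): scan the working tree with the secrets
# analyzer image and abort the push on any finding.
docker run --rm --volume "$(pwd):/code" \
  registry.example.com/analyzers/secrets /code || {
  echo "Secret detection reported findings; push aborted." >&2
  exit 1
}
```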
To add one more data point. I tried gitleaks on the gitlab-ce repo using the "--local" and "--since" flags and it seemed to run a bit better:
```
sticky:gitlab-ce mentat$ time gitleaks --local --since=9e78b86dd48ee893c950643d6ca0188a4978eeaa .
2018/05/02 09:04:22 fetching . from . ...
2018/05/02 09:04:29 No Leaks detected for .

real    0m7.125s
user    0m6.003s
sys     0m2.323s
```
The commit I used was about 10 commits (squashed merges) behind master, so it was a fair amount of data and probably more than you'd normally see in an MR.
We've been looking at building something as a combination of a number of these tools using the oclif framework as a set of injectable plugins with configurable log/warn/error type features.
We're also looking at including some whitelist functionality for where we would allow credentials to be committed - for example in a test suite.
That way we can provide both a pre-push and an MR/CI assessment, as well as expand to other patterns we (UK Department for Work and Pensions) are particularly interested in, for example PII.
We don't need to check all the history each time. Since this job will be run for each commit, it is enough to just test the current version of the repo (i.e. the current commit).
We will add a note for users to make sure they run the tool at least once on the whole repo history.
The file is huge and contains (apparently) only false positives.
There's no way to ignore dirs, but it should be trivial to fix that and propose a pull request.
Also, there's a new smaller project available from the same author: https://github.com/zricethezav/gitleaks-ci
This one fetches the content of the commit directly on GitHub and looks for sensitive data with a few regexps. It's not suitable for our needs.
Just a note, gitleaks has had some major performance gains in the past couple months: https://github.com/zricethezav/gitleaks. I've been much more active on maintaining this project.
Issue was some dangling goroutines...
Also, there is a PR for gitlab org/user support I will be merging soon. Be on the lookout for that.
@bikebilly This has been requested many times by customers, what do you think about planning it before %12.0?
It shouldn't take long, and integration risk is limited.
@plafoucriere I'm wondering if this can be on the radar of the ~Manage team instead, since it is more related to the security of the repository (even if probably both areas are related).
@bikebilly I disagree! We in the ~Secure team aim at securing all stages, right?
@bikebilly @fcatteau - I admit I'm not 100% clear on where the breakdown is between the security team and things like security testing in CI. I've pinged @markpundsack for some clarification in &487 (closed), the WIP category epic for secrets detection. Unless I've misunderstood something, though, I think that since this is already explicitly a category for ~"devops:verify", we should move this to that stage instead.
@fcatteau yes, but we primarily aim to secure the end user application, not the GitLab product itself (that is ~devops:manage), for example 2FA. Also, ~"devops:configure" addresses some of the security aspects if they are related to Kubernetes configuration.
Anyway, as said in my previous comment, it is a blurry line and as the ~Secure team we are obviously willing to contribute, but the correct stage should be defined independently (stages and teams are not always an "alias"). Having multiple stages may create confusion for users.
At the moment this is not consistent, because the product category is marked as ~"devops:verify", so whatever the choice is, it should be reflected in both (category and issues).
But we're talking about looking for credentials possibly stored in the repo, right?
Yes, and those files might not be part of the final application (a classic use case is credentials to access a 3rd-party service for deployment).
I could be fine taking it into ~"devops:secure" scope, but we should agree on it with the ~Verify team, since we would be "stealing" one of their categories.
We can discuss it in our meeting and then report here!
We should also confirm with @markpundsack whether the security scanning/testing category is really part of secure or verify. I wasn't involved in the discussions when it was added to verify, and there may be good reasons why that was the case.
Was this moved? I thought binary authorization was still release, and this was still verify. EDIT: wait, no, I do remember this one at least going back again to ~"devops:secure". @bikebilly I will leave it to you to move the category to the right place.
@jlenny no, not really. It was proposed for that change as part of the Runner category change (we needed to free up one slot in ~"devops:verify"), but the proposal was rejected and the MR closed (see gitlab-com/www-gitlab-com!16912 (closed)).
So the current status is that Secret Detection is still a category of ~"devops:verify", and as far as I know there is no technical reason to move it.
I'm fine with moving secret detection to be a feature of ~sast, and making it unrelated to the Runner changes. But since this proposal has a different context than the original one, I think we should socialize it first.
Moving this issue to ~"devops:verify" to match the categories.yml then, for now at least. I thought that per https://gitlab.com/gitlab-org/gitlab-ee/issues/6719#note_116796683 this had officially been moved but that must have been a prospective agreement. @bikebilly if you're taking these going forward, please update (or close?) the epic and relabel this issue when that happens.
Why interested: Their security division is concerned about users committing secrets into GitLab.
Current solution for this problem: Push rules are insufficient for this as these secrets might be in different filenames or within files that are whitelisted.
Ideally it would be part of our Push Rules or push-time checking, but if it's going to be a "security scan", then it makes sense for it to be part of our security tools that generate a report and offer remediation options.
Just jumping in, push-time checking seems like the much more appropriate place to do this (prevent the problem instead of reporting that it happened later.)
@bikebilly this customer has offered to get on a call to discuss insights and requirements. Would that be necessary/helpful for you? Happy to set it up/facilitate from support's side.
From the ticket:
In the security team, we are currently working on an important project to prevent secrets from being leaked by developers in our GitLab repos. We know that GitLab already offers push rules, but these are insufficient for us, as they only check certain pre-known files; secrets can be present in many other locations. I saw the following open tickets, which discuss exactly what we need: https://gitlab.com/gitlab-org/gitlab-ee/issues/6719 and https://gitlab.com/gitlab-org/gitlab-ee/issues/8792, but I don't see a lot of activity (the ticket has been open for 9 months now). Can you give us an update about these tickets and a possible ETA? I'm happy to help out and discuss any insights or requirements we might have. We have chosen the Ultimate license especially for its security features, so it's important for us that features such as this get implemented.
@bikebilly What about targeting %12.0 or %12.1 for this issue? It's quite easy to implement and would require just a few days to cover this need from many customers.
@plafoucriere this issue is still out of the ~Secure scope, even if I'm seeing much confusion about that and many different proposals to move it.
Let's clarify where this feature/category will be positioned, and then we can properly prioritize.
Until then, I think we should focus on the current scope for ~Secure first, and then move on to other topics. I'm not so sure it will be so easy to implement, but I agree it will give value to GitLab, so I'm opening a discussion about its position right now.
One thing to keep in mind: there isn't a rule against teams like ~Verify or ~Secure working in other stages if it pushes their own goals forward. So if the ~Secure team in this case feels like they can contribute something security-related to checking for credentials and secrets in commits, that's welcome and not something ~"devops:verify" would want to prevent, even if the category might be moved around again some day.
Thanks @jlenny, we know that, but we should focus on our priorities first, and unfortunately we're unlikely to have time to work on something outside our defined scope (even if we'd love to).
I'm very confident that the clarification will bring Secret Detection in the ~"devops:secure" stage, but that is the very first step to work on. Otherwise we could still contribute, but unfortunately the timeline will be very different.
Regex: tools that run regexps on filenames or content to detect secrets.
Entropy: tools that detect high-entropy strings, which are likely keys or passwords (see the sketch after this list).
Commit history: tools that scan the commit history, analyzing diffs. They will report secrets in old commits that are no longer present in the project.
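For illustration, a minimal sketch of the entropy approach, assuming a simple Shannon-entropy score over each candidate token. The threshold and sample tokens are made up; real tools combine this with tokenization and commit traversal:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of the string s."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Tokens above ~3.5 bits/char (illustrative threshold) look machine-generated,
# so they are flagged as likely keys or passwords.
for token in ["hello world", "password123", "AKIAIOSFODNN7EXAMPLE"]:
    flagged = shannon_entropy(token) > 3.5
    print(f"{token}: {shannon_entropy(token):.2f} bits/char, flagged={flagged}")
```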
https://github.com/dxa4481/truffleHog
License: GPL2
Commit history, regexp, entropy.
Rules: mostly tokens present in source code. RSA and PGP key files, Slack tokens, Stripe API keys. 19 rules total; they cover most of those in Gitleaks.
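For reference, truffleHog is typically pointed at a repository URL; the flags below are from its README at the time, and the URL is a placeholder:

```sh
# Scan a repository's full history with both regex rules and entropy checks.
trufflehog --regex --entropy=True https://github.com/example/repo.git
```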
Yara itself is the engine these rules run on. There are dozens of rule collections maintained by individuals or organisations, but they are geared toward malware detection in a filesystem.
https://github.com/UKHomeOffice/repo-security-scanner
License: MIT
Commit history, regexp.
Rules cover files containing secrets, like htpasswd, crypto keys, and token files. No rules cover secrets in source code files, so it's complementary to Gitleaks/truffleHog.
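Per its README (as I recall it; treat this as a sketch), it reads a diff stream from stdin rather than taking a repository path:

```sh
# From inside the repository: pipe the full history diff into the scanner.
git log -p | scanrepo
```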
We could use Gitleaks + truffleHog + repo-security-scanner.
Since they scan the commit history, we'll need to flatten it before running these tools, keeping only one commit with the files as they exist in the branch.
@groulot I'm the maintainer of gitleaks. If there are any features you would like to request please open an issue and I'll try my best to get to them. Thanks!
@bikebilly Most of these secret-finding tools operate on Git history. In the context of this issue we don't use that mode, since we want to analyze files as they exist in a commit.
But outside SAST, we could imagine GitLab running these tools and reporting old commits containing secrets to the users, so that they can remove those commits from their history and revoke credentials if needed.
The reason to keep it just for the last commit is to provide quick results, assuming the check is executed for each and every commit (so checking history is not really necessary).
Since there are cases where this is valuable anyway, the other issue will provide that (slower) option.
@groulot when you say flatten, do you mean we take commits X0...Xn of a push and squash them down before running the tool? I would worry about secret leakage if that was the case.
@mentat That's nearly it. Actually, we copy the cloned repo present when the analyzer runs, remove .git from the copy, and init a new repo with one commit adding all the files as-is. What secret leakage do you think could happen?
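A minimal shell sketch of that flattening step, assuming the analyzer runs against a checked-out clone (the paths and commit identity are illustrative):

```sh
# Copy the checked-out tree, drop its history, and re-commit everything as a
# single commit so history-oriented tools only see the current state.
cp -a /repo /tmp/flattened
rm -rf /tmp/flattened/.git
cd /tmp/flattened
git init --quiet
git add --all
git -c user.name=analyzer -c user.email=analyzer@example.com \
  commit --quiet --message "flattened snapshot"
```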
X0 - commit adds `var secret = XYZ`
X1 - commit updates the same line to `var secret = ${my_secret}`
The problem is that if we only scan the "net" of the push, we wouldn't see any secrets; however, the secret is still in the timeline and searchable/findable via the UI (potentially searchable on the web).
The "normal" process that gitleaks uses is to scan each commit in the specified range which would catch this.
So I totally get that there is a separate issue for the full history scan, as that could be quite time consuming and is necessary primarily for initial on-boarding and significant rebasing. However, I believe squashing commits in the way you've described above will lead to an inaccurate statement about the quality of the code base (unless, of course, it is a merge event and the squash option is being similarly applied).
All commits for a given diff should be examined individually, as all commits will be present in the history.
I agree, #9508 (closed) is about scanning the entire repository history. This feature should be about detecting new secrets introduced by an MR, so it should specifically target the set of commits that make up the branch the MR is based on.
Gitleaks does this automatically with the --since flag:
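For example, reusing the flags shown earlier in this thread (the merge-base expression is a sketch of how the starting commit could be chosen for an MR):

```sh
# Scan only the commits made since the MR branch diverged from master.
gitleaks --local --since=$(git merge-base origin/master HEAD) .
```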