[Public] Leaked Token Detection & Revocation Product Security Tooling Integration (#294) · Epics · GitLab Security Division


# :warning: This epic is public - don't add confidential info!

[Tokinator](https://gitlab.com/gitlab-com/gl-security/appsec/tokinator) is automation built to detect and handle (either revoke, or open an issue for human triage) leaked tokens that might affect GitLab (the org).

See https://internal.gitlab.com/handbook/security/product_security/token-leaks/organizational_controls/ for how token leaks are currently handled by various tooling, including Tokinator. These tools and the teams that run them are stakeholders for this effort.

This epic is to track any work that makes Tokinator a product-first feature. The primary customer is AppSec and SIRT, but eventually any customer could use it.

## Acceptance Criteria

- Leaked tokens in GitLab are automatically detected
  - "Leaked" == secrets stored in an incorrect & "stealable" way. Could be public code in a public repo, could be in a job log visible only to Project Developers. Both are still leaks.
- Tokens are either: revoked, or something (an issue / incident) is opened for a DRI to respond to the leak
- Metrics exist to evaluate the adoption and effectiveness of the feature.

## 17.11 Plan

As of %17.11, no work on integrating Tokinator into the product is currently planned.

## 17.10 Plan

- [x] @nmalcolm working on https://gitlab.com/gitlab-com/gl-security/appsec/tokinator/-/issues/74+s to validate the Group Token Revocation Endpoint
- [ ] @ashmckenzie delivering an issue/epic/architecture design on ["Explore "Scheduled token search search as a feature" instead of the Secret Detection dependency"](https://gitlab.com/gitlab-com/gl-security/product-security/product-security-engineering/product-security-engineering-team/-/issues/175).
  **This is likely the path we'll take, which will change the sunset roadmap**
- [ ] Acceptance criteria updated (if needed)
- [x] https://gitlab.com/gitlab-com/gl-security/product-security/product-security-engineering/product-security-engineering-team/-/issues/172+s

## Old Sunset Roadmap

<details><summary>Click to expand</summary>

Phases 1-3 can be worked on simultaneously

### Phase 1: Improve token validation & revocation in GitLab (Current)

1. https://gitlab.com/groups/gitlab-com/gl-security/product-security/-/epics/9+
1. Complete https://gitlab.com/gitlab-org/gitlab/-/issues/460777+
1. (TBC) Create an endpoint that can validate a token _without_ revoking it
1. Whenever a Token can be revoked in GitLab, update Tokinator to use that endpoint.

This phase is complete when Tokinator can revoke all token types via the GitLab API.

### Phase 2: Automate incident response in GitLab

> In any other case of detection, the Rails application manually creates a vulnerability using the Vulnerabilities::ManuallyCreateService to surface the finding in the existing Vulnerability Management UI.

0. Change this to current when we start on it
1. Collaborate with, or wait for, ~"devops::secure" to automatically create Vulnerabilities for detected secrets https://docs.gitlab.com/ee/architecture/blueprints/secret_detection/#phase-3---expansion-beyond-push-protection-service
1. Collaborate with ~"devops::secure" to extend incident handling with automatic token revocation
   1. If the token revokes, mark the Vulnerability as `Confirm` and create an Incident
   2. If the token does NOT revoke, mark the Vulnerability as `False Positive`, and add a comment.
1. Update Tokinator to use GitLab Secret Detection for incident triage & handling
1. Update processes to match GitLab feature
   1. Current process is that Tokinator creates issues in a separate private repository, and AppSec manually use the `/security` Slack command anyway.
      When this is all complete the tokens will be automatically revoked so there's less harm having them open to the audience of project members via Incidents / Vulnerabilities.
1. Add a configurable webhook event for "Leaked token found" so that, as well as creating a Vulnerability/Incident/Whatever, we can also trigger an event in third-party services like Tines.

This phase is complete when Tokinator no longer needs to create Issues.

<details><summary>Slack discussion on why Secret Detection plans to create Vulnerabilities instead of Incidents after detection</summary>

<blockquote>

Nick Malcolm

In [Architecture Design Documents Secret Detection as a platform-wide experience](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/secret_detection/) "Any detections will be managed using the existing Vulnerability Management UI.". Was there any discussion I can go read about using Vulnerability vs. Incident?

For context, over in Product Security Engineering we're starting to pick up work again on moving Tokinator functionality into the product. Tokinator creates issues for SIRT to respond to when it spots a leaked secret, e.g. in a comment, or code that didn't have secret detection enabled.

</blockquote>

<blockquote>

Ahmed

> Was there any discussion I can go read about using Vulnerability vs. Incident?

Not that I can recall, but I suppose the reason why we have defaulted to vulnerability management UI over using incident issues/UI in that part of the design document is because platform-wide SD was planned as a product feature, and the vulnerabilities dashboard is often the place were our customers manage any sort of triaging/remediation of detected secrets (or vulnerabilities, for that matter).
That said, I don’t think we had actually worked on any [target type](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/secret_detection/#target-types) for which secrets are stored, let alone being displayed in the vulnerability dashboard. Both SPP and Client-side SD do not go beyond alerting the user and preventing them from committing/writing the secret – but I suspect that as we expand [SD scanning to Job Artifacts](https://gitlab.com/gitlab-org/gitlab/-/issues/391899) we will continue to use the vulnerability dashboard.

@theoretick and @Vishwa Bhat may also be able to share more on the vulnerability vs. incident decision.

</blockquote>

<blockquote>

theoretick

While I see a more immediate need for Secrets, I could make the same argument for most sec vulnerabilities, which was the incentive behind https://gitlab.com/gitlab-org/gitlab/-/issues/415751 and https://gitlab.com/gitlab-org/gitlab/-/issues/448282. But that's a "we would love to do this big massive change looooong term" answer which isn't likely very satisfying for now. In the short-term incidents are a work-item type so I'd probably suggest we ensure that creating an "issue" from a vulnerability supports every work item type. The current philosophy is essentially separating issue management from vulns, so we could consider an incident part of the former with a passing reference to the vuln?

</blockquote>

<blockquote>

Ahmed

> The current philosophy is essentially separating issue management from vulns, so we could consider an incident part of the former with a passing reference to the vuln?

I imagine this would be more noisy than to only use the vulnerability dashboard, but I guess from a security team point of view, an “incident” work-item would get higher priority/attention, which in the case for secrets is likely called for, seeing as a secret leak can promptly lead to a serious security incident.
</blockquote>

<blockquote>

Nick Malcolm

Reading the docs it looks like Vulnerabilities are quite closely tied with code which could be tricky when secrets are found in comments or descriptions. But the Vulnerability themes of false positives & automatic remediation are great and would work for secrets detected anywhere. I do like that Incidents can be assigned; doesn't look like Vulnerabilities can be.

Anyhoo - this is good info, thank you both! As we get closer to working on Product Security Issues in this area we'll be sure to get issue-based discussion on what type of record to create when a secret is detected.

</blockquote>

</details>

### Phase 3: Improve token detection in GitLab

Tokinator needs to detect tokens so it can create a SIRT issue and automatically revoke the token if it's a true positive.

1. Collaborate with, or wait for, ~"devops::secure" to detect leaked secrets across the following target types https://docs.gitlab.com/ee/architecture/blueprints/secret_detection/#target-types
2. Add support for detecting leaked secrets in Snippets & Wiki (not included in the Secret Detection Blueprint)

This phase is complete when Tokinator (the project) no longer needs to detect leaked tokens, at which point we can archive Tokinator.

### Phase 4: Deprecate the cloud service & update external Tokinator users

Tokinator has a cloud function that _other_ tools use to handle a detected token.

1. Omamori triggers Tokinator when it discovers secrets, e.g. https://gitlab.com/gitlab-private/gl-security/engineering-and-research/product-security-engineering/token-hunter-reports/-/issues/319
2. YouTube Token Scanner triggers Tokinator when it discovers secrets
3. [Artifact Scanner](https://gitlab.com/gitlab-com/gl-security/product-security/product-security-engineering/tooling/jobScanning) triggers Tokinator issues when it discovers secrets

All of the above should use GitLab-native functionality to: 1) attempt to revoke the token (which will only work if it affects GitLab the company) and 2) open an incident if it was successful.

### Phase 5: Deprecation & Removal :city_sunset:

1. Once all use-cases of Tokinator _should_ be covered, check logs to see whether Tokinator is still being used. If so, move back to Phase 4.
1. Update Tokinator code to start raising errors instead of functioning. Wait for a milestone to see what breaks.
1. Remove the Tokinator cloud function.
1. Archive the Tokinator project.
1. Ensure any documentation & references are updated: the internal or external handbook; repo-based runbooks; ...

This phase is complete when Tokinator is archived & GitLab performs all its functionality in-product.

### (Optional) Phase 6: Awareness raising & general availability

These steps do not prevent us from sunsetting Tokinator, but are maybe worth doing anyway:

1. Where functionality is behind a Feature Flag and enabled only for GitLab-the-company, collaborate with Development on making this generally available
   1. Safe defaults
   2. Auditing & visibility: when a token is revoked, the owner(s) should be able to see that, even if they weren't included in the incident. E.g. https://gitlab.com/gitlab-org/gitlab/-/issues/462217+; Audit Events for customers on a plan with Audit Events; email notifications; in-app notifications
1. This migration could be worth a blog post telling the end-to-end story of token leaks -> manual automation -> built-in
   1. How much were token leaks costing us?
   2. How much time did we spend building & maintaining Tokinator?
   3. How much time do we spend managing this risk, now that it's built in to the product?

</details>
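
The Phase 2 triage rule from the old roadmap (revocation succeeds: mark the Vulnerability `Confirm` and create an Incident; revocation fails: mark it `False Positive` and add a comment) could be sketched roughly as below. This is an illustration only, not how Tokinator or GitLab implements it: `triage_leaked_token`, `TriageResult`, `VulnerabilityState`, the action strings, and the injected `revoke` callable are all hypothetical names, and no real GitLab API is called.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class VulnerabilityState(Enum):
    # Hypothetical labels mirroring the `Confirm` / `False Positive` wording above.
    CONFIRMED = "confirmed"
    FALSE_POSITIVE = "false_positive"


@dataclass
class TriageResult:
    state: VulnerabilityState
    actions: List[str]  # hypothetical follow-up actions for the caller to perform


def triage_leaked_token(token: str, revoke: Callable[[str], bool]) -> TriageResult:
    """Apply the Phase 2 rule to one detected token.

    `revoke` is an assumed callable that attempts revocation and returns
    True only if the token was valid and has now been revoked.
    """
    if revoke(token):
        # The token was real: confirm the finding and open an Incident.
        return TriageResult(VulnerabilityState.CONFIRMED, ["create_incident"])
    # Revocation did not succeed: treat the finding as a false positive and leave a note.
    return TriageResult(VulnerabilityState.FALSE_POSITIVE, ["add_comment"])
```

Passing `revoke` in as a callable keeps the decision separate from whichever revocation endpoint ends up being used, so the rule can be exercised with a stub such as `lambda t: True`.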
