Token Revocation Endpoint scoped to Groups

See Feature Request - Admin API Endpoint: Token Info (#443597 - closed) for the genesis of this issue. Completing this is part of Tokinator Functionality - Token Revocation (gitlab-com/gl-security/product-security&9 - closed). It moves Tokinator functionality into the product, improves it, and creates a feature that can be, at some point, used by customers.

⚠ Customer Availability

This is being built by Product Security Engineering with the intended user being exclusively the Security Department. To achieve this, it will be Feature Flagged.

If Product wants to make this available to customers, ProdSecEng would go through a handover process. See also: https://gitlab.com/gitlab-com/gl-security/security-department-meta/-/issues/1783 and Product Security Tooling Integration (gitlab-com/gl-security&291 - closed).

Solution Design

a single API endpoint
pass it any token
the backend determines what kind of token it is based on prefix (no prefix or an unsupported prefix will return an error)
the backend determines whether that token "belongs to" / "is effective within" that group, e.g.
- was created by a direct or indirect member of the group, any of its subgroups, or any of its projects
- belongs to a group / project descended from that group (Group/Project Access Tokens, Deploy Tokens/Keys)
returns a 404 Not Found if:
- the token doesn't exist
- the token does exist, but not within that group
if the token exists AND hasn't been revoked yet, the backend revokes the token
the 200 OK response returned contains details about the token
- Note, someone else might have revoked it first. The design here is that this endpoint always returns revoked token info.
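
The flow above can be sketched in a few lines. This is a minimal illustration, not the product implementation: the prefix table, the `Token` class, the in-memory `tokens` dict keyed by plaintext, and the 422 status for unsupported prefixes are all assumptions made for the example (the design only says "an error").

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prefix table: which token types the endpoint supports is an
# assumption for this sketch, not something the design above specifies.
PREFIX_TO_TYPE = {
    "glpat-": "personal_access_token",
    "gldt-": "deploy_token",
}

NOT_FOUND = (404, {"message": "404 Not Found"})


@dataclass
class Token:
    plaintext: str
    # Full paths of the groups/projects the token is tied to, e.g. the
    # creator's memberships or the group a Group Access Token belongs to.
    effective_paths: tuple
    revoked: bool = False


def token_type(plaintext: str) -> Optional[str]:
    """Return the token type implied by the prefix, or None if unsupported."""
    for prefix, kind in PREFIX_TO_TYPE.items():
        if plaintext.startswith(prefix):
            return kind
    return None


def within_group(path: str, group: str) -> bool:
    """True if path is the group itself or one of its descendants."""
    return path == group or path.startswith(group + "/")


def revoke(tokens: dict, group: str, plaintext: str):
    """Revoke plaintext if it is effective within group; return (status, body)."""
    kind = token_type(plaintext)
    if kind is None:
        # Assumed status code; the design only says an error is returned.
        return 422, {"message": "unsupported token type"}
    token = tokens.get(plaintext)
    # Same 404 whether the token is unknown or belongs to another group.
    if token is None or not any(within_group(p, group) for p in token.effective_paths):
        return NOT_FOUND
    token.revoked = True  # no-op if someone else already revoked it
    return 200, {"type": kind, "revoked": True}
```

Calling `revoke` twice for the same token returns 200 both times, which matches the note that the endpoint always returns revoked token info.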

Advantages:

A single API endpoint (client doesn't need to try to figure out the type of token)
~~No Owner / admin token required (so, for example, anyone who discovers a token can revoke it)~~
- ~~This is great for SIRT since there are a bunch of namespaces they care about, or which might get created in future, but they aren't (or don't want to be) explicitly added to.~~
- It was decided to require Group Owner authentication
You only get info IF it gets revoked. This is important, so an attacker can't differentiate juicy useful tokens from boring ones.
You only revoke it if it belongs to a group you care about (for us, we'd send this request three times when we discover a leaked token. Once each for gitlab-org, gitlab-com, gitlab-private)
Can be used by Tokinator & SIRT
- Can even be used by SIRT if someone else revoked the token first, so that you can determine whose token it was and when it was revoked
Other GitLab Customers can use this feature (not in scope for this issue)
Harder to brute-force guess & revoke tokens that haven't leaked. That isn't feasible anyway, but an attacker would additionally need to try every candidate token against every group.
The "what kind of token is this" logic can be re-used by a Token Inventory feature at a later date
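
As a rough sketch of the "send this request three times" workflow, the snippet below builds one request per top-level group. The helper names are hypothetical, and it assumes a single PAT that is an Owner of all three groups, which this issue does not state. The requests are only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Top-level groups named in this issue.
GROUPS = ("gitlab-org", "gitlab-com", "gitlab-private")


def build_revoke_request(base_url, group, owner_pat, leaked_token):
    """Build the POST request for one group (hypothetical helper)."""
    # :id may be a numeric ID or a URL-encoded full path.
    group_id = urllib.parse.quote(group, safe="")
    url = f"{base_url}/api/v4/groups/{group_id}/tokens/revoke"
    body = json.dumps({"token": leaked_token}).encode("utf-8")
    headers = {"PRIVATE-TOKEN": owner_pat, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def build_all(base_url, owner_pat, leaked_token, groups=GROUPS):
    """One request per group; a 404 from a group means 'not ours'."""
    return [build_revoke_request(base_url, g, owner_pat, leaked_token) for g in groups]
```

Each request could then be sent with `urllib.request.urlopen`, treating an `HTTPError` with status 404 as "the token is not effective within this group".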

Disadvantages:

No support for token rotation (e.g. Rotate a Personal Access Token for Service Account User)
Using this endpoint could break stuff, but if a token has leaked maybe that's moot anyway.
~~Unsure of technical complexity. This idea would need to first determine whether all tokens can have a transitive relationship to a group~~
- Only a small subset of token types have existing Revocation services that we can utilise
No support for instance level tokens (Impersonation token, instance runner registration token)

Example:

Sasha the software developer works for GitLab and is a member of gitlab-org.
Sasha creates a side project at gitlab.com/sasha/funthing and accidentally leaks a Runner Authentication Token
[Automated - not this issue] Tokinator discovers the token and calls POST https://gitlab.example.com/api/v4/groups/:id/tokens/revoke, authenticating with a Group Owner token and passing the leaked token in the request body
[Automated - not this issue] Tokinator captures the response and reports that to SIRT
[Automated - not this issue] Tokinator @ mentions Sasha in the Slack revocation channel

Example request and response

curl --header "PRIVATE-TOKEN: group_owner_pat" --header "Content-Type: application/json" --request POST "https://gitlab.example.com/api/v4/groups/:id/tokens/revoke" --data '{"token" : "<leaked_access_token>"}'

{
  "id": 123456,
  "name": "anyTokenHere",
  "type": "personal_access_token",  # The type of token (job, deploy, pipeline trigger, PAT, GAT etc)
  "created_by": "gitLabUserName",  # The username str creator of token (was it made by a user?)
  "project_id": 123456,  # The projectId associated with this token (is this a project-level token? A deploy token?)
  "group_id": 123456,  # The groupId associated with this token (is this a group-level token?)
  "job_id": 123456,  # The jobId (if this is a job-token?)
  "runner_id": 123456,  # The runnerId (is this a runner auth token?)
  "revoked": true,
  "created_at": "2023-10-09T22:15:41.919Z",
  "scopes": [
    "api",
    "read_api"
  ],
  "user_id": 6543210,
  "last_used_at": "2024-02-27T23:28:27.908Z",
  "active": false,
  "expires_at": "2024-10-08"
}
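
A client such as Tokinator might condense that response into a one-line report for SIRT. The sketch below is illustrative only: `summarize` is a hypothetical helper, and it assumes that association fields which do not apply to a token type come back as null or are omitted, which the proposed response above does not specify.

```python
import json

# Trimmed, comment-free version of the proposed response (sample data only).
SAMPLE = """
{
  "id": 123456,
  "name": "anyTokenHere",
  "type": "personal_access_token",
  "created_by": "gitLabUserName",
  "project_id": null,
  "group_id": null,
  "job_id": null,
  "runner_id": null,
  "revoked": true,
  "scopes": ["api", "read_api"]
}
"""

# Fields that tie a token to something other than a user.
ASSOCIATIONS = ("project_id", "group_id", "job_id", "runner_id")


def summarize(info: dict) -> str:
    """Return a one-line incident summary for a revoked token."""
    linked = ", ".join(
        f"{key}={info[key]}" for key in ASSOCIATIONS if info.get(key) is not None
    )
    return (
        f"{info['type']} '{info['name']}' created by "
        f"{info.get('created_by', 'unknown')}: "
        f"revoked={info['revoked']}; linked to {linked or 'nothing'}"
    )


report = summarize(json.loads(SAMPLE))
```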

Acceptance Criteria

Child work items are complete
Issue(s) are created to include the ability to revoke more token types

Intended Users

Alex (Security Operations Engineer)
- "I’m the firefighter of the Security team. My objective is to prevent malicious attacks and mitigate active risks to my organization as they pop up, as quickly as possible" ... "there’s a high probability that the incident concerns something you’ve never dealt with before"...
- Alex triages a HackerOne issue where a third party identified a leaked token. To ensure that it is neutralised, Alex submits it to the Token Revocation Endpoint for the groups she oversees.
- ~~Alex didn't need to be owner/maintainer of the groups she oversees, and finds this endpoint useful for quick and efficient incident response.~~ Alex has to be an Owner.
- Alex is provided enough information from the endpoint to perform incident response duties, if the revoked token belongs to one of her groups.

Threat Model

What are we building?

See above, but basically: an API endpoint that will revoke a token if 1) you have the plaintext, 2) it can access data in a group you care about and 3) that group has the FF enabled. If the above conditions are met the endpoint also returns a revoked token's details in the response (scopes, creator, created dates, etc).

What could go wrong?

~~Availability An attacker could discover a plaintext token and revoke it, breaking a customer's workflow / automation / app~~
- ~~Note that if an attacker has a PAT, for example, they could use the User or PAT API to find out who owns the token, invite that user to an attacker-controlled top-level group, and then revoke it.~~
- ~~Note that an attacker could also just use the leaked token for other purposes, e.g. to read/update data, depending on the token type and scopes.~~
- This risk is mitigated: a Group Owner token is required. The risk is still technically present if the leaked token itself belongs to a Group Owner, because it can then be used both to authenticate the request and as the token to revoke.
~~Availability A team member might discover a plaintext token and revoke it, breaking a GitLab workflow / automation / app~~
- This risk is mitigated: a Group Owner token is required.
Confidentiality & Availability An attacker might use the endpoint to brute-force revoke arbitrary tokens that haven't leaked. (First, add a target victim as a member to a group the attacker owns.) The attacker then also gets a small bit of info in return, although the token now doesn't work.
Availability a logic error might mean that a token is revoked even if it can't access data in a group you care about.
Confidentiality differences in error responses might give away information about a group or token; e.g. an error like "This token doesn't belong to X group" indicates the token is valid for a different group.
~~Auditability A token is revoked, but we don't know by who, or why, or when.~~
- This risk is mitigated: a Group Owner token is required.
Confidentiality | Integrity | Availability Using this API requires a Group Owner PAT. If the PAT leaks from Tokinator, that's very bad.

What will we do about it?

A Feature Flag ensures that the endpoint is only active for Groups belonging to GitLab-the-organisation (e.g. gitlab-org, gitlab-com, ...).
- A customer could still be impacted if, and only if, they are a Guest+ member of a subgroup of one of those top level groups AND they leak a token. However in this case, that leaked token provides access to GitLab-the-organisation data (e.g. confidential issues) and so it is reasonable to revoke it.
- The Feature Flag won't be enabled for customer namespaces unless/until this feature is handed over from Product Security to a Development team.
The API can only be called by a Group Owner. We will know who revoked it.
Specs & peer code reviews increase the likelihood that the logic for "does this token give access to this group" is sound.
Specs & peer code reviews increase the likelihood that the logic for "show details about this token" only runs when the token is revoked.
Tokens are cryptographically random. Existing, unchanged code ensures that tokens are long (Devise.friendly_token is the usual generator; 20 chars) and cryptographically pseudo-random, which makes brute-forcing tokens infeasible. (This is an existing risk anyway: an attacker could try to brute-force their way to a valid token via any API endpoint.)
Automation monitoring (Out of scope for this issue). If & when Tokinator makes use of this endpoint, we will need to detect and handle rate limits. This should only occur if, for some reason, a large number of leaked tokens are detected. (Perhaps the automation fails to run for a number of days and finds a backlog; perhaps some automation in a downstream project leaks a large number of active tokens; etc).
API Logs & Audit Events Given this is Feature Flagged to GitLab-the-organisation namespaces we'll have EE Audit Events which capture current_user and IP address, and we will also have access to API request logs in Kibana / Devo.
End-user notification (Out of scope for this issue). If & when Tokinator makes use of this endpoint, it can use its existing leaked token processes to notify the owner of the token that their token was revoked. See: https://handbook.gitlab.com/handbook/security/product-security/application-security/runbooks/hackerone-process/#triaging-exposed-secrets
(Out of scope for this issue) Advanced-token scopes (see Add backend changes for MVC on advanced token s... (!154138 - closed)) will let us constrain the Group Owner PAT to this single endpoint
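
To put a rough number on the brute-force mitigation above, the sketch below estimates how long guessing would take. The 64-character alphabet is an assumed upper bound for a 20-character URL-safe token, and the guess rate and the count of valid tokens are made-up inputs, not measurements.

```python
import math

TOKEN_LENGTH = 20            # characters, per Devise.friendly_token
ALPHABET_SIZE = 64           # assumed URL-safe alphabet (upper bound)
REQUESTS_PER_SECOND = 1000   # hypothetical sustained guess rate
SECONDS_PER_YEAR = 365 * 24 * 3600


def entropy_bits(length=TOKEN_LENGTH, alphabet=ALPHABET_SIZE):
    """Upper bound on token entropy in bits."""
    return length * math.log2(alphabet)


def expected_years(bits, valid_tokens=1, rate=REQUESTS_PER_SECOND):
    """Approximate expected time to hit any one of valid_tokens by guessing."""
    # Guessing without repeats needs about S / (N + 1) tries on average,
    # where S is the size of the token space and N the number of valid tokens.
    expected_guesses = 2 ** bits / (valid_tokens + 1)
    return expected_guesses / rate / SECONDS_PER_YEAR


bits = entropy_bits()
years = expected_years(bits, valid_tokens=10 ** 9)
```

Even with a generous billion valid tokens and no rate limiting, the estimate stays many orders of magnitude beyond any practical attack, and with this endpoint the guesses would also have to be repeated per group.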

Before rolling out to customers

Consider waiting until a more complete Credential Inventory exists, so that admins can see if/when tokens are revoked via this endpoint
Consider waiting until more complete Token Revocation emails exist, so that token owners know their token leaked and got revoked
Consider gating the endpoint with authentication & authorization
Consider allowing opt-in at the group level, so Group Owners can avoid their tokens being revoked

Edited Jul 09, 2024 by Nick Malcolm