Follow-up from "Design proposal for Low-Privilege CI Job Tokens"

The following discussions from !7775 (merged) should be addressed:

@hsutor started a discussion: (+1 comment)

@mokhax What is meant by the "type of token presented"? I thought that in this case we were only talking about the CI_Job_Token? Can that token itself have different sub-types? Sorry, I'm new to all of this!
@dmakovey started a discussion: (+1 comment)

From end-user perspective oftentimes it is expected that executor's permissions are reflected in CI_JOB_TOKEN - i.e. just like on local workstation - when user launches application it runs with user permissions. This is not to say that we should not do this work, but we do need to offer users a path where they can configure their job tokens behaving like they do today, otherwise we're pushing them into creation of long-lived PATs which opens a whole different venue for attackers.
@tkuah started a discussion: (+5 comments)

Should we encode the exact declarative policy names ? It might lead to issues when those names are changed. Especially during a auto-deploy, when half of the pods have new code, and the other half has old code.
@fabiopitino started a discussion: (+3 comments)

We can specify that this can be configured by developers (merged by maintainers) and applies at job level.
@fabiopitino started a discussion: (+5 comments)
I think this is hard in practice to do well. In many scenarios we don't know upfront the exact resource that the user wants to access. For example, the job may specify read_issue permission because it wants to read a bunch of issues. But only at runtime we know what issue the user is trying to read. The user may have a general access to read_issue but the specific issue may be confidential and the user may not have access to that.

Customer will also want the ability to specify permissions for a subset of projects (perhaps using project: myorg/dev/*) to indicate multiple projects not defined upfront. According to the current design we would have to calculate the permission for all possible resources.

Instead I would suggest we encode the permissions in the claims and let the service (GitLab Rails or other service) handle that.
```
{
  "sub": "gid://gitlab/Ci/Build/1",
  "exp": 1893456000,
  "scope": {
    "build_read_project": ["*"], # all projects
    "update_pipeline": ["gitlab-org/gitlab", "gitlab-org/infra/*", "gitlab-com/*"],
    "download_image": ["security-products/*"]
  }
}
```
A format similar to the above could potentially allow a separate service (like the container registry) to handle the permissions separately from the GitLab Rails.
@fabiopitino started a discussion: (+5 comments)

What would the scope look like if the permissions is not specified or some wildcard specification exists (e.g. project: 'gitlab-org/*')? Also note that the CI job token allowlist allows groups and subgroups to be added, not just projects, which means we would need to expand all those groups and calculate the permission for each project.

In any case, more granular permissions that project-level permissions are challenging to implement and maintain over time. This is why I would prefer to encode only the specifications (permissions value) in the JWT and use that as a permissions mask when authorizing actions in can?(current_user, :do_something_with, subject). This technique is what we are using today to authorize actions based on CI job token allowlist.
@fabiopitino started a discussion: (+3 comments)

These kind of static permissions are not sufficient, for example, a project may be marked as being deleted or archived in the meantime, and we should prevent the job token from performing this action even if the job specified the permission and the user had permissions to do that at the time of the job being created.

Similarly if the user is blocked.

We should implement these permissions mask to work well with Declarative Policy, like the CI job token allowlist, rather than instead of it.
@jocelynjane started a discussion: (+1 comment)

We allow groups here too; I think that needs to be addressed here. Specifically, we will have to handle inheritance and think about a case where a project of a group might have different permissions.
@jocelynjane started a discussion: (+4 comments)

@mokhax do you know if we have validated this solution with customers?

@grzesiek started a discussion: (+11 comments)

This is something I believe we need to discuss. I'm not sure if this is realistic to only rely on content of the token when making a decision about authorization. Can we intersect authorizations each time we call can??

module Gitlab
  module Allowable
    def can?(user, ability, subject = :global, **opts)
      if ::Authz::Token.provided_for?(user)
        return false unless ::Authz::Token.allowed?(user, ability, subject) # Token used to narrow down the authorizations
      end

      Ability.allowed?(user, ability, subject, **opts)                      # but we still consult declarative policies
    end
  end
end

WDYT?

@grzesiek started a discussion: (+7 comments)

I wonder what would happen if we generate a stronger token than we should by a mistake of some sort, and we skip consulting a declarative policy here. Should we still check the policy? See an example !7775 (diffs, comment 2118962803).
@grzesiek started a discussion: (+1 comment)

What happens if a CI job is configured with permissions which are not available to a user triggering a pipeline?
@grzesiek started a discussion: (+2 comments)

I would probably define permissions (as user-consumable strings) in Ci::Config.
@grzesiek started a discussion: (+3 comments)

I'm not sure if we have to expand permissions here. It all boils down to when do we call can? - during token generation or during the request made with the token.
@grzesiek started a discussion: (+1 comment)

@mokhax Looks great to me! There is just one thing I believe we should focus on: when do we actually call can?: during token generation or when the token is being used? My intuition tells me that we should do that in the latter case. WDYT?

Once we find answer to this question, I think we should just go forward and merge this merge request. We can always refine this later.
@dbiryukov started a discussion:
To obtain the complete list of CI_JOB_TOKEN entry points –
1. We have an additional API supported by CI_JOB_TOKEN - Pipeline API - Update pipeline metadata.
It is not documented on the pipeline API page, but on CI_JOB_TOKEN documentation https://docs.gitlab.com/ee/ci/jobs/ci_job_token.html

PUT /projects/:id/pipelines/:pipeline_id/metadata :update_pipeline
1. Also CI_JOB_TOKEN is handled on the git access layer for the project's repositories:
2. git clone - :build_download_code
3. git pull - :build_download_code
4. git fetch - :build_download_code
5. git push ( for the same project where the job is initiated, behind a FF ) - :build_push_code
@fabiopitino started a discussion: (+1 comment)
Based on the discussion in this thread, this proposal should include 2 main changes:
1. Enhancements to the current Job Token Scope (inbound allowlist) feature. Allow users to specify granular permissions when granting access to another project or group. The allowlist will be enforce by default in %18.0.
2. Introduce job-level permissions (`permissions` key / outbound permissions). This is introduced as opt-in feature and it will be highly recommended in our security best practices.
This made me realize that we need to specify for (1) how we are going to affect the %18.0 breaking changes. For example, about the existing allowlist entries that don't have granular permissions, does it mean they get all permissions or zero permissions in %18.0 ? The latter is a much bigger breaking change because it will affect also users that today have job token scope enabled.
@fabiopitino started a discussion: (+2 comments)
@mokhax based on our conversation I think we agreed that we can descope the MVC to:
- define static set of permissions that today CI_JOB_TOKEN is allowed to use (you have collected a comprehensive list in this MR already) 👍
- when we check for permissions in a given endpoint we reject any permissions that are not in the list above, if the user is coming from a job token. This stops unwanted authorizations from sneaking to production.
This would allow us to harden the CI job token authorization into fixed set of permissions known today.

In a subsequent iteration we can introduce a JWT CI token with claims. We will decide what the claims would look like along the way.

let me know if I missed anything.
@grzesiek started a discussion: (+2 comments)

@mokhax we had a great sync session with @fabiopitino on Zoom and went through your great design doc together, trying to figure it out how we can get this merged today. We came up with a proposal in 793ca77c. The commit can help us to defer some of the key decisions to subsequent ADRs (Architectural Decision Records). We focused on removing items which we are not ready to make a decision on today (as we might be missing some PoC or performance evaluations), but still retaining the main design proposal you outlined here.

We are both on board with this content (me and Fabio) and we are fine with merging this content if this looks good to you as well. We are happy to help with delivering PoCs / performance benchmarks / data needed for making subsequent design decisions, but the main proposal here is great, and we should just get this merged :) WDYT?

@mokhax started a discussion:

This proposal introduces a different kind of a `CI_JOB_TOKEN`, following a

Edited Sep 27, 2024 by Alex Buijs