Create workspaces from private repositories
## Background

[GitLab Workspaces](https://docs.gitlab.com/ee/user/project/remote_development/index.html) is our remote development product that allows developers to spin up ephemeral workspaces based on a [devfile](https://devfile.io/). Workspaces run in Kubernetes as pods running a version of our [WebIDE](https://docs.gitlab.com/ee/user/project/web_ide/index.html). Workspaces are currently offered as a BYOK (bring your own Kubernetes) offering; however, the plan is to support GitLab.com managed workspaces sometime in the future. Workspaces can be spun up from the GitLab UI once the GitLab agent for Kubernetes is installed in a Kubernetes cluster and registered. Ingress to the workspace is controlled using the [GitLab Workspaces Proxy](https://gitlab.com/gitlab-org/remote-development/gitlab-workspaces-proxy), which is responsible for authentication and authorization. Currently the workspaces proxy only supports connectivity to the IDE via HTTP; however, in the future we also want to support SSH.

## Problem Definition

In the Beta release for GitLab Workspaces, we only allowed users to use public repositories and did not inject any user credentials into a running workspace. The repository is cloned using a `cloner` container that runs as an init container. The `cloner` container just does a simple `git clone`. To create a workspace from a private repository, we need to inject credentials that can access that private repository.

**Note** - `User should be able to perform git operations transparently from within the workspaces` is out of scope of this issue and will be taken up as a follow-up. The scope of this issue is only to be able to create a workspace from a private repository. Refer to this [comment](https://gitlab.com/gitlab-org/gitlab/-/issues/411468#note_1416383966 "Workspaces: Design for creating workspaces from private repositories").
## Solution Proposed

#### Database Changes

- Alter the `workspaces` table
  - Add `personal_access_token_id` column
    - To store the foreign key reference to the user PAT generated for each workspace so that it can be revoked on workspace termination.
- Create `workspace_variables` table to store the variables (secret or otherwise) to be injected into the workspace.
  - Add `key` column
    - To store the key of the variable (secret or otherwise) in plain text.
  - Add `encrypted_value` and `encrypted_value_iv` columns
    - To store the `value` of the variable (secret or otherwise), which is encrypted with the GitLab instance's secret key.
  - Add `workspace_id` column
    - Foreign key reference to the `workspaces` table.
  - Add `variable_type` column
    - To store whether this data is to be injected as an environment variable or a file.

#### Rails Creation Logic

```mermaid
sequenceDiagram
    actor User
    participant GitLabUI as GitLab UI
    participant GraphQLAPI as GraphQL API
    participant CreateCreateProcessor as Create::CreateProcessor
    participant CreateDevfileProcessor as Create::DevfileProcessor
    participant PersonalAccessTokensCreateService as PersonalAccessTokens::CreateService
    User ->> GitLabUI: Create Workspace
    GitLabUI ->> GraphQLAPI: Create Workspace
    GraphQLAPI ->> CreateCreateProcessor: Process creation
    CreateCreateProcessor ->> CreateDevfileProcessor: Process Devfile
    CreateDevfileProcessor -->> CreateCreateProcessor: Processed Devfile
    CreateCreateProcessor ->> PersonalAccessTokensCreateService: Generate user PAT
    PersonalAccessTokensCreateService -->> CreateCreateProcessor: User PAT
    CreateCreateProcessor -->> CreateCreateProcessor: Store `personal_access_token_id` <br> in `workspaces` table as foreign key
    CreateCreateProcessor->> CreateCreateProcessor: Encrypt user PAT and store in DB with `key` as GITLAB_PAT
    CreateCreateProcessor->> CreateCreateProcessor: Encrypt git configuration variables like <br> GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, <br> GIT_ASKPASS, GITLAB_PAT_FILE_PATH, <br> and GIT_ASKPASS_SCRIPT and store in DB
    CreateCreateProcessor -->> GraphQLAPI: 
    GraphQLAPI -->> GitLabUI: 
    GitLabUI -->> User: 
```

#### Rails Termination Logic

```mermaid
sequenceDiagram
    actor User
    participant GitLabUI as GitLab UI
    participant GraphQLAPI as GraphQL API
    participant UpdateUpdateProcessor as Update::UpdateProcessor
    participant PersonalAccessTokensRevokeService as PersonalAccessTokens::RevokeService
    User ->> GitLabUI: Terminate Workspace
    GitLabUI ->> GraphQLAPI: Update Workspace
    GraphQLAPI ->> UpdateUpdateProcessor: Process update
    UpdateUpdateProcessor ->> PersonalAccessTokensRevokeService: Revoke user PAT
    PersonalAccessTokensRevokeService -->> UpdateUpdateProcessor: 
    UpdateUpdateProcessor ->> UpdateUpdateProcessor: Update workspace desired state to Terminated
    UpdateUpdateProcessor -->> GraphQLAPI: 
    GraphQLAPI -->> GitLabUI: 
    GitLabUI -->> User: 
```

#### Agent \<-\> Rails Logic

```mermaid
sequenceDiagram
    participant GitLabAgent as GitLab Agent
    participant WorkspacesReconcileService as Workspaces::ReconcileService
    participant ReconcileReconcileProcessor as Reconcile::ReconcileProcessor
    participant ReconcileDesiredConfigGenerator as Reconcile::DesiredConfigGenerator
    participant ReconcileDevfileParser as Reconcile::DevfileParser
    GitLabAgent ->> WorkspacesReconcileService: Reconcile
    WorkspacesReconcileService ->> ReconcileReconcileProcessor: Process
    ReconcileReconcileProcessor ->> ReconcileDesiredConfigGenerator: Generate desired configuration
    ReconcileDesiredConfigGenerator ->> ReconcileDevfileParser: Generate Kubernetes resources <br> from processed devfile
    ReconcileDevfileParser -->> ReconcileDesiredConfigGenerator: Kubernetes resources <br> (Deployment and Service)
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the variables(decrypted values) to be mounted as environment variables
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the variables(decrypted values) to be mounted as files
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Inject Kubernetes Secret into <br> generated Kubernetes Deployment
    ReconcileDesiredConfigGenerator -->> ReconcileReconcileProcessor: Generated resources
    ReconcileReconcileProcessor -->> WorkspacesReconcileService: Generated resources
    WorkspacesReconcileService -->> GitLabAgent: Generated resources
```

#### Agent Logic

No change in logic. However, the agent should support multiple inventory configurations for different objects passed in a single configuration to apply.

#### Git configuration to be injected into the workspace

- The git configuration items that need to be injected into the workspace are:

  ```yaml
  GIT_CONFIG_COUNT: 3
  GIT_CONFIG_KEY_0: "credential.helper"
  GIT_CONFIG_VALUE_0: "/.workspace-data/variables/file/gl_git_credential_store.sh"
  GIT_CONFIG_KEY_1: "user.name"
  GIT_CONFIG_VALUE_1: <USER_NAME>
  GIT_CONFIG_KEY_2: "user.email"
  GIT_CONFIG_VALUE_2: <USER_EMAIL>
  GL_TOKEN_FILE_PATH: "/.workspace-data/variables/file/gl_token"
  gl_git_credential_store.sh: |
    #!/bin/sh
    # This is a readonly store so we can exit cleanly when git attempts a store or erase action
    if [ "$1" != "get" ];
    then
      exit 0
    fi

    if [ -z "${GL_TOKEN_FILE_PATH}" ];
    then
      echo "We could not find the GL_TOKEN_FILE_PATH variable"
      exit 1
    fi
    password=$(cat "${GL_TOKEN_FILE_PATH}")

    # The username is derived from the "user.email" configuration item. Ensure it is set.
    echo "username=does-not-matter"
    echo "password=${password}"
    exit 0
  gl_token: <PAT_GENERATED>
  ```

- Except `gl_git_credential_store.sh` and `gl_token`, all variables will be mounted to the Deployment as environment variables, while `gl_git_credential_store.sh` and `gl_token` will be [mounted as files](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod) at `/.workspace-data/variables/file/gl_git_credential_store.sh` and `/.workspace-data/variables/file/gl_token` respectively.
- The reason for mounting the token as a file is that it allows us to update the token in the Kubernetes Secret in the future (a follow-up item when the need arises) and rely on Kubernetes to update the file inside the workspace container with eventual consistency, without restarting the workspace.
- More details about how the git configuration works are documented [here](https://gitlab.com/gitlab-org/gitlab/-/issues/418934#note_1507651015).
- In the future, when we want to introduce authentication using SSH keys, we can use a similar method of mounting the required Kubernetes Secrets as files/environment variables.

#### Notes and concerns

- For future iterations, we will have to work with the Auth team to make all user PATs generated by the workspace services non-editable by the end user. Within the scope of this issue, a user PAT will be created for each workspace, and the user can view/revoke it in their profile.
- This solution takes a long-term view: it aims to build a generic, extendable solution rather than something very specific that we would later have to discard in order to build a generic solution for our future needs. Thus, it might have a relatively larger scope. However, based on our understanding, the scope is still manageable.
- The `workspace_variables` table is inspired by tables like `ci_job_variables`, `ci_variables`, etc., while `workspace_variables_encryption_key` is inspired by the `pages_domain` table. The database table design sets the foundation for our future use cases of injecting variables from group/subgroup/project/user level variables/secrets.
- The data in `workspace_variables` will keep growing, as it does for CI tables. **This might become a problem in the future.**
  - One mitigation strategy would be to drop records from this table when a workspace has been successfully terminated. However, this is out of scope for this issue.
- Having a separate public/private key pair for each workspace limits the damage that can be done if there is a MITM attack between KAS and the agent and the private key used for encryption is somehow leaked (since the private key would be sent in plain text). The private key is only sent to the agent during workspace creation and during a full sync, to reduce the number of times we send this information.
- One concern raised is that decrypting the various values at the agent will be CPU-intensive. However, it is better to have this decryption at the agent rather than in Rails. This is because of the way the agent communicates with Rails: Rails has to generate all Kubernetes resources every time it needs to send an update to the agent, or whenever it needs to acknowledge an update that the agent has sent regarding the workspace. As part of generating the Kubernetes resources, it would have to decrypt the various variables that are to be injected into the workspace. This problem will be further exacerbated when we extend this feature to be generic enough to inject variables from group/subgroup/project/user level variables/secrets.
  - This can be mitigated by comparing the SHAs of the encrypted values with those of the secrets already generated (which would carry this SHA from when they were created).
  - Another approach would be to move this decryption inside the workspace. However, this would not be as straightforward.
  - This would be considered premature optimization and is thus out of scope for this issue. It can be tackled in the future if the need arises.
- We are assuming the user PAT will be valid for longer than the lifetime of the workspace. Right now, workspaces are auto-terminated after 120 hours.
- ~~Maybe use an `ECDSA with the P-256 curve` public/private key pair for better performance? See the analysis in https://gitlab.com/gitlab-org/gitlab/-/issues/361168#note_933171008~~

## Other Solutions Considered

### Option 1 - Generate a Personal Access Token scoped to a user for each workspace and send to agent without storing in database

<details>
<summary>Show more details</summary>

#### For **HTTP**

- Generate a new [Personal Access Token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) for every workspace that is created. The PAT will be user scoped instead of project scoped, as the user may want to work with more than one project in the workspace in the future, or do other things which are scoped to the user but not the project (e.g. pulling a container image).
- The PAT will be created using the `PersonalAccessTokens::CreateService` in `RemoteDevelopment::Workspaces::Reconcile::DesiredConfigGenerator.generate_desired_config`.
- This PAT would be used in a Kubernetes Secret which is passed to the agent and is mounted on the Kubernetes Deployment representing the workspace. To mount the Secret on the Deployment, we will have to make changes to the devfile-gem.
- We do not want to store the secret in the database, and therefore the secret should not be stored in the `processed_devfile` field on the `workspace` object.
- The Kubernetes Secret, which is [encrypted at rest](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/) in `etcd`, would contain the following fields:

  ```yaml
  GIT_CONFIG_COUNT: 3
  GIT_CONFIG_KEY_0: "credential.helper"
  GIT_CONFIG_VALUE_0: "/.workspace-data/variables/file/gl_git_credential_store.sh"
  GIT_CONFIG_KEY_1: "user.name"
  GIT_CONFIG_VALUE_1: <USER_NAME>
  GIT_CONFIG_KEY_2: "user.email"
  GIT_CONFIG_VALUE_2: <USER_EMAIL>
  GL_GIT_CREDENTIAL_STORE_FILE_PATH: "/.workspace-data/variables/file/gl_git_credential_store.sh"
  GL_TOKEN_FILE_PATH: "/.workspace-data/variables/file/gl_token"
  gl_git_credential_store.sh: |
    #!/bin/sh
    # This is a readonly store so we can exit cleanly when git attempts a store or erase action
    if [ "$1" != "get" ];
    then
      exit 0
    fi

    if [ -z "${GL_TOKEN_FILE_PATH}" ];
    then
      echo "We could not find the GL_TOKEN_FILE_PATH variable"
      exit 1
    fi
    password=$(cat "${GL_TOKEN_FILE_PATH}")

    # The username is derived from the "user.email" configuration item. Ensure it is set.
    echo "username=does-not-matter"
    echo "password=${password}"
    exit 0
  gl_token: <PAT_GENERATED>
  ```

- Except `gl_git_credential_store.sh` and `gl_token`, all variables will be mounted to the Deployment as environment variables, while `gl_git_credential_store.sh` and `gl_token` will be [mounted as files](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod) at `/.workspace-data/variables/file/gl_git_credential_store.sh` and `/.workspace-data/variables/file/gl_token` respectively.
- The reason for mounting the `gitlab token` as a file is that it allows us to update the `gitlab token` in the Kubernetes Secret and rely on Kubernetes to update the file inside the workspace container with eventual consistency.
- These environment variables and file mounts will be set up for the project-cloner container and the main container of the workspace.
- When the user performs any git action, the git CLI will use the configuration from the [`GIT_CONFIG_*` environment variables](https://git-scm.com/docs/git-config#ENVIRONMENT) and, based on that, obtain credentials from the custom git credential store defined there ([reference](https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage)).
- Additionally, the `gitlab token` will be revoked when the workspace is terminated. The workspace will have to hold a foreign key to the generated PAT.
- The Kubernetes Secret will only be created for a new workspace. For an existing workspace, we will assume that the Kubernetes Secret already exists in the Kubernetes cluster. It is not possible for us to regenerate a PAT during every partial/full reconciliation. Rotating the PAT while the workspace has not been terminated would be tackled in a follow-up issue if needed. Thus the secret will not be part of the `inventory configmap` that we use to track resources related to the workspace in Kubernetes.
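
The credential helper behaviour described above can be tried in isolation. The following is a minimal sketch that calls a read-only helper the way git would, passing the action (`get`, `store`, or `erase`) as the first argument. The temporary directory, the `example-token` value, and running the helper through `sh` are assumptions made for illustration only; in a workspace both files would come from the mounted Kubernetes Secret.

```shell
#!/bin/sh
# Sketch: exercise a read-only git credential helper outside a workspace.
# Paths are throwaway temp paths and the token is a placeholder, not a real PAT.
set -eu

workdir=$(mktemp -d)
export GL_TOKEN_FILE_PATH="$workdir/gl_token"
helper="$workdir/gl_git_credential_store.sh"

# Stand-in for the token file mounted from the Kubernetes Secret.
printf '%s' 'example-token' > "$GL_TOKEN_FILE_PATH"

# Same logic as the helper script shown in the Secret.
cat > "$helper" <<'EOF'
#!/bin/sh
# Read-only store: succeed silently for "store" and "erase".
if [ "$1" != "get" ]; then
  exit 0
fi
if [ -z "${GL_TOKEN_FILE_PATH}" ]; then
  echo "We could not find the GL_TOKEN_FILE_PATH variable"
  exit 1
fi
password=$(cat "${GL_TOKEN_FILE_PATH}")
echo "username=does-not-matter"
echo "password=${password}"
exit 0
EOF

# git would execute the helper path directly and append the action;
# here it is run through sh so the sketch does not depend on the exec bit.
get_output=$(sh "$helper" get)
store_output=$(sh "$helper" store)

# Prints:
#   username=does-not-matter
#   password=example-token
printf '%s\n' "$get_output"

rm -rf "$workdir"
```

Because the helper ignores `store` and `erase`, git can never overwrite or delete the mounted token; only Kubernetes updates the file when the Secret changes.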
```mermaid
sequenceDiagram
    actor User
    participant GitLabUI as GitLab UI
    participant GraphQLAPI as GraphQL API
    participant CreateCreateProcessor as Create::CreateProcessor
    participant CreateDevfileProcessor as Create::DevfileProcessor
    User ->> GitLabUI: Create Workspace
    GitLabUI ->> GraphQLAPI: Create Workspace
    GraphQLAPI ->> CreateCreateProcessor: Process creation
    CreateCreateProcessor->> CreateDevfileProcessor: Process Devfile
    CreateDevfileProcessor ->> CreateDevfileProcessor: Generate cloner with credentials copy command
```

```mermaid
sequenceDiagram
    participant PersonalAccessTokensCreateService as PersonalAccessTokens::CreateService
    participant ReconcileDesiredConfigGenerator as Reconcile::DesiredConfigGenerator
    participant ReconcileReconcileProcessor as Reconcile::ReconcileProcessor
    participant WorkspacesReconcileService as Workspaces::ReconcileService
    participant GitLabAgent as GitLab Agent
    participant K8sAPI as Kubernetes API
    participant Kublet as Kubelet
    participant ContainerRuntime as Container Runtime
    participant Cloner Init Container as Cloner Init Container
    participant Workspace Volume as Workspace Volume
    GitLabAgent ->> WorkspacesReconcileService: Reconcile
    WorkspacesReconcileService ->> ReconcileReconcileProcessor: Process
    ReconcileReconcileProcessor ->> ReconcileDesiredConfigGenerator: Generate desired configuration
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Check if PAT is already present for workspace using convention <workspace-name>-PAT
    ReconcileDesiredConfigGenerator ->> PersonalAccessTokensCreateService: Generate PAT
    PersonalAccessTokensCreateService -->> ReconcileDesiredConfigGenerator: New PAT for workspace
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Inject Kubernetes secret into generated workspace k8s resources
    ReconcileDesiredConfigGenerator -->> ReconcileReconcileProcessor: Generated resources
    ReconcileReconcileProcessor -->> WorkspacesReconcileService: Generated resources
    WorkspacesReconcileService -->> GitLabAgent: Generated resources
    GitLabAgent ->> K8sAPI: Apply resources (including secret)
    Kublet ->> K8sAPI: Watch for resources - Start Pod
    Kublet ->> ContainerRuntime: Start Cloner Init Container
    ContainerRuntime ->> Cloner Init Container: Start with script
    Cloner Init Container ->> Workspace Volume: Copy credentials to custom file used by `GIT_ASKPASS` in volume
```

#### For **SSH**

- In the future, when we want to introduce authentication using SSH keys, we can use a similar method of mounting the required Kubernetes Secrets as files/environment variables.

</details>

### Option 2 - Use the OAuth2.0 token for HTTP, for SSH inject a new private/public key pair into the repository

<details>
<summary>Show more details</summary>

In this option we have two routes, depending on whether the user connects via HTTP/S or SSH.

#### For **HTTP**

The GitLab Workspaces Proxy already gets an OAuth2 token when the user is authenticated. Currently the token is only used to verify that the user has access to the workspace by calling the workspaces GraphQL API. We currently don't send this token to the workspace. In this design every workspace will run a sidecar proxy. The proxy will read the OAuth access token that is injected into the HTTP request and will then use the injected token to clone the git repository. When the token expires, the workspaces proxy asks the user to re-authenticate and passes a new token in the request to the `Workspace Auth Proxy`.
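
As a rough illustration of the "update git credentials file" step in this option, the sketch below writes a token into a file in the one-URL-per-line format that git's `store` credential helper reads. The `write_git_credentials` function, the `gitlab.example.com` host, and the token value are hypothetical, and the `oauth2` username follows GitLab's convention for using OAuth tokens over HTTPS; how the sidecar actually receives the token from the proxy header is not specified here and is left out.

```shell
#!/bin/sh
# Sketch: persist an access token in git's credential-store file format.
# write_git_credentials, gitlab.example.com and the token are hypothetical.
set -eu

write_git_credentials() {
  token="$1"
  host="$2"
  file="$3"
  # One credential per line: https://<username>:<password>@<host>
  printf 'https://oauth2:%s@%s\n' "$token" "$host" > "$file"
  # Keep the secret readable by the owner only.
  chmod 600 "$file"
}

credentials_file=$(mktemp)
write_git_credentials 'example-access-token' 'gitlab.example.com' "$credentials_file"

stored_line=$(cat "$credentials_file")
# Prints: https://oauth2:example-access-token@gitlab.example.com
printf '%s\n' "$stored_line"

rm -f "$credentials_file"
```

A cloner could then point git at such a file with `git config credential.helper "store --file <path>"`; when the proxy passes a refreshed token, rewriting the file would be enough for subsequent git operations to pick it up.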
```mermaid
sequenceDiagram
    actor User
    User ->> Kubernetes Ingress: Access Workspace
    Kubernetes Ingress ->> GitLab Workspaces Proxy: Access Workspace
    GitLab Workspaces Proxy ->> GitLab: OAuth2.0 Flow (not detailed)
    GitLab -->> GitLab Workspaces Proxy: Access Token
    GitLab Workspaces Proxy ->> GitLab: Authorize via GraphQL API
    GitLab Workspaces Proxy ->> Workspace HTTP Git Cloner: Send Access Token in proxy header
    Workspace HTTP Git Cloner ->> Workspace HTTP Git Cloner: Update Git Credentials file
    Workspace HTTP Git Cloner ->> Workspace HTTP Git Cloner: Clone repository
```

#### For **SSH**

For SSH, we will need to generate a new SSH key pair for the user when the workspace is created. We will then inject the SSH key pair into the proxy using a mechanism similar to `Option 1` above. The public key will be registered as a `Key` object in GitLab.

</details>

### Option 3 - Use GitLab as secrets store and sync secrets to Kubernetes based on Kubernetes Secrets Store CSI Driver

<details>
<summary>Show more details</summary>

- Allow users to define secrets at a project, user, and workspace level.
  - When defined at a project level, any user who can create a workspace from the project will have those secrets injected into the workspace.
  - When defined at a user level, all workspaces created by a user will have those secrets injected into the workspace.
  - When defined at a workspace level, only that workspace will have those secrets injected into it.
  - If there is a namespace clash between project level secrets and user level secrets, user level secrets will take precedence.
  - If there is a namespace clash between user level secrets and workspace level secrets, workspace level secrets will take precedence.
  - Defining workspace level secrets allows us to enforce certain secrets that we'd like to inject into a workspace.
- Define a new module in GA4K which syncs secrets from GitLab into Kubernetes as Secrets.
  - Reference - https://secrets-store-csi-driver.sigs.k8s.io/concepts.html and https://github.com/aws/secrets-store-csi-driver-provider-aws
  - This module will look for `secretproviderclasses.secrets-store.csi.x-k8s.io` resources and, based on their definition, sync the data from GitLab as Kubernetes Secrets. If we want to refrain from using a CRD, we can watch a ConfigMap/Secret with a particular label which defines what secrets from GitLab are needed for a given workspace.
- As part of Remote Development, when creating a workspace, create a `SecretProviderClass` Kubernetes resource (or the required ConfigMap/Secret resource) along with the Deployment and Service Kubernetes resources.
- This would also allow for secret auto-rotation in GitLab, with the change synced to the Kubernetes Secrets and to the Kubernetes Pods mounting those Secrets.
- The secrets will be stored in the database encrypted, as they are right now for CI/CD secrets.
- The user PAT would be created as a workspace level secret.

</details>

## Next Steps

* [x] Review solution by AppSec
* [x] Spike out the proposed solution in https://gitlab.com/gitlab-org/gitlab/-/issues/414420+ . We already have one spike done for figuring out how/what git configuration items to inject ( https://gitlab.com/gitlab-org/gitlab/-/issues/414420+).
* [x] Update Architecture Blueprint based on the results of the spike.
* [x] Implement solution based on all feedback received.