Workspaces: Design for creating workspaces from private repositories

Background

GitLab Workspaces is our remote development product that allows developers to spin up ephemeral workspaces based on a devfile. Workspaces run in Kubernetes as pods running a version of our WebIDE. Workspaces are currently offered as a BYOK (bring your own Kubernetes) offering; however, we plan to support GitLab.com-managed workspaces in the future.

Workspaces can be spun up from the GitLab UI once the GitLab agent for Kubernetes is installed in a Kubernetes cluster and registered. Ingress to the workspace is controlled by the GitLab Workspaces Proxy, which is responsible for authentication and authorization. Currently, the workspaces proxy only supports connectivity to the IDE over HTTP; in the future, we also want to support SSH.

Problem Definition

In the Beta release of GitLab workspaces, we only allowed users to use public repositories and did not inject any user credentials into a running workspace. The repository is cloned by a cloner container that runs as an init container and simply performs a git clone. To create a workspace from a private repository, we need to inject credentials that can access that private repository.

Note: Enabling users to perform git operations transparently from within the workspace is out of scope for this issue and will be taken up as a follow-up. The scope of this issue is only to create a workspace from a private repository. Refer to this comment.

Solution Proposed

Database Changes

  • Alter the workspaces table

    • Add personal_access_token_id column - To store the foreign key reference to the user PAT generated for each workspace so that it can be revoked on workspace termination.
    • Add encrypted_secret_key and encrypted_secret_key_iv columns - To store a unique secret_key which will be used to encrypt all the variables associated with the workspace in the workspace_variables table. The secret_key is stored encrypted in the database using [attr_encrypted](https://github.com/attr-encrypted/attr_encrypted) (algorithm aes-256-gcm). The plain text secret_key is passed to the agent so that it can decrypt the encrypted workspace variables.
  • Create the workspace_variables table to store the variables (secret or otherwise) to be injected into the workspace.

    • Add key column - To store the key of the variable (secret or otherwise) in plain text.
    • Add encrypted_value and encrypted_value_iv columns - To store the value of the variable (secret or otherwise), encrypted with the associated workspace's secret_key.
    • Add workspace_id column - Foreign key reference to the workspaces table.

Rails Creation Logic

sequenceDiagram
    actor User
    participant GitLabUI as GitLab UI
    participant GraphQLAPI as GraphQL API
    participant CreateCreateProcessor as Create::CreateProcessor
    participant CreateDevfileProcessor as Create::DevfileProcessor
    participant CreateVariablesProcessor as Create::VariablesProcessor
    participant PersonalAccessTokensCreateService as PersonalAccessTokens::CreateService
    User ->> GitLabUI: Create Workspace
    GitLabUI ->> GraphQLAPI: Create Workspace
    GraphQLAPI ->> CreateCreateProcessor: Process creation
    CreateCreateProcessor ->> CreateDevfileProcessor: Process Devfile
    CreateDevfileProcessor -->> CreateCreateProcessor: Processed Devfile
    CreateCreateProcessor ->> CreateVariablesProcessor: Add variables to be injected into the workspace to the DB
    CreateVariablesProcessor ->> CreateVariablesProcessor: Generate `secret_key` <br> for the workspace and <br> add it to the DB. <br> Secret key would be <br> encrypted using `Gitlab::Application.secrets.db_key_base`
    CreateVariablesProcessor ->> PersonalAccessTokensCreateService: Generate user PAT
    PersonalAccessTokensCreateService -->> CreateVariablesProcessor: User PAT
    CreateVariablesProcessor ->> CreateVariablesProcessor: Store `personal_access_token_id` <br> in `workspaces` table as foreign key
    CreateVariablesProcessor->> CreateVariablesProcessor: Encrypt user PAT using workspace's `secret_key` <br> and store in DB with `key` as GITLAB_PAT
    CreateVariablesProcessor->> CreateVariablesProcessor: Encrypt git configuration variables like <br> GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, <br> GIT_ASKPASS, GITLAB_PAT_FILE_PATH, <br> and GIT_ASKPASS_SCRIPT using workspace's `secret_key` <br> and store in DB
    CreateVariablesProcessor -->> CreateCreateProcessor: 
    CreateCreateProcessor -->> GraphQLAPI: 
    GraphQLAPI -->> GitLabUI: 
    GitLabUI -->> User: 
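
A minimal Python sketch of the git configuration variables Create::VariablesProcessor would persist, assuming the values listed in the git configuration section. `WorkspaceVariable`, `build_git_variables` and the `variable_type` field are hypothetical: the columns listed under Database Changes do not say how the agent tells environment-variable values apart from file values, so the sketch records that explicitly. Values are shown in plain text; in the design, each one is encrypted with the workspace's `secret_key` before it is stored.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical markers for how the agent should expose each variable.
ENV_VAR = "env_var"
FILE = "file"

GIT_CONFIG_DIR = "/.workspace-git-config"


@dataclass(frozen=True)
class WorkspaceVariable:
    key: str
    value: str  # plain text in this sketch; encrypted with secret_key before storage
    variable_type: str


def build_git_variables(user_name: str, user_email: str, pat: str) -> List[WorkspaceVariable]:
    """Return the git configuration variables to inject into a workspace."""
    askpass_path = f"{GIT_CONFIG_DIR}/git_askpass.sh"
    pat_path = f"{GIT_CONFIG_DIR}/gitlab_pat"
    return [
        # Exposed to the container as environment variables
        WorkspaceVariable("GIT_AUTHOR_NAME", user_name, ENV_VAR),
        WorkspaceVariable("GIT_AUTHOR_EMAIL", user_email, ENV_VAR),
        WorkspaceVariable("GIT_ASKPASS", askpass_path, ENV_VAR),
        WorkspaceVariable("GITLAB_PAT_FILE_PATH", pat_path, ENV_VAR),
        # Exposed to the container as files under GIT_CONFIG_DIR
        WorkspaceVariable("GIT_ASKPASS_SCRIPT", "cat $GITLAB_PAT_FILE_PATH\n", FILE),
        WorkspaceVariable("GITLAB_PAT", pat, FILE),
    ]
```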

Rails Termination Logic

sequenceDiagram
    actor User
    participant GitLabUI as GitLab UI
    participant GraphQLAPI as GraphQL API
    participant UpdateUpdateProcessor as Update::UpdateProcessor
    participant PersonalAccessTokensRevokeService as PersonalAccessTokens::RevokeService
    User ->> GitLabUI: Terminate Workspace
    GitLabUI ->> GraphQLAPI: Update Workspace
    GraphQLAPI ->> UpdateUpdateProcessor: Process update
    UpdateUpdateProcessor ->> PersonalAccessTokensRevokeService: Revoke user PAT
    PersonalAccessTokensRevokeService -->> UpdateUpdateProcessor: 
    UpdateUpdateProcessor ->> UpdateUpdateProcessor: Update workspace desired state to Terminated
    UpdateUpdateProcessor -->> GraphQLAPI: 
    GraphQLAPI -->> GitLabUI: 
    GitLabUI -->> User: 
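
The termination ordering can be sketched as follows. `revoke_token` is a hypothetical callable standing in for PersonalAccessTokens::RevokeService, not that service's real interface, and the `None` check is an assumption covering workspaces that have no associated PAT (for example, workspaces created before this change).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Workspace:
    name: str
    personal_access_token_id: Optional[int]  # FK to the PAT created for this workspace
    desired_state: str = "Running"


def terminate_workspace(workspace: Workspace, revoke_token: Callable[[int], None]) -> None:
    """Revoke the workspace's PAT, then record the Terminated desired state."""
    # Hypothetical guard: a workspace without a PAT has nothing to revoke.
    if workspace.personal_access_token_id is not None:
        revoke_token(workspace.personal_access_token_id)
    workspace.desired_state = "Terminated"
```

Revoking before updating the state means a retry after a failed state update revokes again, so this ordering relies on revocation being idempotent.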

Agent <-> Rails Logic

sequenceDiagram
    participant GitLabAgent as GitLab Agent
    participant WorkspacesReconcileService as Workspaces::ReconcileService
    participant ReconcileReconcileProcessor as Reconcile::ReconcileProcessor
    participant ReconcileDesiredConfigGenerator as Reconcile::DesiredConfigGenerator
    participant ReconcileDevfileParser as Reconcile::DevfileParser
    GitLabAgent ->> WorkspacesReconcileService: Reconcile
    WorkspacesReconcileService ->> ReconcileReconcileProcessor: Process
    ReconcileReconcileProcessor ->> ReconcileDesiredConfigGenerator: Generate desired configuration
    ReconcileDesiredConfigGenerator ->> ReconcileDevfileParser: Generate Kubernetes resources <br> from processed devfile
    ReconcileDevfileParser -->> ReconcileDesiredConfigGenerator: Kubernetes resources <br> (Deployment and Service)
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the workspace's `secret_key`
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the encrypted variables to be mounted as environment variables
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the encrypted variables to be mounted as files
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Inject Kubernetes Secret into <br> generated Kubernetes Deployment
    ReconcileDesiredConfigGenerator -->> ReconcileReconcileProcessor: Generated resources
    ReconcileReconcileProcessor -->> WorkspacesReconcileService: Generated resources
    WorkspacesReconcileService -->> GitLabAgent: Generated resources
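
The Secret generation and injection steps of Reconcile::DesiredConfigGenerator might look roughly like this, sketched in Python over plain manifest dictionaries. The volume and Secret names are hypothetical. Only the project-cloner init container is targeted, matching the git configuration section. In this design, the values placed in these Secrets are still the encrypted ones; the agent decrypts them before applying.

```python
import base64
from typing import Dict, Iterable

GIT_CONFIG_VOLUME = "workspace-git-config"  # hypothetical volume name


def secret_manifest(name: str, namespace: str, data: Dict[str, str]) -> dict:
    """Build a Kubernetes Secret; values under `data` must be base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "data": {k: base64.b64encode(v.encode()).decode() for k, v in data.items()},
    }


def inject_git_config(deployment: dict, env_secret: str, file_secret: str,
                      target_containers: Iterable[str]) -> dict:
    """Expose the env-var Secret via envFrom and mount the file Secret as a volume."""
    pod_spec = deployment["spec"]["template"]["spec"]
    pod_spec.setdefault("volumes", []).append({
        "name": GIT_CONFIG_VOLUME,
        "secret": {
            "secretName": file_secret,
            # Map Secret keys to the file names that GIT_ASKPASS and GITLAB_PAT_FILE_PATH
            # point to. The askpass script must be executable; Secret volumes default to 0644.
            "items": [
                {"key": "GIT_ASKPASS_SCRIPT", "path": "git_askpass.sh", "mode": 0o755},
                {"key": "GITLAB_PAT", "path": "gitlab_pat"},
            ],
        },
    })
    targets = set(target_containers)
    for container in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        if container["name"] not in targets:
            continue
        # envFrom exposes every key of the Secret as an environment variable.
        container.setdefault("envFrom", []).append({"secretRef": {"name": env_secret}})
        container.setdefault("volumeMounts", []).append(
            {"name": GIT_CONFIG_VOLUME, "mountPath": "/.workspace-git-config", "readOnly": True}
        )
    return deployment
```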

Agent Logic

sequenceDiagram
    participant GitLabAgent as GitLab Agent
    participant K8sAPI as Kubernetes API
    GitLabAgent ->> K8sAPI: Create Kubernetes Namespace for the workspace
    GitLabAgent ->> K8sAPI: Create Kubernetes Secret for the <br> workspace's `secret_key` in the workspace's namespace
    GitLabAgent ->> K8sAPI: For Kubernetes Secrets to be mounted as environment variables, <br> decrypt each value using the workspace's `secret_key` and <br> create a Kubernetes Secret for them in the workspace's namespace
    GitLabAgent ->> K8sAPI: For Kubernetes Secrets to be mounted as files, <br> decrypt each value using the workspace's `secret_key` and <br> create a Kubernetes Secret for them in the workspace's namespace
    GitLabAgent ->> K8sAPI: Create other workspace resources <br> in the workspace namespace
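
The agent-side ordering can be sketched as follows, in Python for illustration only. Decryption is passed in as a `decrypt` callable because the cipher used with the `secret_key` is not pinned down here (the `_iv` columns suggest an attr_encrypted-style scheme such as AES-256-GCM, but that is an assumption). The Secret names are hypothetical. The Namespace comes first because every other resource is created inside it.

```python
import base64
from typing import Callable, Dict, List

# (ciphertext, secret_key) -> plaintext. Injected so the sketch does not
# assume a particular cipher or crypto library.
Decrypt = Callable[[bytes, bytes], bytes]


def _secret(name: str, namespace: str, data: Dict[str, bytes]) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "data": {k: base64.b64encode(v).decode() for k, v in data.items()},
    }


def agent_apply_plan(namespace: str, secret_key: bytes,
                     env_vars: Dict[str, bytes], file_vars: Dict[str, bytes],
                     other_resources: List[dict], decrypt: Decrypt) -> List[dict]:
    """Return the resources in the order the agent applies them."""
    def decrypted(values: Dict[str, bytes]) -> Dict[str, bytes]:
        return {k: decrypt(v, secret_key) for k, v in values.items()}

    return [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": namespace}},
        _secret(f"{namespace}-secret-key", namespace, {"secret_key": secret_key}),
        _secret(f"{namespace}-env-vars", namespace, decrypted(env_vars)),
        _secret(f"{namespace}-file-vars", namespace, decrypted(file_vars)),
        *other_resources,
    ]
```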

Git configuration to be injected into the workspace

  • The git configuration items that need to be injected into the workspace are:
    GIT_AUTHOR_NAME: <USER_NAME>
    GIT_AUTHOR_EMAIL: <USER_EMAIL>
    GIT_ASKPASS: /.workspace-git-config/git_askpass.sh
    GIT_ASKPASS_SCRIPT: |
      cat $GITLAB_PAT_FILE_PATH
    GITLAB_PAT_FILE_PATH: /.workspace-git-config/gitlab_pat
    GITLAB_PAT: <PAT_GENERATED>
  • GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_ASKPASS and GITLAB_PAT_FILE_PATH will be mounted to the Deployment as environment variables, while GIT_ASKPASS_SCRIPT and GITLAB_PAT will be mounted as files at /.workspace-git-config/git_askpass.sh and /.workspace-git-config/gitlab_pat respectively.
  • GIT_ASKPASS_SCRIPT is mounted as a file because GIT_ASKPASS must point to a script that provides the credentials.
  • GITLAB_PAT is mounted as a file so that, in the future (a follow-up item, when the need arises), we can update GITLAB_PAT in the Kubernetes Secret and rely on Kubernetes to update the file inside the workspace container, with eventual consistency, without restarting the workspace.
  • These environment variables and file mounts will be applied to the project-cloner init container of the workspace.
  • In the future, when we want to introduce authentication using SSH keys, we can use a similar method of mounting the required Kubernetes Secrets as files/environment variables.
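
To see how these pieces fit together, the sketch below writes the PAT file and the askpass script into a temporary directory and invokes the script the way git invokes GIT_ASKPASS: with the prompt as its single argument, using what it prints on stdout as the answer. It assumes a POSIX `sh` and `cat` are available; running the script through `sh` sidesteps its missing shebang and execute bit. `ask_like_git` is a hypothetical helper for illustration.

```python
import os
import subprocess
import tempfile

# GIT_ASKPASS_SCRIPT exactly as listed above (no shebang line).
ASKPASS_SCRIPT = "cat $GITLAB_PAT_FILE_PATH\n"


def ask_like_git(pat: str, prompt: str) -> str:
    """Lay out the mounted files in a temp dir and answer `prompt` via the askpass script."""
    with tempfile.TemporaryDirectory() as config_dir:
        pat_path = os.path.join(config_dir, "gitlab_pat")
        script_path = os.path.join(config_dir, "git_askpass.sh")
        with open(pat_path, "w") as f:
            f.write(pat)  # this sketch writes the PAT without a trailing newline
        with open(script_path, "w") as f:
            f.write(ASKPASS_SCRIPT)
        env = dict(os.environ, GITLAB_PAT_FILE_PATH=pat_path)
        # git runs "$GIT_ASKPASS <prompt>" and uses the program's stdout as the answer.
        result = subprocess.run(
            ["sh", script_path, prompt],
            env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout
```

Because the script ignores its prompt argument, it answers both git's username prompt and its password prompt with the PAT.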

Notes and concerns

  • For future iterations, we will have to work with the Auth team to make all user PATs generated by the workspace services non-editable by the end user. Within the scope of this issue, a user PAT will be created for each workspace, and the user can view/revoke it in their profile.
  • This solution takes a long-term view: it aims to build a generic solution that is extendable to other cases, rather than something very specific that we would later have to discard to build a generic solution for our future needs. Thus, it might have a relatively larger scope. However, based on our understanding, the scope is still manageable.
  • The workspace_variables table is inspired by tables like ci_job_variables and ci_variables, while workspace_variables_encryption_key is inspired by the pages_domain table. The database table design lays the foundation for our future use cases of injecting variables from group/subgroup/project/user level variables/secrets.
  • The data in workspace_variables will keep growing, as it does for the CI tables. This might become a problem in the future.
    • One mitigation strategy would be to drop records from this table once a workspace has been successfully terminated. However, this is out of scope for this issue.
  • Having a separate public/private key for each workspace limits the damage if there is a MITM attack between KAS and the agent and the workspace's private key used for encryption is somehow leaked (since the private key is sent in plain text). To reduce the number of times we send this information, the private key is only sent to the agent during workspace creation and during full sync.
  • One concern raised is that decrypting the various values at the agent will be CPU-intensive. However, it is better to decrypt at the agent than at Rails. Given the way the agent communicates with Rails, Rails has to generate all Kubernetes resources every time it needs to send an update to the agent or acknowledge an update the agent has sent about the workspace. If decryption happened in Rails, generating those resources would require decrypting every variable to be injected into the workspace each time. This problem will be further exacerbated when we extend this feature to inject variables from group/subgroup/project/user level variables/secrets.
    • This can be mitigated by comparing the SHAs of the encrypted values with those of the secrets already generated (which would record this SHA when they were created).
    • Another approach would be to move the decryption inside the workspace, which would then decrypt these environment variables itself. However, this would not be as straightforward.
    • These optimizations would be premature and are therefore out of scope for this issue. They can be tackled in the future if the need arises.
  • We assume the user PAT will remain valid for longer than the workspace's lifetime. Currently, workspaces are auto-terminated after 120 hours.
  • Should we use an ECDSA P-256 public/private key pair for better performance, as per the analysis in https://gitlab.com/gitlab-org/gitlab/-/issues/361168#note_933171008?
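
The SHA-based mitigation mentioned above can be sketched like this, assuming each already generated Secret records a SHA-256 of every ciphertext it was built from (where it is stored, for example an annotation, is an assumption). Only values whose ciphertext changed need to be decrypted again. This works because a value stored in workspace_variables keeps the same ciphertext until it is re-encrypted; if an IV-based cipher such as AES-GCM is used, re-encrypting the same plaintext under a fresh IV produces a different ciphertext, which is simply treated as a change.

```python
import hashlib
from typing import Dict


def sha256_hex(value: bytes) -> str:
    return hashlib.sha256(value).hexdigest()


def changed_values(encrypted: Dict[str, bytes], recorded_shas: Dict[str, str]) -> Dict[str, bytes]:
    """Keep only values whose ciphertext SHA differs from the one recorded at creation."""
    return {
        key: ciphertext
        for key, ciphertext in encrypted.items()
        # New keys have no recorded SHA, so they are always kept.
        if recorded_shas.get(key) != sha256_hex(ciphertext)
    }
```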

Other Solutions Considered

Option 1 - Generate a Personal Access Token scoped to a user for each workspace and send to agent without storing in database


For HTTP

  • Generate a new Personal Access Token for every workspace that is created. The PAT will be user-scoped instead of project-scoped, as the user may want to work with more than one project in the workspace in the future, or do other things that are scoped to the user rather than the project (e.g. pulling a container image).
  • The PAT will be created using the PersonalAccessTokens::CreateService in RemoteDevelopment::Workspaces::Reconcile::DesiredConfigGenerator.generate_desired_config.
  • This PAT would be used in a Kubernetes Secret which is passed to the agent and mounted on the Kubernetes Deployment representing the workspace. To mount the Secret on the Deployment, we will have to make changes to the devfile-gem.
  • We do not want to store the secret in the database; therefore, the secret should not be stored in the processed_devfile field on the workspace object.
  • The Kubernetes Secret, which is encrypted at rest in etcd (when encryption at rest is enabled for the cluster), would contain the following fields:
    GIT_AUTHOR_NAME: <USER_NAME>
    GIT_AUTHOR_EMAIL: <USER_EMAIL>
    GIT_ASKPASS: /.workspace-git-config/git_askpass.sh
    GIT_ASKPASS_SCRIPT: |
      cat $GITLAB_PAT_FILE_PATH
    GITLAB_PAT_FILE_PATH: /.workspace-git-config/gitlab_pat
    GITLAB_PAT: <PAT_GENERATED>
  • GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_ASKPASS and GITLAB_PAT_FILE_PATH will be mounted to the Deployment as environment variables, while GIT_ASKPASS_SCRIPT and GITLAB_PAT will be mounted as files at /.workspace-git-config/git_askpass.sh and /.workspace-git-config/gitlab_pat respectively.
  • GIT_ASKPASS_SCRIPT is mounted as a file because GIT_ASKPASS must point to a script that provides the credentials.
  • GITLAB_PAT is mounted as a file so that we can update GITLAB_PAT in the Kubernetes Secret and rely on Kubernetes to update the file inside the workspace container with eventual consistency.
  • These environment variables and file mounts will be applied to the project-cloner container and the main container of the workspace.
  • When the user performs any git actions, the git CLI will use the credentials provided by the script pointed to by GIT_ASKPASS (reference).
  • Additionally, the PAT will be revoked when the workspace is terminated, using the PersonalAccessTokens::RevokeService. The workspace will have to hold a foreign key to the generated PAT.
  • The Kubernetes Secret will only be created for a new workspace. For an existing workspace, we will assume that the Kubernetes Secret already exists in the Kubernetes cluster, since it is not possible for us to regenerate a PAT during every partial/full reconciliation. Rotating the PAT while the workspace is still running would be tackled in a follow-up issue if needed. Thus, the secret will not be part of the inventory ConfigMap that we use to track resources related to the workspace in Kubernetes.
sequenceDiagram
    actor User
    participant GitLabUI as GitLab UI
    participant GraphQLAPI as GraphQL API
    participant CreateCreateProcessor as Create::CreateProcessor
    participant CreateDevfileProcessor as Create::DevfileProcessor
    User ->> GitLabUI: Create Workspace
    GitLabUI ->> GraphQLAPI: Create Workspace
    GraphQLAPI ->> CreateCreateProcessor: Process creation
    CreateCreateProcessor->> CreateDevfileProcessor: Process Devfile
    CreateDevfileProcessor ->> CreateDevfileProcessor: Generate cloner with credentials copy command
sequenceDiagram
    participant PersonalAccessTokensCreateService as PersonalAccessTokens::CreateService
    participant ReconcileDesiredConfigGenerator as Reconcile::DesiredConfigGenerator
    participant ReconcileReconcileProcessor as Reconcile::ReconcileProcessor
    participant WorkspacesReconcileService as Workspaces::ReconcileService
    participant GitLabAgent as GitLab Agent
    participant K8sAPI as Kubernetes API
    participant Kubelet as Kubelet
    participant ContainerRuntime as Container Runtime
    participant Cloner Init Container as Cloner Init Container
    participant Workspace Volume as Workspace Volume
    GitLabAgent ->> WorkspacesReconcileService: Reconcile
    WorkspacesReconcileService ->> ReconcileReconcileProcessor: Process
    ReconcileReconcileProcessor ->> ReconcileDesiredConfigGenerator: Generate desired configuration
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Check if PAT is already present for workspace using convention <workspace-name>-PAT
    ReconcileDesiredConfigGenerator ->> PersonalAccessTokensCreateService: Generate PAT
    PersonalAccessTokensCreateService -->> ReconcileDesiredConfigGenerator: New PAT for workspace
    ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Inject Kubernetes secret into generated workspace k8s resources
    ReconcileDesiredConfigGenerator -->> ReconcileReconcileProcessor: Generated resources
    ReconcileReconcileProcessor -->> WorkspacesReconcileService: Generated resources
    WorkspacesReconcileService -->> GitLabAgent: Generated resources
    GitLabAgent ->> K8sAPI: Apply resources (including secret)
    Kubelet ->> K8sAPI: Watch for resources - Start Pod
    Kubelet ->> ContainerRuntime: Start Cloner Init Container
    ContainerRuntime ->> Cloner Init Container: Start with script
    Cloner Init Container ->> Workspace Volume: Copy credentials to custom file used by `GIT_ASKPASS` in volume
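
The create-once behavior of Option 1 can be sketched as below. `create_pat` is a hypothetical callable standing in for PersonalAccessTokens::CreateService, and the Secret name is hypothetical. Because the Secret is only returned on the first reconciliation, it cannot be tracked in the inventory ConfigMap alongside the resources that are returned every time.

```python
from typing import Callable, List, Set


def pat_name(workspace_name: str) -> str:
    # Naming convention from the diagram above: <workspace-name>-PAT
    return f"{workspace_name}-PAT"


def desired_resources(workspace_name: str, is_new_workspace: bool,
                      existing_pat_names: Set[str],
                      create_pat: Callable[[str], str],
                      base_resources: List[dict]) -> List[dict]:
    """Add a credentials Secret only for a new workspace that has no PAT yet."""
    resources = list(base_resources)
    name = pat_name(workspace_name)
    if is_new_workspace and name not in existing_pat_names:
        token = create_pat(name)
        resources.append({
            "apiVersion": "v1",
            "kind": "Secret",
            "metadata": {"name": f"{workspace_name}-git-credentials"},  # hypothetical name
            "stringData": {"GITLAB_PAT": token},  # plain strings; Kubernetes encodes them
        })
    return resources
```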

For SSH

  • In the future, when we want to introduce authentication using SSH keys, we can use a similar method of mounting the required Kubernetes Secrets as files/environment variables.

Option 2 - Use the OAuth2.0 token for HTTP, for SSH inject a new private/public key pair into the repository


In this option, we have two routes, depending on whether the user connects via HTTP/S or SSH.

For HTTP

The GitLab Workspaces Proxy already gets an OAuth2 token when the user is authenticated. Currently, the token is only used to verify that the user has access to the workspace by calling the workspaces GraphQL API; we don't send this token to the workspace. In this design, every workspace will run a sidecar proxy. The proxy will read the OAuth access token injected into the HTTP request and use it to clone the git repository. When the token expires, the workspaces proxy asks the user to re-authenticate and passes a new token in the request to the Workspace Auth Proxy.

sequenceDiagram
    actor User
    User ->> Kubernetes Ingress: Access Workspace
    Kubernetes Ingress ->> GitLab Workspaces Proxy: Access Workspace
    GitLab Workspaces Proxy ->> GitLab: OAuth2.0 Flow (not detailed)
    GitLab -->> GitLab Workspaces Proxy: Access Token
    GitLab Workspaces Proxy ->> GitLab: Authorize via GraphQL API
    GitLab Workspaces Proxy ->> Workspace HTTP Git Cloner: Send Access Token in proxy header
    Workspace HTTP Git Cloner ->> Workspace HTTP Git Cloner: Update Git Credentials file
    Workspace HTTP Git Cloner ->> Workspace HTTP Git Cloner: Clone repository
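
The "Update Git Credentials file" step can be sketched as follows, assuming the cloner uses git's `store` credential helper, whose file holds one `https://user:password@host` URL per line. GitLab accepts an OAuth2 access token over HTTPS with the username `oauth2`. How the token is read from the proxy header is not specified here, so the sketch starts from the token itself; `update_git_credentials` is a hypothetical helper.

```python
import os
from urllib.parse import quote


def update_git_credentials(path: str, host: str, access_token: str) -> None:
    """Write (or replace) the git-credential-store entry for `host`."""
    entry = f"https://oauth2:{quote(access_token, safe='')}@{host}"
    lines = []
    if os.path.exists(path):
        with open(path) as f:
            # Drop any previous entry for this host (for example, an expired token).
            lines = [line.rstrip("\n") for line in f
                     if not line.rstrip("\n").endswith(f"@{host}")]
    lines.append(entry)
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.chmod(path, 0o600)  # the file holds a credential
```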

For SSH

For SSH, we will need to generate a new SSH key pair for the user when the workspace is created. We will then inject the SSH key pair into the proxy using a mechanism similar to Option 1 above. The public key will be registered as a Key object in GitLab.

Option 3 - Use GitLab as secrets store and sync secrets to Kubernetes based on Kubernetes Secrets Store CSI Driver

  • Allow users to define secrets at the project, user, and workspace level.
    • When defined at a project level, any user who can create a workspace from the project, will have those secrets injected into the workspace.
    • When defined at a user level, all workspaces created by a user would have those secrets injected into the workspace.
    • When defined at a workspace level, only the said workspace would have those secrets injected into the workspace.
    • If there is a name clash between project level secrets and user level secrets, user level secrets will take precedence.
    • If there is a name clash between user level secrets and workspace level secrets, workspace level secrets will take precedence.
    • Defining workspace level secrets allows us to enforce certain secrets that we'd like to inject into a workspace.
  • Define a new module in GA4K which syncs secrets from GitLab into Kubernetes as Secrets.
  • As part of Remote Development, when creating a workspace, create a SecretProviderClass Kubernetes resource (or the required ConfigMap/Secret resource) along with the Deployment and Service Kubernetes resources.
  • This would also allow automatic secret rotation in GitLab, with the changes synced to the Kubernetes Secrets and to the Kubernetes Pods mounting them.
  • The secrets will be stored encrypted in the database, as CI/CD secrets are today.
  • The user PAT would be created as a workspace level secret.
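
The precedence rules above amount to an ordered merge in which later levels override earlier ones on a name clash. A minimal sketch, with a hypothetical function name:

```python
from typing import Dict


def effective_workspace_secrets(project: Dict[str, str], user: Dict[str, str],
                                workspace: Dict[str, str]) -> Dict[str, str]:
    """Merge secret levels; on a name clash, later levels win: project < user < workspace."""
    return {**project, **user, **workspace}
```

Under this rule, a user PAT created as a workspace level secret cannot be shadowed by a project or user level secret with the same name.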

Next Steps

Edited by Vishal Tak