Workspaces: Design for creating workspaces from private repositories
Background
GitLab Workspaces is our remote development product that allows developers to spin up ephemeral workspaces based on a devfile. Workspaces run in Kubernetes as pods running a version of our WebIDE. Workspaces are currently offered as a BYOK (bring your own Kubernetes) offering; the plan is to support GitLab.com-managed workspaces sometime in the future.
Workspaces can be spun up from the GitLab UI once the GitLab agent for Kubernetes is installed in a Kubernetes cluster and registered. Ingress to the workspace is controlled by the GitLab Workspaces Proxy, which is responsible for authentication and authorization. Currently the workspaces proxy only supports connectivity to the IDE via HTTP; in the future we also want to support SSH.
Problem Definition
In the Beta release of GitLab Workspaces, we only allowed users to use public repositories and did not inject any user credentials into a running workspace. The repository is cloned by a cloner container that runs as an init container and performs a plain git clone. To create a workspace from a private repository, we need to inject credentials that can access that repository.
Note - Letting the user perform git operations transparently from within the workspace is out of scope for this issue and will be taken up as a follow-up. The scope of this issue is only to be able to create a workspace from a private repository. Refer to this comment.
Solution Proposed
Database Changes
- Alter the `workspaces` table
  - Add `personal_access_token_id` column - To store the foreign key reference to the user PAT generated for each workspace so that it can be revoked on workspace termination.
  - Add `encrypted_secret_key` and `encrypted_secret_key_iv` columns - To store a unique `secret_key` which will be used to encrypt all the variables associated with the workspace in the `workspace_variables` table. The `secret_key` is stored encrypted in the database using [attr_encrypted](https://github.com/attr-encrypted/attr_encrypted) (algorithm `aes-256-gcm`). The plain text `secret_key` is passed to the agent so that it can decrypt the encrypted workspace variables.
- Create `workspace_variables` table to store the variables (secret or otherwise) to be injected into the workspace.
  - Add `key` column - To store the key of the variable (secret or otherwise) in plain text.
  - Add `encrypted_value` and `encrypted_value_iv` columns - To store the `value` of the variable (secret or otherwise) which is encrypted with the associated workspace's `secret_key`.
  - Add `workspace_id` column - Foreign key reference to the `workspaces` table.
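The table changes above can be sketched as a small in-memory schema. This is only an illustration: the real tables would live in the GitLab PostgreSQL database, and the column types, the `name` column, and the `open_db` helper are assumptions that do not come from this proposal.

```python
import sqlite3

# Hypothetical SQLite rendering of the two tables. Column types are assumed;
# only the column names come from the proposal.
SCHEMA = """
CREATE TABLE workspaces (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,                    -- assumed column, for illustration only
    personal_access_token_id INTEGER,      -- reference to the per-workspace user PAT
    encrypted_secret_key BLOB,             -- the workspace secret_key, encrypted at rest
    encrypted_secret_key_iv BLOB
);
CREATE TABLE workspace_variables (
    id INTEGER PRIMARY KEY,
    workspace_id INTEGER NOT NULL REFERENCES workspaces(id),
    "key" TEXT NOT NULL,                   -- variable name, stored in plain text
    encrypted_value BLOB NOT NULL,         -- encrypted with the workspace's secret_key
    encrypted_value_iv BLOB NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database containing both tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this layout, every variable row points at exactly one workspace, so deleting or revoking per-workspace data stays a single-key lookup.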
Rails Creation Logic
sequenceDiagram
actor User
participant GitLabUI as GitLab UI
participant GraphQLAPI as GraphQL API
participant CreateCreateProcessor as Create::CreateProcessor
participant CreateDevfileProcessor as Create::DevfileProcessor
participant CreateVariablesProcessor as Create::VariablesProcessor
participant PersonalAccessTokensCreateService as PersonalAccessTokens::CreateService
User ->> GitLabUI: Create Workspace
GitLabUI ->> GraphQLAPI: Create Workspace
GraphQLAPI ->> CreateCreateProcessor: Process creation
CreateCreateProcessor ->> CreateDevfileProcessor: Process Devfile
CreateDevfileProcessor -->> CreateCreateProcessor: Processed Devfile
CreateCreateProcessor ->> CreateVariablesProcessor: Add variables to be injected into the workspace to the DB
CreateVariablesProcessor ->> CreateVariablesProcessor: Generate `secret_key` <br> for the workspace and <br> add it to the DB. <br> Secret key would be <br> encrypted using `Gitlab::Application.secrets.db_key_base`
CreateVariablesProcessor ->> PersonalAccessTokensCreateService: Generate user PAT
PersonalAccessTokensCreateService -->> CreateVariablesProcessor: User PAT
CreateVariablesProcessor -->> CreateVariablesProcessor: Store `personal_access_token_id` <br> in `workspaces` table as foreign key
CreateVariablesProcessor->> CreateVariablesProcessor: Encrypt user PAT using workspace's `secret_key` <br> and store in DB with `key` as GITLAB_PAT
CreateVariablesProcessor->> CreateVariablesProcessor: Encrypt git configuration variables like <br> GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, <br> GIT_ASKPASS, GITLAB_PAT_FILE_PATH, <br> and GIT_ASKPASS_SCRIPT using workspace's `secret_key` <br> and store in DB
CreateVariablesProcessor -->> CreateCreateProcessor:
CreateCreateProcessor -->> GraphQLAPI:
GraphQLAPI -->> GitLabUI:
GitLabUI -->> User:
Rails Termination Logic
sequenceDiagram
actor User
participant GitLabUI as GitLab UI
participant GraphQLAPI as GraphQL API
participant UpdateUpdateProcessor as Update::UpdateProcessor
participant PersonalAccessTokensRevokeService as PersonalAccessTokens::RevokeService
User ->> GitLabUI: Terminate Workspace
GitLabUI ->> GraphQLAPI: Update Workspace
GraphQLAPI ->> UpdateUpdateProcessor: Process update
UpdateUpdateProcessor ->> PersonalAccessTokensRevokeService: Revoke user PAT
PersonalAccessTokensRevokeService -->> UpdateUpdateProcessor:
UpdateUpdateProcessor ->> UpdateUpdateProcessor: Update workspace desired state to Terminated
UpdateUpdateProcessor -->> GraphQLAPI:
GraphQLAPI -->> GitLabUI:
GitLabUI -->> User:
Agent <-> Rails Logic
sequenceDiagram
participant GitLabAgent as GitLab Agent
participant WorkspacesReconcileService as Workspaces::ReconcileService
participant ReconcileReconcileProcessor as Reconcile::ReconcileProcessor
participant ReconcileDesiredConfigGenerator as Reconcile::DesiredConfigGenerator
participant ReconcileDevfileParser as Reconcile::DevfileParser
GitLabAgent ->> WorkspacesReconcileService: Reconcile
WorkspacesReconcileService ->> ReconcileReconcileProcessor: Process
ReconcileReconcileProcessor ->> ReconcileDesiredConfigGenerator: Generate desired configuration
ReconcileDesiredConfigGenerator ->> ReconcileDevfileParser: Generate Kubernetes resources <br> from processed devfile
ReconcileDevfileParser -->> ReconcileDesiredConfigGenerator: Kubernetes resources <br> (Deployment and Service)
ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the workspace's `secret_key`
ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the encrypted variables to be mounted as environment variables
ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Generate Kubernetes Secret for <br> the encrypted variables to be mounted as files
ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Inject Kubernetes Secret into <br> generated Kubernetes Deployment
ReconcileDesiredConfigGenerator -->> ReconcileReconcileProcessor: Generated resources
ReconcileReconcileProcessor -->> WorkspacesReconcileService: Generated resources
WorkspacesReconcileService -->> GitLabAgent: Generated resources
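The "inject Kubernetes Secret into generated Kubernetes Deployment" step can be sketched as a transformation on the Deployment manifest. This is a rough illustration in Python rather than the Rails code: the function name, the Secret names, and the volume name `workspace-files` are hypothetical, while `envFrom`, `secretRef`, `volumes`, and `volumeMounts` are standard Kubernetes pod spec fields.

```python
import copy


def inject_secrets(deployment: dict, env_secret: str, file_secret: str,
                   mount_path: str = "/.workspace-git-config") -> dict:
    """Return a copy of the Deployment whose init containers consume both Secrets."""
    result = copy.deepcopy(deployment)
    pod_spec = result["spec"]["template"]["spec"]
    # Expose the file-style Secret as a pod-level volume.
    pod_spec.setdefault("volumes", []).append(
        {"name": "workspace-files", "secret": {"secretName": file_secret}}
    )
    # Only init containers are touched here, matching the cloner-only scope
    # of the proposed solution.
    for container in pod_spec.get("initContainers", []):
        container.setdefault("envFrom", []).append({"secretRef": {"name": env_secret}})
        container.setdefault("volumeMounts", []).append(
            {"name": "workspace-files", "mountPath": mount_path, "readOnly": True}
        )
    return result
```

Working on a deep copy keeps the Deployment derived from the processed devfile untouched, so the same input can be reused across reconciliations.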
Agent Logic
sequenceDiagram
participant GitLabAgent as GitLab Agent
participant K8sAPI as Kubernetes API
GitLabAgent ->> K8sAPI: Create Kubernetes Namespace for the workspace
GitLabAgent ->> K8sAPI: Create Kubernetes Secret for the <br> workspace's `secret_key` in the workspace's namespace
GitLabAgent ->> K8sAPI: For Kubernetes Secret to be mounted as environment variables, <br> decrypt value each using workspace's `secret_key` and <br> create a Kubernetes Secret for them in the workspace's namespace
GitLabAgent ->> K8sAPI: For Kubernetes Secret to be mounted as files, <br> decrypt value each using workspace's `secret_key` and <br> create a Kubernetes Secret for them in the workspace's namespace
GitLabAgent ->> K8sAPI: Create other workspace resources <br> in the workspace namespace
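The decrypt-then-create step in the diagram above could look roughly like the following. It is sketched in Python for readability, although the agent itself is not written in Python. The `build_secret` name is hypothetical, and decryption is passed in as a callable because this proposal does not fix the cipher used for the variable values.

```python
import base64
from typing import Callable


def build_secret(name: str, namespace: str, encrypted: dict[str, bytes],
                 decrypt: Callable[[bytes], bytes]) -> dict:
    """Assemble a Kubernetes Secret manifest from encrypted workspace variables."""
    # Kubernetes expects every entry under `data` to be base64-encoded.
    data = {
        key: base64.b64encode(decrypt(value)).decode("ascii")
        for key, value in encrypted.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }
```

The same function would serve both the environment-variable Secret and the file Secret, since the two differ only in how the Deployment mounts them.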
Git configuration to be injected into the workspace
- The git configuration items that need to be injected into the workspace are

      GIT_AUTHOR_NAME: <USER_NAME>
      GIT_AUTHOR_EMAIL: <USER_EMAIL>
      GIT_ASKPASS: /.workspace-git-config/git_askpass.sh
      GIT_ASKPASS_SCRIPT: |
        cat $GITLAB_PAT_FILE_PATH
      GITLAB_PAT_FILE_PATH: /.workspace-git-config/gitlab_pat
      GITLAB_PAT: <PAT_GENERATED>

- `GIT_AUTHOR_NAME`, `GIT_AUTHOR_EMAIL`, `GIT_ASKPASS` and `GITLAB_PAT_FILE_PATH` will be mounted to the Deployment as environment variables, while `GIT_ASKPASS_SCRIPT` and `GITLAB_PAT` will be mounted as files at `/.workspace-git-config/git_askpass.sh` and `/.workspace-git-config/gitlab_pat` respectively.
- The reason for mounting `GIT_ASKPASS_SCRIPT` as a file is that `GIT_ASKPASS` points to a script which will provide the credentials.
- The reason for mounting `GITLAB_PAT` as a file is that it allows us to update the `GITLAB_PAT` in the Kubernetes Secret in the future (follow-up item when the need arises) and rely on Kubernetes to update the file inside the workspace container with eventual consistency, without restarting the workspace.
- These environment variables and file mounts will be set up for the `project-cloner` init container of the workspace.
- In future, when we want to introduce authentication using SSH keys, we can use a similar method of mounting required Kubernetes Secrets as files/environment variables.
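To illustrate how the `GIT_ASKPASS` pieces above fit together, the sketch below recreates the two mounted files in a temporary directory and runs the script the way a credential prompt would. Git runs the program named by `GIT_ASKPASS` with the prompt as its argument and reads the answer from standard output. The `run_askpass` helper, the example prompt, and the temporary paths are assumptions for the demo, and a POSIX `sh` with `cat` is assumed to be available.

```python
import os
import subprocess
import tempfile


def run_askpass(pat: str, prompt: str = "Password for 'https://gitlab.example.com': ") -> str:
    """Simulate a credential prompt being answered by the askpass script."""
    with tempfile.TemporaryDirectory() as config_dir:
        pat_path = os.path.join(config_dir, "gitlab_pat")
        script_path = os.path.join(config_dir, "git_askpass.sh")
        with open(pat_path, "w") as pat_file:
            pat_file.write(pat)  # stands in for the mounted GITLAB_PAT file
        with open(script_path, "w") as script_file:
            script_file.write("cat $GITLAB_PAT_FILE_PATH\n")  # GIT_ASKPASS_SCRIPT body
        env = dict(os.environ, GITLAB_PAT_FILE_PATH=pat_path)
        # `sh` is invoked explicitly so the demo does not depend on the
        # executable bit; the prompt argument is ignored by the script.
        completed = subprocess.run(
            ["sh", script_path, prompt],
            env=env, capture_output=True, text=True, check=True,
        )
        return completed.stdout
```

Because the script only reads the file at `GITLAB_PAT_FILE_PATH`, replacing that file's contents changes the credential on the next prompt without touching the script or the environment.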
Notes and concerns
- For future iterations, we will have to work with the Auth team to make all user PATs generated by the workspace services non-editable by the end user. Within the scope of this issue, a user PAT is created for each workspace and the user can view/revoke it in their profile.
- This solution takes a long-term view: it aims to be a generic solution that can be extended to other cases, rather than something very specific that we would have to discard later in order to build a generic solution for our future needs. Thus, it might have a relatively larger scope. However, based on our understanding, the scope is still manageable.
- The `workspace_variables` table is inspired by tables like `ci_job_variables`, `ci_variables`, etc. while `workspace_variables_encryption_key` is inspired by the `pages_domain` table. The database table design sets the foundation for our future use-cases of injecting variables from group/subgroups/project/user level variables/secrets.
- The data in `workspace_variables` would keep on increasing as it does for CI tables. This might become a problem in the future.
  - One mitigation strategy would be to drop records from this table when a workspace has been successfully terminated. However, this is out of scope for this issue.
- Having a separate public/private key for each workspace limits the damage that can be done if there is a MITM attack between KAS and the agent and the private key of the workspace used for encryption is somehow leaked (since the private key would be sent in plain text format). The private key is only sent to the agent during workspace creation and during full sync, to reduce the number of times we send this information.
- One concern raised is that decrypting the various values at the agent will be CPU-intensive. However, it is better to have this decryption at the agent rather than at Rails. Because of the way the agent communicates with Rails, Rails has to generate all Kubernetes resources every time it needs to send an update to the agent or acknowledge an update that the agent has sent regarding the workspace. If decryption happened in Rails, it would have to decrypt the variables to be injected into the workspace each time it generated the Kubernetes resources. This problem would be further exacerbated when we extend this feature to be generic enough to inject variables from group/subgroups/project/user level variables/secrets.
  - This can be mitigated by checking the SHAs of the encrypted values against the secrets already generated (which would carry this SHA from when they were created).
  - Another approach would be to move the decryption inside the workspace, which would then decrypt these environment variables itself. However, this would not be as straightforward.
  - This would be considered premature optimization and is thus out of scope for this issue. It can be tackled in the future if the need arises.
- We are assuming the user PAT will be valid for longer than the lifetime of the workspace. Right now, workspaces are auto-terminated after 120 hours.
Maybe use an ECDSA public/private key pair with the P-256 curve for better performance, as per the analysis in https://gitlab.com/gitlab-org/gitlab/-/issues/361168#note_933171008?
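The SHA-comparison mitigation mentioned in the notes above could be sketched as follows: the agent records a checksum of the encrypted values on the Secret it creates, and skips decryption when the checksum of the incoming values is unchanged. The annotation key and both function names are hypothetical, and SHA-256 is an assumption since the notes only say "SHAs".

```python
import hashlib
from typing import Optional

# Hypothetical annotation key under which the checksum would be recorded.
CHECKSUM_ANNOTATION = "workspaces.example/encrypted-sha256"


def encrypted_checksum(encrypted: dict[str, bytes]) -> str:
    """Checksum over the encrypted variables, independent of dict ordering."""
    digest = hashlib.sha256()
    for key in sorted(encrypted):
        # Length-prefix each field so that neighbouring fields cannot blur together.
        for part in (key.encode("utf-8"), encrypted[key]):
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()


def needs_decryption(existing_secret: Optional[dict], encrypted: dict[str, bytes]) -> bool:
    """True when no Secret exists yet or its recorded checksum is stale."""
    if existing_secret is None:
        return True
    annotations = existing_secret.get("metadata", {}).get("annotations", {})
    return annotations.get(CHECKSUM_ANNOTATION) != encrypted_checksum(encrypted)
```

This only helps if the stored ciphertext is stable between reconciliations, which holds here because the encrypted values are read from the database rather than re-encrypted on every request.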
Other Solutions Considered
Option 1 - Generate a Personal Access Token scoped to a user for each workspace and send to agent without storing in database
For HTTP
- Generate a new Personal Access Token for every workspace that is created. The PAT will be user-scoped instead of project-scoped, as the user may want to work with more than one project in the workspace in the future, or do other things which are scoped to the user but not the project (e.g. pulling a container image).
- The PAT will be created using the `PersonalAccessTokens::CreateService` in `RemoteDevelopment::Workspaces::Reconcile::DesiredConfigGenerator.generate_desired_config`.
- This PAT would be used in a Kubernetes Secret which is passed to the agent and is mounted on the Kubernetes Deployment representing the workspace. To mount the Secret on the Deployment, we will have to make changes to the devfile-gem.
- We do not want to store the secret in the database and therefore the secret should not be stored in the `processed_devfile` field on the `workspace` object.
- The Kubernetes Secret, which is encrypted at rest in `etcd`, would contain the following fields

      GIT_AUTHOR_NAME: <USER_NAME>
      GIT_AUTHOR_EMAIL: <USER_EMAIL>
      GIT_ASKPASS: /.workspace-git-config/git_askpass.sh
      GIT_ASKPASS_SCRIPT: |
        cat $GITLAB_PAT_FILE_PATH
      GITLAB_PAT_FILE_PATH: /.workspace-git-config/gitlab_pat
      GITLAB_PAT: <PAT_GENERATED>

- `GIT_AUTHOR_NAME`, `GIT_AUTHOR_EMAIL`, `GIT_ASKPASS` and `GITLAB_PAT_FILE_PATH` will be mounted to the Deployment as environment variables, while `GIT_ASKPASS_SCRIPT` and `GITLAB_PAT` will be mounted as files at `/.workspace-git-config/git_askpass.sh` and `/.workspace-git-config/gitlab_pat` respectively.
- The reason for mounting `GIT_ASKPASS_SCRIPT` as a file is that `GIT_ASKPASS` points to a script which will provide the credentials.
- The reason for mounting `GITLAB_PAT` as a file is that it allows us to update the `GITLAB_PAT` in the Kubernetes Secret and rely on Kubernetes to update the file inside the workspace container with eventual consistency.
- These environment variables and file mounts will be set up for the project-cloner container and the main container of the workspace.
- When the user performs any git actions, the git CLI will use the credentials provided by the file pointed to by `GIT_ASKPASS` (reference)
- Additionally, the PAT will be revoked when the workspace is terminated using the `PersonalAccessTokens::RevokeService`. The workspace will have to hold a FK to the generated PAT.
- The Kubernetes Secret will only be created for a new workspace. For an existing workspace, we will assume that the Kubernetes Secret exists in the Kubernetes cluster. It is not possible for us to regenerate a PAT during every partial/full reconciliation. Rotating the PAT while the workspace has not been terminated would be tackled in a follow-up issue if needed. Thus the secret will not be part of the `inventory configmap` that we use to track resources related to the workspace in Kubernetes.
sequenceDiagram
actor User
participant GitLabUI as GitLab UI
participant GraphQLAPI as GraphQL API
participant CreateCreateProcessor as Create::CreateProcessor
participant CreateDevfileProcessor as Create::DevfileProcessor
User ->> GitLabUI: Create Workspace
GitLabUI ->> GraphQLAPI: Create Workspace
GraphQLAPI ->> CreateCreateProcessor: Process creation
CreateCreateProcessor->> CreateDevfileProcessor: Process Devfile
CreateDevfileProcessor ->> CreateDevfileProcessor: Generate cloner with credentials copy command
sequenceDiagram
participant PersonalAccessTokensCreateService as PersonalAccessTokens::CreateService
participant ReconcileDesiredConfigGenerator as Reconcile::DesiredConfigGenerator
participant ReconcileReconcileProcessor as Reconcile::ReconcileProcessor
participant WorkspacesReconcileService as Workspaces::ReconcileService
participant GitLabAgent as GitLab Agent
participant K8sAPI as Kubernetes API
participant Kublet as Kubelet
participant ContainerRuntime as Container Runtime
participant Cloner Init Container as Cloner Init Container
participant Workspace Volume as Workspace Volume
GitLabAgent ->> WorkspacesReconcileService: Reconcile
WorkspacesReconcileService ->> ReconcileReconcileProcessor: Process
ReconcileReconcileProcessor ->> ReconcileDesiredConfigGenerator: Generate desired configuration
ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Check if PAT is already present for workspace using convention <workspace-name>-PAT
ReconcileDesiredConfigGenerator ->> PersonalAccessTokensCreateService: Generate PAT
PersonalAccessTokensCreateService -->> ReconcileDesiredConfigGenerator: New PAT for workspace
ReconcileDesiredConfigGenerator ->> ReconcileDesiredConfigGenerator: Inject Kubernetes secret into generated workspace k8s resources
ReconcileDesiredConfigGenerator -->> ReconcileReconcileProcessor: Generated resources
ReconcileReconcileProcessor -->> WorkspacesReconcileService: Generated resources
WorkspacesReconcileService -->> GitLabAgent: Generated resources
GitLabAgent ->> K8sAPI: Apply resources (including secret)
Kublet ->> K8sAPI: Watch for resources - Start Pod
Kublet ->> ContainerRuntime: Start Cloner Init Container
ContainerRuntime ->> Cloner Init Container: Start with script
Cloner Init Container ->> Workspace Volume: Copy credentials to custom file used by `GIT_ASKPASS` in volume
For SSH
- In future, when we want to introduce authentication using SSH keys, we can use a similar method of mounting required Kubernetes Secrets as files/environment variables.
Option 2 - Use the OAuth2.0 token for HTTP, for SSH inject a new private/public key pair into the repository
In this option we have two routes depending on whether the user connects via HTTP/S or SSH.
For HTTP
The GitLab Workspace Proxy already gets an OAuth2 token when the user is authenticated. Currently the token is only used to verify that the user has access to the workspace by calling the workspaces GraphQL API. We currently don't send this token to the workspace. In this design every workspace will run a sidecar proxy. The proxy will read the OAuth access token that is injected into the HTTP request and will then use the injected token to clone the git repository. When the token expires, the workspaces proxy requests the user to re-authenticate and passes a new token in the request to the Workspace Auth Proxy.
sequenceDiagram
actor User
User ->> Kubernetes Ingress: Access Workspace
Kubernetes Ingress ->> GitLab Workspaces Proxy: Access Workspace
GitLab Workspaces Proxy ->> GitLab: OAuth2.0 Flow (not detailed)
GitLab -->> GitLab Workspaces Proxy: Access Token
GitLab Workspaces Proxy ->> GitLab: Authorize via GraphQL API
GitLab Workspaces Proxy ->> Workspace HTTP Git Cloner: Send Access Token in proxy header
Workspace HTTP Git Cloner ->> Workspace HTTP Git Cloner: Update Git Credentials file
Workspace HTTP Git Cloner ->> Workspace HTTP Git Cloner: Clone repository
For SSH
For SSH we will need to generate a new SSH key pair for the user when the workspace is created. We then will inject the SSH Key pair into the proxy using a similar mechanism to Option 1 above. The public key will be registered as a Key object in GitLab.
Option 3 - Use GitLab as secrets store and sync secrets to Kubernetes based on Kubernetes Secrets Store CSI Driver
- Allow users to define secrets at a project, user and workspace level.
- When defined at a project level, any user who can create a workspace from the project, will have those secrets injected into the workspace.
- When defined at a user level, all workspaces created by a user would have those secrets injected into the workspace.
- When defined at a workspace level, only the said workspace would have those secrets injected into the workspace.
- If there is a namespace clash between project level secrets and user level secrets, user level secrets will take precedence.
- If there is a namespace clash between user level secrets and workspace level secrets, workspace level secrets will take precedence.
- Defining workspace level secrets allows us to enforce certain secrets that we'd like to inject into a workspace.
- Define a new module in GA4K which syncs secrets from GitLab into Kubernetes as Secrets.
- Reference - https://secrets-store-csi-driver.sigs.k8s.io/concepts.html and https://github.com/aws/secrets-store-csi-driver-provider-aws
- This module will look for `secretproviderclasses.secrets-store.csi.x-k8s.io` resources and, based on their definition, sync the data from GitLab as Kubernetes Secrets. If we want to refrain from using a CRD, we can watch a ConfigMap/Secret with a particular label which defines what secrets from GitLab are needed for a given workspace.
- As part of Remote Development, when creating a workspace, along with the Deployment and Service Kubernetes resources, create a `SecretProviderClass` Kubernetes resource (or the required ConfigMap/Secret resource).
- This would also allow for secret auto-rotation in GitLab, with the change synced to the Kubernetes Secrets and to the Kubernetes Pods mounting the said Secret.
- The secrets will be stored in the database encrypted, as they are right now for CI/CD secrets.
- The user PAT would be created as a workspace level secret.
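The precedence rules for this option (project level, then user level, then workspace level, with later levels winning on a name clash) amount to a layered merge. A minimal sketch, with a hypothetical `resolve_secrets` function and plain dictionaries standing in for the stored secrets:

```python
def resolve_secrets(project: dict[str, str], user: dict[str, str],
                    workspace: dict[str, str]) -> dict[str, str]:
    """Merge secrets so that workspace overrides user, and user overrides project."""
    resolved = dict(project)      # lowest precedence
    resolved.update(user)         # user level wins over project level
    resolved.update(workspace)    # workspace level wins over both
    return resolved
```

For example, a `GITLAB_PAT` defined at the workspace level would replace any secret of the same name defined by the user or the project, which is what lets the workspace level enforce specific secrets.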
Next Steps
- Review solution by AppSec
- Spike out the proposed solution in Spike: GA4K: Decrypt data sent by rails to agen... (#414420 - closed). We already have one spike done for figuring out how/what git configuration items to inject (Spike: GA4K: Decrypt data sent by rails to agen... (#414420 - closed)).
- Update Architecture Blueprint based on the results of the spike.
- Implement solution based on all feedback received in a separate issue.