Alexandr Evstigneev and Nikolay Kuznetsov from JetBrains were helping us figure out why JetBrains Gateway was not working correctly when we tried to use it with a workspace. In that process, we noticed that something is off about how gitlab-workspaces-proxy handles requests, stdout, stderr, etc.
Could you please try running these commands on your local machine:
ssh <user>@<host> uname -sm
ssh <user>@<host> uname -sm 2> /dev/null
and check whether the output comes on stdout or stderr, where user and host are those of the remote machine you are configuring.
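To make the stdout-versus-stderr check above repeatable (and see the exit code at the same time), here is a minimal Go sketch that runs a command with both streams captured separately. runSplit is a hypothetical helper; the example uses a local sh command as a stand-in, and you would pass "ssh", "<user>@<host>", "uname", "-sm" to check a real target.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// runSplit runs a command with stdout and stderr captured into separate
// buffers and returns the exit code. A non-zero exit is reported as a code,
// not as an error; err is only set when the command could not run at all.
func runSplit(name string, args ...string) (stdout, stderr string, code int, err error) {
	var outBuf, errBuf bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &outBuf
	cmd.Stderr = &errBuf
	runErr := cmd.Run()

	var exitErr *exec.ExitError
	if errors.As(runErr, &exitErr) {
		return outBuf.String(), errBuf.String(), exitErr.ExitCode(), nil
	}
	if runErr != nil {
		return "", "", -1, runErr
	}
	return outBuf.String(), errBuf.String(), 0, nil
}

func main() {
	// Stand-in command: writes one line to each stream and exits with 3.
	out, errOut, code, err := runSplit("sh", "-c", "echo out; echo err 1>&2; exit 3")
	if err != nil {
		fmt.Println("could not run command:", err)
		return
	}
	fmt.Printf("stdout=%q stderr=%q exit=%d\n", out, errOut, code)
}
```

Since uname writes to stdout, its output should still appear when stderr is discarded; if it only shows up in the stderr buffer, something in between is mixing the streams.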
Gateway makes use of exec and signal requests. Make sure your proxy supports both, honors and proxies the PTY request flag, properly forwards both the stdout and stderr streams without truncating them, and provides the process exit code.
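For reference, exec, pty-req, signal and exit-status are all SSH channel requests defined in RFC 4254, so the proxy has to relay each of them between the client channel and the backend channel. In golang.org/x/crypto/ssh they arrive as *ssh.Request values (Type, WantReply, Payload) and can be re-sent with Channel.SendRequest. The stdlib-only sketch below shows the wire payloads for signal and exit-status; the helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// exitStatusPayload encodes an "exit-status" request payload
// (RFC 4254, section 6.10): a single big-endian uint32.
func exitStatusPayload(code uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, code)
	return b
}

// signalPayload encodes a "signal" request payload (RFC 4254, section 6.9):
// one SSH string (uint32 length, then bytes) holding the signal name without
// the "SIG" prefix, e.g. "INT".
func signalPayload(name string) []byte {
	b := make([]byte, 4+len(name))
	binary.BigEndian.PutUint32(b, uint32(len(name)))
	copy(b[4:], name)
	return b
}

// parseSignalPayload decodes a "signal" payload, e.g. so the proxy can log
// the signal before relaying the unchanged payload to the backend.
func parseSignalPayload(p []byte) (string, error) {
	if len(p) < 4 {
		return "", errors.New("signal payload too short")
	}
	n := binary.BigEndian.Uint32(p)
	if uint64(n) != uint64(len(p)-4) {
		return "", errors.New("signal payload length mismatch")
	}
	return string(p[4:]), nil
}

func main() {
	fmt.Printf("% x\n", exitStatusPayload(3)) // 00 00 00 03
	fmt.Printf("% x\n", signalPayload("INT")) // 00 00 00 03 49 4e 54
	name, err := parseSignalPayload(signalPayload("TERM"))
	fmt.Println(name, err)
}
```

A pure pass-through proxy does not need to decode these at all; decoding is only useful for logging which requests actually made it through.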
Quick inspection of the code doesn't show anything obviously wrong with the proxying bits, but I'm not familiar with these specific Go libraries and whether their cancellation logic on connection close guarantees delivery of all requests and stream data.
Given that some of the manual ssh command executions are missing replies, truncation of the streams might be what's going on here.
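If truncation is the cause, one ordering that avoids it is: drain stdout and stderr completely, then send exit-status, then close. Below is a minimal, hypothetical version of that ordering using only the standard library; it is not the proxy's actual code. In a golang.org/x/crypto/ssh based proxy, clientOut would be the incoming ssh.Channel, clientErr its Stderr(), cmdOut/cmdErr the backend channel and its Stderr(), and sendExit would forward the exit-status request.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// forwardThenExit copies the backend's stdout and stderr to the client
// concurrently, waits until BOTH copies have finished, and only then sends
// the exit status and closes the client channel. Closing first (for example
// from a defer or a cancellation goroutine) can drop data still in flight,
// which the client would see as truncated output.
func forwardThenExit(clientOut, clientErr io.Writer, cmdOut, cmdErr io.Reader,
	sendExit, closeChannel func() error) error {
	var wg sync.WaitGroup
	errs := make([]error, 2) // one slot per stream, so no mutex is needed
	copyStream := func(i int, dst io.Writer, src io.Reader) {
		defer wg.Done()
		_, errs[i] = io.Copy(dst, src)
	}
	wg.Add(2)
	go copyStream(0, clientOut, cmdOut)
	go copyStream(1, clientErr, cmdErr)
	wg.Wait() // stdout and stderr are fully drained past this point

	for _, err := range errs {
		if err != nil {
			_ = closeChannel()
			return err
		}
	}
	if err := sendExit(); err != nil {
		_ = closeChannel()
		return err
	}
	return closeChannel()
}

// exampleRun wires forwardThenExit to in-memory streams and records what the
// client had received at the moment the exit status was sent.
func exampleRun() (stdout, stderr string, events []string) {
	var out, errb bytes.Buffer
	err := forwardThenExit(&out, &errb,
		strings.NewReader("hello"), strings.NewReader("warn"),
		func() error {
			events = append(events, fmt.Sprintf("exit-status (stdout=%q, stderr=%q)", out.String(), errb.String()))
			return nil
		},
		func() error {
			events = append(events, "close")
			return nil
		})
	if err != nil {
		events = append(events, "error: "+err.Error())
	}
	return out.String(), errb.String(), events
}

func main() {
	stdout, stderr, events := exampleRun()
	fmt.Println(stdout, stderr, events)
}
```

Forwarding client stdin to the backend is a separate direction and is omitted here.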
Other weird things noticed
When running everything locally (Mac), things work fine.
When running the Kubernetes cluster on a Linux machine on a remote server, the issue starts popping up.
Maybe the added network latency triggers some race condition or concurrency problem?
@shekharpatnaik - Since you largely worked on adding the SSH support, I wanted to check whether you faced similar issues while developing it. Given the inconsistent output, I feel there might be some sort of race condition or concurrency problem somewhere that is mixing stdout and stderr or truncating the streams?
I never came across this issue. We have quite a few channels in the code, so it's possible that it's getting stuck in a race condition. Let me know if you want to pair on this.
This helped catch certain errors: the io.Copy between the source and target channels during proxying returns io.EOF errors. This occurs when the copying did not complete successfully. I haven't been able to figure out yet why that is the case. Even in cases where the connection has been established (and there are seemingly no visible errors), there are still io.EOF errors. Closing the channels also fails for a similar reason.
I tried different permutations and combinations with the existing code to find the race condition. One such attempt is Fix ssh issue (gitlab-org/remote-development/gitlab-workspaces-proxy!59 - closed). Essentially, I added wait groups to the goroutines and avoided deferring some of the connection-closing logic. But that did not solve the problem.
So I started writing an SSH proxy in Go from scratch.
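One detail that may help narrow this down: io.Copy reads its source until EOF and documents that a successful copy returns nil, not io.EOF. So an io.EOF coming back from io.Copy most likely originated while writing to the destination, i.e. the target channel was already closed (or had received EOF) while data was still being copied. A small Go sketch, with a hypothetical closedWriter standing in for such a channel:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// closedWriter stands in for a destination channel the peer has already
// closed: every Write fails with io.EOF. (Hypothetical type for illustration.)
type closedWriter struct{}

func (closedWriter) Write(p []byte) (int, error) { return 0, io.EOF }

// copyString copies s into dst with io.Copy and returns io.Copy's error.
func copyString(dst io.Writer, s string) error {
	_, err := io.Copy(dst, strings.NewReader(s))
	return err
}

// classifyCopyErr interprets an error from io.Copy. io.Copy treats EOF from
// the source as normal termination and returns nil, so by convention an
// io.EOF it does return was produced on the destination (write) side.
func classifyCopyErr(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, io.EOF):
		return "destination closed before copy finished"
	default:
		return "other: " + err.Error()
	}
}

func main() {
	fmt.Println(classifyCopyErr(copyString(io.Discard, "some output")))
	fmt.Println(classifyCopyErr(copyString(closedWriter{}, "some output")))
}
```

Logging which direction of the proxy hit this would tell us which side closed first.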
The above block of code is a fully functional SSH proxy that follows the standard SSH syntax. Port 60022 is the target of the SSH connection, and port 2209 is the proxy server. Assuming the target has a user gitlab-workspaces with an empty password, running the command ssh gitlab-workspaces@localhost -p 2209 establishes the connection with no errors. Other permutations, such as running a command directly instead of a shell or redirecting stdout/stderr, all work as well.
Integrating this into our gitlab-workspaces-proxy is a problem because we do not follow the standard SSH syntax: the incoming username in the SSH connection details is the workspace name, and the proxy tries to authenticate with the backend workspace using this username. However, this user does not exist in the workspace.
So with the "dynamic linux user" approach a dead end, I looked at what else could be done. One idea is to treat our gitlab-workspaces-proxy as a jump server (https://www.cyberciti.biz/faq/linux-unix-ssh-proxycommand-passing-through-one-host-gateway-server/). Something like ssh -J WORKSPACE_NAME@GITLAB_WORKSPACES_PROXY_SSH_IP_OR_DOMAIN:22 USERNAME_INSIDE_WORKSPACE@WORKSPACE_NAME.WORKSPACE_NAMESPACE -p 60022. The logic: GITLAB_WORKSPACES_PROXY_SSH_IP_OR_DOMAIN is resolvable and is used as a jump server to reach the (only internally accessible) Kubernetes service WORKSPACE_NAME.WORKSPACE_NAMESPACE on port 60022, which the user cannot resolve directly.
Our gitlab-workspaces-proxy doesn't seem to support this as it stands today. If we get this to work, we will follow the standard SSH syntax, which would be a big win, and we would also solve the truncation issue highlighted in this issue by using the code block above.
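Supporting ssh -J would mean handling the forwarding request the client sends to the jump host. Per ssh(1), -J first connects to the jump host and then establishes a TCP forwarding to the final destination from there; on the wire this is a "direct-tcpip" channel open (RFC 4254, section 7.2). In golang.org/x/crypto/ssh that shows up as a NewChannel whose ChannelType() is "direct-tcpip", with the fields below in ExtraData(). A stdlib-only sketch of parsing it; allowJump is a hypothetical policy (the SSH username must equal the WORKSPACE_NAME part of WORKSPACE_NAME.WORKSPACE_NAMESPACE, and only port 60022 is allowed), not an agreed design.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// directTCPIP is the type-specific data of a "direct-tcpip" channel open.
type directTCPIP struct {
	Host     string // host to connect, e.g. WORKSPACE_NAME.WORKSPACE_NAMESPACE
	Port     uint32 // port to connect, e.g. 60022
	OrigIP   string // originator IP address
	OrigPort uint32 // originator port
}

// readString reads one SSH string: a uint32 length followed by that many bytes.
func readString(b []byte) (string, []byte, error) {
	if len(b) < 4 {
		return "", nil, errors.New("short string length")
	}
	n := binary.BigEndian.Uint32(b)
	if uint64(n) > uint64(len(b)-4) {
		return "", nil, errors.New("string length exceeds payload")
	}
	return string(b[4 : 4+n]), b[4+n:], nil
}

// readUint32 reads one big-endian uint32.
func readUint32(b []byte) (uint32, []byte, error) {
	if len(b) < 4 {
		return 0, nil, errors.New("short uint32")
	}
	return binary.BigEndian.Uint32(b), b[4:], nil
}

func parseDirectTCPIP(data []byte) (directTCPIP, error) {
	var d directTCPIP
	var err error
	if d.Host, data, err = readString(data); err != nil {
		return directTCPIP{}, err
	}
	if d.Port, data, err = readUint32(data); err != nil {
		return directTCPIP{}, err
	}
	if d.OrigIP, data, err = readString(data); err != nil {
		return directTCPIP{}, err
	}
	if d.OrigPort, _, err = readUint32(data); err != nil {
		return directTCPIP{}, err
	}
	return d, nil
}

// allowJump is a hypothetical authorization rule for the jump.
func allowJump(proxyUser string, d directTCPIP) bool {
	name, namespace, ok := strings.Cut(d.Host, ".")
	return ok && name == proxyUser && namespace != "" && d.Port == 60022
}

// encodeDirectTCPIP builds the same payload, used here only for the demo.
func appendUint32(b []byte, v uint32) []byte {
	return append(b, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

func appendString(b []byte, s string) []byte {
	return append(appendUint32(b, uint32(len(s))), s...)
}

func encodeDirectTCPIP(d directTCPIP) []byte {
	b := appendString(nil, d.Host)
	b = appendUint32(b, d.Port)
	b = appendString(b, d.OrigIP)
	return appendUint32(b, d.OrigPort)
}

func main() {
	payload := encodeDirectTCPIP(directTCPIP{Host: "my-ws.my-ns", Port: 60022, OrigIP: "127.0.0.1", OrigPort: 50000})
	d, err := parseDirectTCPIP(payload)
	fmt.Println(d, err, allowJump("my-ws", d))
}
```

If the rule passes, the proxy would dial the service and copy bytes in both directions; authentication to the workspace itself then happens end to end between the client and the workspace's sshd.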
Explore how to support "jumping" through gitlab-workspaces-proxy.
If that doesn't work, explore running OpenSSH's sshd directly as part of gitlab-workspaces-proxy (instead of the SSH server written in Go) and figure out how to authorize the user's access to the workspace.
If none of this works, explore running the proxy server written in Go so that it accepts the requests, shells out to ssh ..., and attaches the stdout and stderr to the incoming channel.
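A minimal sketch of the last option, assuming the proxy already knows the workspace's user, service host and port. sshCommand and exitCode are hypothetical helpers. os/exec copies the command's output into whatever io.Writer is assigned, so the incoming channel and its Stderr() could be plugged in directly, and cmd.Run only returns after that copying has completed.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strconv"
)

// sshCommand builds the fallback: the proxy shells out to the system ssh
// binary and wires the incoming channel's streams to it.
func sshCommand(user, host string, port int, remoteCmd string,
	stdin io.Reader, stdout, stderr io.Writer) *exec.Cmd {
	args := []string{"-p", strconv.Itoa(port), user + "@" + host}
	if remoteCmd != "" { // empty means an interactive shell
		args = append(args, remoteCmd)
	}
	cmd := exec.Command("ssh", args...)
	cmd.Stdin = stdin
	cmd.Stdout = stdout // the incoming channel in a real proxy
	cmd.Stderr = stderr // the incoming channel's Stderr() in a real proxy
	return cmd
}

// exitCode turns the result of cmd.Run into the status to forward as an
// "exit-status" request.
func exitCode(runErr error) (int, error) {
	if runErr == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(runErr, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	return -1, runErr // ssh could not be started at all
}

func main() {
	cmd := sshCommand("gitlab-workspaces", "my-ws.my-ns", 60022, "uname -sm", os.Stdin, os.Stdout, os.Stderr)
	fmt.Println(cmd.Args)
}
```

The open question for this option is still authentication: the shelled-out ssh needs credentials for the workspace that the proxy would have to hold.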
The developer UX is really bad when working with gitlab-workspaces-proxy. It is really difficult to have a fast feedback loop while making changes to the code: to test even small changes, we need to build the container image and deploy it. Even after that, some port-forwards to the underlying Kubernetes cluster are required to be able to communicate with it.
But this is a narrow case which has always been broken since initial implementation, and technically JetBrains is not an officially supported IDE in the Workspaces feature... yet.
Pros
The architecture can handle any TCP/UDP connection, not just SSH.
Routing happens at the HTTP layer, so we can pass any additional metadata that we might need.
There is no risk of implementing SSH incorrectly, since the traffic forwarding is based on HTTP. For wrapping/unwrapping TCP traffic, we can use https://github.com/vi/websocat.
Cons
User needs to run a WebSocket server whenever they start an SSH connection. This affects the UX. However, this can be easily mitigated by creating helper commands in glab.
Make this port locally accessible - kubectl port-forward pod/$POD_NAME 1234:1234
Start a local websocat listener that wraps TCP traffic in WebSocket - websocat -b tcp-l:127.0.0.1:1236 ws://127.0.0.1:1234 -v
Start SSH connection to this websocket - ssh -p 1236 gitlab-workspaces@127.0.0.1
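These steps work because a WebSocket connection starts as an ordinary HTTP request that is then upgraded, which is also what makes routing at the HTTP layer possible: until the proxy answers with 101 Switching Protocols, it can read the path, Host and any other headers to choose a workspace. A stdlib-only Go sketch of the server-side handshake check from RFC 6455; a real implementation would more likely use an existing WebSocket library, and the helper names are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"net/http"
	"strings"
)

// websocketGUID is the fixed GUID from RFC 6455.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// acceptKey computes the Sec-WebSocket-Accept value for a client's
// Sec-WebSocket-Key: base64(SHA-1(key + GUID)).
func acceptKey(clientKey string) string {
	sum := sha1.Sum([]byte(clientKey + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}

// isUpgrade reports whether a plain HTTP request asks to switch to WebSocket.
func isUpgrade(h http.Header) bool {
	return strings.EqualFold(h.Get("Upgrade"), "websocket") &&
		headerHasToken(h.Get("Connection"), "upgrade") &&
		h.Get("Sec-WebSocket-Key") != ""
}

// headerHasToken checks a comma-separated header value for a token,
// case-insensitively (e.g. "keep-alive, Upgrade").
func headerHasToken(value, token string) bool {
	for _, part := range strings.Split(value, ",") {
		if strings.EqualFold(strings.TrimSpace(part), token) {
			return true
		}
	}
	return false
}

func main() {
	// Sample key from RFC 6455, section 1.3.
	fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ==")) // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
}
```

After the 101 response, the proxy would only relay WebSocket frames carrying the raw SSH bytes, so it never needs to understand SSH itself.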
I have tried this and it works very nicely. I have also tried connecting JetBrains locally and it works. We can of course improve the UX. WDYT @shekharpatnaik @cwoolley-gitlab?
This seems great if it works. I think we need to prioritize addressing this for many reasons.
QUESTION: Are there other implementations/competitors who use this approach for tunneling SSH connections? If so, can we identify any potential problems we haven't thought of by looking at those implementations or their docs?
Regarding the need for users to use kubectl and run commands:
User needs to run a WebSocket server whenever they start an SSH connection. This affects the UX. However, this can be easily mitigated by creating helper commands in glab.
kubectl exec -it $POD_NAME -- /bin/bash
Does this mean that the user has to have kubectl installed locally in addition to glab in order to set this up?
If so, that seems like a big requirement for many users. Unless we bundle kubectl in glab or something, which also seems like a big lift.
Question:
Is this maybe another case for a generic executable which can (optionally) be injected into the workspaces in order to manage things like this? As we discussed separately in the AI agents issue.
In this case, the executable could contain the logic to automatically start the websocket server. We could also even use it to ensure that the sshd server is properly installed/configured.
The point of this approach is to greatly simplify the amount of manual setup needed by users on their workspaces to make these sorts of things work.
They would just install the executable in their workspaces (ideally could just be a server-side setting or configured in the devfile, etc.)
And in cases like this where there needs to be something corresponding running on their local environment (e.g. the client side of the websocket), we could lean on glab for whatever is needed there.
Are there other implementations/competitors who use this approach for tunneling SSH connections? If so, can we identify any potential problems we haven't thought of by looking at those implementations or their docs?
Good idea
Does this mean that the user has to have kubectl installed locally in addition to glab in order to set this up?
Nope. It is only required for demonstration because the code is running on my local machine. In reality that code will run inside a Kubernetes cluster where the Kubernetes Service would be directly accessible.
Is this maybe another case for a generic executable which can (optionally) be injected into the workspaces in order to manage things like this?
Definitely one way to achieve this generically.
They would just install the executable in their workspaces (ideally could just be a server-side setting or configured in the devfile, etc.)
Since it would be an executable binary, we can always inject it into the workspace. No additional setup would be required on their end.
And in cases like this where there needs to be something corresponding running on their local environment (e.g. the client side of the websocket), we could lean on glab for whatever is needed there.
To demonstrate this, we don't need to build this generic executable. We can inject websocat inside the workspace directly. But the generic executable is definitely a good approach long term to significantly improve UX and consolidate all "small items related to workspace startup".
@vtak Looks awesome! I know this is a high-level description, but I'm curious how you are going to:
Bootstrap the local web-socket server with the secret key and workspace name so that on calling ssh -p 1236 gitlab-workspaces@127.0.0.1 the connection is setup with the intended workspace
Handle ssh security concerns: From my POV we are not leveraging password or key based authentication anymore on ssh, rather relying on X-Secret-Key. If this is correct, do we have a safe way of handling the maintenance/ownership/distribution of this secret?
Bootstrap the local web-socket server with the secret key and workspace name so that on calling ssh -p 1236 gitlab-workspaces@127.0.0.1 the connection is setup with the intended workspace
This is something we need to look into more for a smoother user experience. Some options include:
Make convenience commands in glab CLI.
Update the SSH config file to have all the necessary proxy commands (with the assumption that websocat is installed on the user's machine).
Use Unix sockets when listening for traffic so that you don't have to worry about open ports.
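As a rough illustration of the SSH config option, a glab helper could write a Host entry whose ProxyCommand runs websocat in client mode. That way ssh talks to the WebSocket over stdin/stdout and no local port is opened. A Go sketch, assuming websocat is installed; sshConfigEntry and the alias are hypothetical, the user and URL come from the demo steps, and passing the token header is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// sshConfigEntry renders an ~/.ssh/config Host block. With a single URL
// argument, websocat connects to that WebSocket and relays stdin/stdout,
// which is exactly what ProxyCommand expects; -b selects binary mode.
func sshConfigEntry(alias, user, wsURL string) string {
	// ssh_config expands %-tokens in ProxyCommand, so a literal % must be
	// written as %%.
	escaped := strings.ReplaceAll(wsURL, "%", "%%")
	var b strings.Builder
	fmt.Fprintf(&b, "Host %s\n", alias)
	fmt.Fprintf(&b, "  User %s\n", user)
	fmt.Fprintf(&b, "  ProxyCommand websocat -b %s\n", escaped)
	return b.String()
}

func main() {
	fmt.Print(sshConfigEntry("my-workspace", "gitlab-workspaces", "ws://127.0.0.1:1234"))
}
```

With such an entry, ssh my-workspace (and tools like Gateway that read the SSH config) would go through the tunnel without a separate websocat step.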
Handle ssh security concerns: From my POV we are not leveraging password or key based authentication anymore on ssh, rather relying on X-Secret-Key. If this is correct, do we have a safe way of handling the maintenance/ownership/distribution of this secret?
This is existing behaviour. We rely on a user's Personal Access Token to perform the SSH connection. With the new approach, all we are suggesting is that we pass this token in a header of the HTTP (WebSocket) traffic. Passing authorization headers in HTTP traffic is fairly standard practice. As far as maintenance of the token/secret is concerned, the user is responsible for this even in the existing model. This can of course be improved, e.g. perform authentication/authorization through the glab CLI, store those tokens/secrets somewhere, and pass them in the HTTP (WebSocket) traffic, where gitlab-workspaces-proxy would use them to authenticate and authorize. However, that is not in the scope of this issue.
Great questions @Saahmed. As we work on it more, these will get answered more concretely. Right now, it is very "hand-waving" stuff because it is still a research issue.
Shekhar and I discussed this and decided to move forward with the above WebSocket (on both ends) solution. Reasons: it is more generic and can cater to all kinds of traffic (TCP/UDP/etc.), it requires low maintenance on gitlab-workspaces-proxy, and it unlocks some interesting use cases.
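To make "pass the token in a header" concrete: the upgrade request is plain HTTP, so the proxy can reject it before upgrading if the metadata is missing. A hedged Go sketch; X-Secret-Key is the header name mentioned in this thread, while X-Workspace-Name and all helper names are hypothetical, and validating the token against GitLab is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// Header names: X-Secret-Key is from this thread; X-Workspace-Name is a
// hypothetical example of extra routing metadata.
const (
	secretHeader    = "X-Secret-Key"
	workspaceHeader = "X-Workspace-Name"
)

// tunnelRequest is the metadata the proxy needs before upgrading.
type tunnelRequest struct {
	Workspace string
	Token     string // the user's Personal Access Token
}

// tunnelRequestFromHeaders extracts the workspace and token. Checking the
// token with GitLab (and that its owner may access the workspace) would
// happen next and is not shown.
func tunnelRequestFromHeaders(h http.Header) (tunnelRequest, error) {
	req := tunnelRequest{
		Workspace: strings.TrimSpace(h.Get(workspaceHeader)),
		Token:     strings.TrimSpace(h.Get(secretHeader)),
	}
	if req.Workspace == "" {
		return tunnelRequest{}, errors.New("missing " + workspaceHeader)
	}
	if req.Token == "" {
		return tunnelRequest{}, errors.New("missing " + secretHeader)
	}
	return req, nil
}

// authMiddleware rejects requests without the metadata before any upgrade.
func authMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := tunnelRequestFromHeaders(r.Header); err != nil {
			http.Error(w, err.Error(), http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := http.Header{}
	h.Set(workspaceHeader, "my-workspace")
	h.Set(secretHeader, "<personal-access-token>")
	fmt.Println(tunnelRequestFromHeaders(h))
}
```

Since the token travels in a header, the WebSocket endpoint should only be exposed over TLS (wss://).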