Trouble using non-CygWin SSH server and git executable
Summary: The GitLab CI VirtualBox executor expects the CygWin SSH server and Git executable in a Windows guest OS, and this is not documented.
Steps to reproduce: I initially installed the Windows OpenSSH server as documented at https://winscp.net/eng/docs/guide_windows_openssh_server and a normal Windows Git executable. The GitLab multi-runner on the Linux host could not connect to the guest OS (error message: "Preparation failed: EOF") until I thought to try the CygWin SSH server instead. Then the runner could not clone my repo in the Windows guest until I switched to the CygWin version of Git.
Details:
I'm using an Ubuntu host and a heterogeneous collection of VMs with the VirtualBox executor - some are Linux VMs but I also have several flavors of Windows VMs.
With all of the Windows ones I had significant trouble getting the executor to work. I have it working now, so this isn't an urgent issue, but I wanted to log my comments in case anyone else is encountering similar problems and in case the devs have solutions.
The short summary is: GitLab Multi Runner + VirtualBox VM + Windows guest appears to require the use of both the CygWin SSH server and the CygWin version of git in the guest OS. If intentional, this should be clearly documented as it cost me most of a week to figure out.
SSH:
The first problem was getting the multi-runner to successfully connect to the Windows guest OS. I used the instructions at https://winscp.net/eng/docs/guide_windows_openssh_server to install an OpenSSH server in the Windows guest. I was able to connect to it from outside just fine and got a cmd shell, which I thought was what the runner would expect, since cmd is documented as the default shell selection for Windows runners.
But the multi-runner connection would always fail with "Preparation failed: EOF". After trying various things and digging through the runner source, I guessed that maybe the SSH server was not responding to the runner's login attempt as expected, so I tried the CygWin SSH server instead, and that worked fine. I now suspect that the actual reason for the failure is that the VirtualBox executor expects to see bash at the other end of the SSH connection. When I tried to force the expectation of cmd by using the shell setting in config.toml, the executor would barf immediately, saying it doesn't support shells that require batch files.
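For reference, a minimal sketch of the kind of config.toml runner entry involved. It assumes the usual layout for a VirtualBox runner. Every name, URL, token, and credential below is a placeholder, not a value taken from this report.

```toml
# Sketch only: all values are hypothetical placeholders.
[[runners]]
  name = "windows-vbox-guest"          # hypothetical runner name
  url = "https://gitlab.example.com/"  # placeholder GitLab URL
  token = "RUNNER_TOKEN"               # placeholder registration token
  executor = "virtualbox"
  shell = "cmd"                        # forcing cmd here is what produced the
                                       # "shells that require batch files" error
  [runners.ssh]
    user = "ci-user"                   # hypothetical account in the Windows guest
    password = "ci-password"
  [runners.virtualbox]
    base_name = "windows-guest-vm"     # hypothetical VirtualBox VM name
```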
Why did I use that other OpenSSH server first? Because the VirtualBox runner documentation is ambiguous and I interpreted it the wrong way. It currently says:
- If on Windows, install Cygwin
- Install the OpenSSH server
Since the second bullet isn't indented, I took "the" to mean "THE", as in the most common or best-supported OpenSSH server for the platform, and Google led me to the one I used. If that bullet had been indented, or had said "the CygWin OpenSSH server", I would have got it right the first time.
Git:
The next problem was that the runner was unable to clone my git repo inside the guest OS. It would generate the .tmp directory containing the authentication files, then just die without an error message and without even creating the directory for the real repo. Again, I had no trouble cloning the repo manually even when logged in via SSH.
I tried playing with disabling SSL authentication in git, but in the end I decided to try the CygWin version of git just to see. Instant success a second time!
I don't recall whether I was previously using a manually installed Windows Git or the one Visual Studio will optionally install for you.
Possible fixes:
I'm not blocked now, but I would suggest updating the VirtualBox executor documentation to make clear that it expects the CygWin versions of Git and OpenSSH in Windows guests, and that it expects to see bash at the other end of the SSH connection.