Windows Runner - how does TLS validation work?
Summary
A customer reported issues with the Windows Runner performing TLS handshakes with endpoints that use publicly issued certificates (i.e. not private CAs and not self-signed). They specifically use public CAs to avoid problems with TLS handshaking.
I've tried to work through this with the customer, but found that more changes were needed than should be necessary to fix the originally reported issue (TLS handshaking failed when downloading artifacts).
GitLab team members can find out more in the ticket and SF.
Steps to reproduce
See 'Relevant logs and/or screenshots' for a minimal reproduction of the issue: TLS behavior seems to change in a fundamental way when tls-ca-file is provided.
- Windows Runner on Windows Server 2016, shell executor.
- GitLab Omnibus server, Let's Encrypt certificate.
Step 1: The runner works: it's registered and polling for jobs. TLS handshaking with GitLab (for example, a Let's Encrypt certificate chained to IdenTrust DST Root CA X3) works fine.
Step 2: Introduce artifacts, stored in AWS S3 with proxy_download left at the default (false). The S3 endpoint's certificate chains to a widely distributed root, for example Baltimore CyberTrust Root. Downloading artifacts fails:
ERROR: Download artifacts from co-ordinator... error couldn't execute GET against
https://gitlab.example.com/api/v4/jobs/123456/artifacts?direct_download=true: Get
https://artifacts-gitlab-example-com.s3.reg.amazon.ws.com/aa/bb/aabb.... x509:
certificate signed by unknown authority.
- The TLS handshake with S3 is failing.
Step 3: Copy AWS's root CA certificate to the server, specify it as tls-ca-file in config.toml, and run verify:
C:\gitlab>gitlab-runner.exe verify
ERROR: Verifying runner... failed runner=xx status=couldn't execute POST against
https://git.example.com/api/v4/runners/verify: Post
https://git.example.com/api/v4/runners/verify: x509: certificate signed by unknown authority
Remove tls-ca-file and verify passes.
Step 4: Add both the Let's Encrypt and AWS roots to the same file (i.e. a bundle) and specify that file in tls-ca-file. Verify then works.
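Steps 3 and 4 can be reproduced without any network access. This is a minimal Go sketch, assuming the runner ends up verifying against a pool that holds only the roots from tls-ca-file (the hypothesis described under Actual behavior, not something confirmed from the Runner source). It creates two hypothetical roots ("Hypothetical GitLab Root" and "Hypothetical S3 Root"), issues one server certificate from each, and verifies both against the Step 3 pool (S3 root only) and the Step 4 bundle (both roots).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// issue creates a key pair for tmpl and signs the certificate with parentKey,
// or self-signs it when parent is nil.
func issue(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	tmpl.NotBefore = time.Now().Add(-time.Hour)
	tmpl.NotAfter = time.Now().Add(24 * time.Hour)
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newChain creates a self-signed root named rootCN and a server
// certificate for host signed by that root.
func newChain(rootCN, host string) (*x509.Certificate, *x509.Certificate, error) {
	root, rootKey, err := issue(&x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: rootCN},
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}, nil, nil)
	if err != nil {
		return nil, nil, err
	}
	leaf, _, err := issue(&x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}, root, rootKey)
	return root, leaf, err
}

// verify checks leaf against exactly the roots in pool, as a client whose
// tls.Config.RootCAs is set to pool would.
func verify(pool *x509.CertPool, leaf *x509.Certificate) error {
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: leaf.DNSNames[0]})
	return err
}

// reproduce mirrors Step 3 (pool with the S3 root only) and Step 4 (bundle).
func reproduce() (map[string]error, error) {
	gitlabRoot, gitlabLeaf, err := newChain("Hypothetical GitLab Root", "gitlab.example.com")
	if err != nil {
		return nil, err
	}
	s3Root, s3Leaf, err := newChain("Hypothetical S3 Root", "s3.example.com")
	if err != nil {
		return nil, err
	}
	step3 := x509.NewCertPool() // tls-ca-file holding only the S3 root
	step3.AddCert(s3Root)
	step4 := x509.NewCertPool() // tls-ca-file holding a bundle of both roots
	step4.AddCert(gitlabRoot)
	step4.AddCert(s3Root)
	return map[string]error{
		"step3 gitlab": verify(step3, gitlabLeaf),
		"step3 s3":     verify(step3, s3Leaf),
		"step4 gitlab": verify(step4, gitlabLeaf),
		"step4 s3":     verify(step4, s3Leaf),
	}, nil
}

// unknownAuthority reports whether err is "certificate signed by unknown authority".
func unknownAuthority(err error) bool {
	var uae x509.UnknownAuthorityError
	return errors.As(err, &uae)
}

func main() {
	results, err := reproduce()
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"step3 gitlab", "step3 s3", "step4 gitlab", "step4 s3"} {
		fmt.Printf("%-12s unknown authority: %t\n", name, unknownAuthority(results[name]))
	}
}
```

Under that assumption, only "step3 gitlab" should report an unknown authority error, which matches verify failing in Step 3 and passing in Step 4.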
Actual behavior
Why does trusting the CA for AWS break the GitLab handshake?
At Step 1, how does the runner trust the root for GitLab? No change to config.toml has been made, and of the three options for private CAs (listed below), tls-ca-file is the only valid one:
- Default - Read the system certificate
  - This isn't used by Windows runners at all, right, because of the Go limitation?
  - Reading from the system certificate store is not supported in Windows.
- Specify a custom certificate file (tls-ca-file in config.toml)
- Read a PEM certificate from a predefined file
  - If running GitLab Runner as a Windows service, this will not work.
Only the middle option works for Windows. So 'out of the box', with no config.toml change, how does this work?
The behavior in steps 3 and 4 hints that perhaps the Runner isn't doing full certificate validation by default.
My hypothesis, given how few levers there are to pull, is that providing tls-ca-file changes how the runner behaves: it then performs full TLS checks using only the supplied roots, so if the CA for GitLab is not supplied, the runner cannot complete the handshake at initialization.
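One way to picture this hypothesis is the Go sketch below. rootCAs, preGo118Windows and selfSignedPEM are hypothetical names, not Runner code. The only links to the real runner are the WARNING text from the logs and the crypto/tls rule that a nil tls.Config.RootCAs falls back to the default roots, which on Windows are checked by the platform verifier against the Windows certificate store. If the runner does something similar, leaving tls-ca-file unset keeps the Windows store in play (consistent with Step 1), while setting it replaces the store with only the listed roots, because x509.SystemCertPool fails on Windows before Go 1.18.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"log"
	"math/big"
	"time"
)

// preGo118Windows stands in for x509.SystemCertPool on Windows before Go 1.18,
// which failed with the error seen in the runner logs.
func preGo118Windows() (*x509.CertPool, error) {
	return nil, errors.New("crypto/x509: system root pool is not available on Windows")
}

// rootCAs is a HYPOTHETICAL helper (not the Runner source) that turns an
// optional tls-ca-file into a value for tls.Config.RootCAs.
func rootCAs(caPEM []byte, loadSystem func() (*x509.CertPool, error)) (*x509.CertPool, error) {
	if len(caPEM) == 0 {
		// nil RootCAs: crypto/tls uses the default roots, which on Windows
		// means the platform verifier and the Windows certificate store.
		return nil, nil
	}
	pool, err := loadSystem()
	if err != nil || pool == nil {
		log.Printf("WARNING: Failed to load system CertPool: %v", err)
		// Empty pool: from here on, ONLY the roots in caPEM are trusted.
		pool = x509.NewCertPool()
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no PEM certificates found in CA file")
	}
	return pool, nil
}

// selfSignedPEM creates a throwaway root certificate for the demo below.
func selfSignedPEM(cn string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	awsRoot, err := selfSignedPEM("Hypothetical AWS Root")
	if err != nil {
		log.Fatal(err)
	}
	none, _ := rootCAs(nil, preGo118Windows)
	fmt.Println("no tls-ca-file -> RootCAs is nil:", none == nil)

	custom, err := rootCAs(awsRoot, preGo118Windows)
	if err != nil {
		log.Fatal(err)
	}
	// Subjects is deprecated for system pools but fine for a pool built here.
	fmt.Println("tls-ca-file set -> trusted roots:", len(custom.Subjects()))
}
```

Per the Go issue linked under Possible fixes, passing x509.SystemCertPool as loadSystem on Go 1.18 or later should keep the Windows roots and add the custom ones, instead of starting from an empty pool.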
Expected behavior
I can't find documentation which explains how TLS validation (and supplying roots) on Windows servers works.
There are hints on the private CAs page, but customers wouldn't expect to read instructions for private CAs to get public CAs working.
I can make the documentation revision, but need some information.
The docs should cover all of these points (corrected as needed):
- Windows runners do not use the Windows OS trust store, as this is not supported by Go.
- 'Out of the box', Windows runners trust the GitLab certificate... how?
- The role of $CI_SERVER_TLS_CA_FILE, which appears to contain the GitLab leaf certificate and any intermediates, with the root appended as well.
  - Is this supplied to the git and docker clients to handle TLS handshaking with GitLab?
- The git client is configurable. A customer comment on another issue noted that problems could be fixed in one of two ways:
  - Reinstalling Git and selecting "Use the native Windows Secure Channel library"
  - Or running git config --system -e, then setting sslBackend = schannel in the [http] section
- What code paths activate or deactivate when tls-ca-file is supplied in config.toml? More exactly, how does the runner's behavior change, and what do customers need to do if they start supplying that parameter?
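For the $CI_SERVER_TLS_CA_FILE question, and for checking any file passed as tls-ca-file, a small Go sketch can list what a PEM file actually contains. It assumes the variable holds the path of a file (a file-type variable); the program name and usage are illustrative. The self-issued flag only compares subject and issuer names, so it is a hint, not proof, that a certificate is a root.

```go
package main

import (
	"bytes"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"log"
	"os"
)

// describePEM returns one line per CERTIFICATE block in data, showing whether
// the file holds a leaf, intermediates and/or a root. Other PEM block types
// (for example keys) are skipped.
func describePEM(data []byte) ([]string, error) {
	var lines []string
	for {
		var block *pem.Block
		block, data = pem.Decode(data)
		if block == nil {
			return lines, nil
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return lines, fmt.Errorf("certificate %d: %w", len(lines), err)
		}
		// Subject == issuer hints at a self-signed root; the signature is not checked.
		selfIssued := bytes.Equal(cert.RawSubject, cert.RawIssuer)
		lines = append(lines, fmt.Sprintf("subject=%q issuer=%q ca=%t self-issued=%t",
			cert.Subject.String(), cert.Issuer.String(), cert.IsCA, selfIssued))
	}
}

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: describe <path-to-pem-file>")
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	lines, err := describePEM(data)
	if err != nil {
		log.Fatal(err)
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

For example, from a PowerShell job step, assuming Go is installed on the runner host: go run describe.go $env:CI_SERVER_TLS_CA_FILE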
Relevant logs and/or screenshots
C:\gitlab-runner>type config.toml
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "yy"
url = "https://gitlab.com/"
token = "xx"
executor = "shell"
shell = "powershell"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
C:\gitlab-runner>.\gitlab-runner-windows-amd64.exe verify
Runtime platform arch=amd64 os=windows pid=9044 revision=8fa89735 version=13.6.0
Verifying runner... is alive runner=xx
- downloaded https://letsencrypt.org/certs/isrg-root-x2.pem from: https://letsencrypt.org/certificates/
- amended config.toml
C:\gitlab-runner>type config.toml
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "yy"
url = "https://gitlab.com/"
token = "xx"
executor = "shell"
shell = "powershell"
tls-ca-file = "c:/gitlab-runner/isrg-root-x2.pem"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
C:\gitlab-runner>.\gitlab-runner-windows-amd64.exe verify
Runtime platform arch=amd64 os=windows pid=2288 revision=8fa89735 version=13.6.0
WARNING: Failed to load system CertPool: crypto/x509: system root pool is not available on Windows
ERROR: Verifying runner... failed runner=xx status=couldn't execute POST against
https://gitlab.com/api/v4/runners/verify: Post https://gitlab.com/api/v4/runners/verify: x509:
certificate signed by unknown authority
- reconfigure to specify the current gitlab.com root
C:\gitlab-runner>type config.toml
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "yy"
url = "https://gitlab.com/"
token = "xx"
executor = "shell"
shell = "powershell"
tls-ca-file = "c:/gitlab-runner/Comodo_AAA_Services_root.pem"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
C:\gitlab-runner>.\gitlab-runner-windows-amd64.exe verify
Runtime platform arch=amd64 os=windows pid=10896 revision=8fa89735 version=13.6.0
WARNING: Failed to load system CertPool: crypto/x509: system root pool is not available on Windows
Verifying runner... is alive runner=xx
Environment description
See above.
Used GitLab Runner version
The customer ran gitlab-runner.exe verify with 13.10.0~beta.44.g5905c876.
Reproduced with 13.6.0.
Possible fixes
The best workaround for now is to add the S3/cloud storage certificates along with any other certificates being provided to Runner.
This should, however, be resolved when we upgrade to Go 1.18 (https://github.com/golang/go/issues/46287).
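The workaround amounts to building one PEM bundle containing every root the runner must trust (here, the GitLab root and the S3/cloud storage root) and pointing tls-ca-file at it. A minimal Go sketch with hypothetical file names: it merges PEM files, rejects any input without a valid certificate, and drops exact duplicates.

```go
package main

import (
	"bytes"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"log"
	"os"
)

// mergeBundles concatenates the certificates from several PEM inputs into one
// bundle, dropping exact duplicates. Each input must contain at least one
// parseable CERTIFICATE block, so a wrong file fails loudly instead of
// silently producing an incomplete bundle.
func mergeBundles(inputs ...[]byte) ([]byte, error) {
	var out bytes.Buffer
	seen := make(map[string]bool)
	for i, data := range inputs {
		found := 0
		for {
			var block *pem.Block
			block, data = pem.Decode(data)
			if block == nil {
				break
			}
			if block.Type != "CERTIFICATE" {
				continue
			}
			if _, err := x509.ParseCertificate(block.Bytes); err != nil {
				return nil, fmt.Errorf("input %d: %w", i, err)
			}
			found++
			if seen[string(block.Bytes)] {
				continue
			}
			seen[string(block.Bytes)] = true
			// Re-encode without PEM headers to keep the bundle uniform.
			if err := pem.Encode(&out, &pem.Block{Type: "CERTIFICATE", Bytes: block.Bytes}); err != nil {
				return nil, err
			}
		}
		if found == 0 {
			return nil, fmt.Errorf("input %d: no certificates found", i)
		}
	}
	return out.Bytes(), nil
}

func main() {
	if len(os.Args) < 3 {
		log.Fatal("usage: bundle <out.pem> <in1.pem> [in2.pem ...]")
	}
	var inputs [][]byte
	for _, path := range os.Args[2:] {
		data, err := os.ReadFile(path)
		if err != nil {
			log.Fatal(err)
		}
		inputs = append(inputs, data)
	}
	bundle, err := mergeBundles(inputs...)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile(os.Args[1], bundle, 0o644); err != nil {
		log.Fatal(err)
	}
}
```

For example: go run bundle.go c:/gitlab-runner/ca-bundle.pem gitlab-root.pem s3-root.pem, then set tls-ca-file = "c:/gitlab-runner/ca-bundle.pem" in config.toml. Plain concatenation of the files works too; the sketch only adds validation and de-duplication.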