gitlab-shell uses the system-default SSL_CERT_DIR when connecting to remote Gitaly over TLS
Based on a customer issue (internal), it appears that gitlab-shell does not load certificates from GitLab's trusted-certs directory. It falls back to the system defaults instead, which causes x509 errors.
A git pull when using a custom certificate fails with:
remote:
remote: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority"
remote:
The CA certificate is correctly placed in /etc/gitlab/trusted-certs, and gitlab:gitaly:check succeeds on both the GitLab host and the Gitaly host. The GitLab Shell checks pass as well:
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 12.2.0 ? ... OK (12.2.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
The Gitaly check returns OK:
gitlab-rake gitlab:gitaly:check
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
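Since these checks pass, it may help to confirm by hand that the CA from /etc/gitlab/trusted-certs actually ended up in /opt/gitlab/embedded/ssl/certs under its OpenSSL subject-hash name (the `<hash>.0` entries that show up in the second trace). A minimal sketch, assuming that reconfigure links trusted certificates into that directory by hash; `ca_in_cert_dir` is a hypothetical helper name, and it only checks by name, not by content:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether CERT_FILE is present in CERT_DIR under
# its OpenSSL subject-hash name (<hash>.0, <hash>.1, ...), which is how
# hash-based CA directories are looked up. Assumes openssl is on PATH.
ca_in_cert_dir() {
  local cert_file=$1 cert_dir=$2 hash candidate
  hash=$(openssl x509 -hash -noout -in "$cert_file") || return 2
  for candidate in "$cert_dir/$hash".[0-9]*; do
    # The glob stays literal when nothing matches, so test for existence;
    # -e also fails for a dangling symlink, which should count as missing
    if [ -e "$candidate" ]; then
      echo "found: $candidate"
      return 0
    fi
  done
  echo "missing: no $hash.N entry in $cert_dir" >&2
  return 1
}

# Example (the certificate file name is a placeholder):
# ca_in_cert_dir /etc/gitlab/trusted-certs/my-ca.crt /opt/gitlab/embedded/ssl/certs
```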
Moving the gitlab-shell binary aside and replacing it with a wrapper script that exports SSL_CERT_DIR makes clones succeed:
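# Move the real binary aside so the wrapper can take its place
mv /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell{,-orig}
# Quote 'EOF' so "$@" is written into the script literally, not expanded now
cat <<'EOF' > /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell
#!/bin/bash
# Look for CA certificates in the Omnibus-bundled directory
export SSL_CERT_DIR=/opt/gitlab/embedded/ssl/certs
exec /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-orig "$@"
EOF
chmod +x /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell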
Before the wrapper script was in place, tracing gitlab-shell showed it opening only the standard system certificate locations:
strace.log:
31844 openat(AT_FDCWD, "/etc/ssl/certs/ca-certificates.crt", O_RDONLY|O_CLOEXEC <unfinished ...>
31844 <... openat resumed>) = -1 ENOENT (No such file or directory)
31844 openat(AT_FDCWD, "/etc/pki/tls/certs/ca-bundle.crt", O_RDONLY|O_CLOEXEC <unfinished ...>
31844 <... openat resumed>) = 3
31849 openat(AT_FDCWD, "/etc/ssl/certs", O_RDONLY|O_CLOEXEC) = 3
31849 openat(AT_FDCWD, "/etc/ssl/certs/Makefile", O_RDONLY|O_CLOEXEC) = 3
31849 openat(AT_FDCWD, "/etc/ssl/certs/ca-bundle.crt", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc/ssl/certs/ca-bundle.trust.crt", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc/ssl/certs/make-dummy-cert", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc/ssl/certs/renew-dummy-cert", O_RDONLY|O_CLOEXEC) = 3
31841 openat(AT_FDCWD, "/etc//localtime", O_RDONLY) = 3
With the wrapper in place and SSL_CERT_DIR exported, gitlab-shell reads the Omnibus certificate directory:
strace.log:
362 openat(AT_FDCWD, "/etc/ssl/certs/ca-certificates.crt", O_RDONLY|O_CLOEXEC <unfinished ...>
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs", O_RDONLY|O_CLOEXEC) = 3
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs/09789157.0", O_RDONLY|O_CLOEXEC <unfinished ...>
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs/1568f5bb.0", O_RDONLY|O_CLOEXEC) = 3
362 openat(AT_FDCWD, "/opt/gitlab/embedded/ssl/certs/1895e586.0", O_RDONLY|O_CLOEXEC) = 3
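To compare traces like these two, the certificate-related paths can be pulled out of an strace log. A minimal sketch, assuming the log was written with something like `strace -f -e trace=openat -o strace.log ...` (how gitlab-shell was attached to is not recorded here); `cert_paths_in_trace` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct certificate-related paths that an
# strace log shows being passed to openat(2).
cert_paths_in_trace() {
  local log=$1
  # grep -o also copes with logs where several calls ended up on one line
  grep -o 'openat(AT_FDCWD, "[^"]*"' "$log" |
    sed 's/^[^"]*"\([^"]*\)"$/\1/' |
    grep -E '/(ssl|pki)/' |
    LC_ALL=C sort -u
}

# Example:
# strace -f -e trace=openat -o strace.log <command>
# cert_paths_in_trace strace.log
```

Running it on a log from before and after the change should make it easy to see whether /opt/gitlab/embedded/ssl/certs is ever consulted.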
I have reproduced this on GitLab 12.10.3-ee; the customer reports it on 12.9.5-ee.
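According to the Go crypto/x509 documentation, the environment variables "SSL_CERT_FILE and SSL_CERT_DIR can be used to override the system default locations". I suspect this is the cause: without SSL_CERT_DIR set, gitlab-shell only consults the system default locations shown in the first trace.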
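One way to see whether a process will fall back to the system defaults is to look for SSL_CERT_* variables in its environment. A minimal sketch, assuming Linux, where /proc/<pid>/environ holds a process's initial environment as NUL-separated entries (reading it for another user's process needs root); `cert_env_from_environ` is a hypothetical helper name. Pointing it at the process that spawns gitlab-shell (presumably sshd, which is an assumption about this setup) would show whether SSL_CERT_DIR is set there at all:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SSL_CERT_FILE / SSL_CERT_DIR entries from an
# environ-style file (NUL-separated KEY=VALUE pairs, as in /proc/<pid>/environ).
# Prints nothing and returns 1 when neither variable is set.
cert_env_from_environ() {
  local environ_file=$1
  tr '\0' '\n' < "$environ_file" | grep -E '^SSL_CERT_(FILE|DIR)='
}

# Example (<pid> is the process to inspect):
# cert_env_from_environ /proc/<pid>/environ
```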
Further, setting any of ca_file, ca_path, or self_signed_cert under http_settings in /var/opt/gitlab/gitlab-shell/config.yml has no effect.
For now, the workaround is to leave the bash wrapper script in place, which allows clones from the remote Gitaly instance to succeed.
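Because the wrapper replaces a file shipped in the package, it is probably worth re-checking that it is still in place after an upgrade (an assumption: whether upgrades or reconfigures overwrite it has not been confirmed here). A minimal sketch; `wrapper_in_place` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if BIN is still a script that exports
# SSL_CERT_DIR and the original binary still exists at BIN-orig.
wrapper_in_place() {
  local bin=$1
  [ -f "${bin}-orig" ] || { echo "missing ${bin}-orig" >&2; return 1; }
  # A script starts with '#!'; a compiled binary does not
  [ "$(head -c 2 "$bin")" = '#!' ] || { echo "$bin is not a script" >&2; return 1; }
  grep -q '^export SSL_CERT_DIR=' "$bin" ||
    { echo "$bin does not export SSL_CERT_DIR" >&2; return 1; }
}

# Example:
# wrapper_in_place /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell
```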